Hi all, I can't figure out why my robots.txt file has all these disallows in it, can anyone help? I don't have the funds for a Shopify expert.
My site orangegrove.conz is not showing up on Google, so I had a look around after ContentKing brought it to my attention.
I am using the Minimal theme, and there seem to be quite a few issues with the free themes, e.g. SCSS files and the message that their support is deprecated in themes and that the files should be converted. But that's another story.
This is the default robots.txt configuration for Shopify stores. It works well and I wouldn't worry about it at the moment, unless you have a very specific need.
Regarding the deprecation of SCSS, I also wouldn't worry for now. Shopify now "suggests" avoiding SCSS for new themes, but SCSS is still supported and will be for a long time.
I was also worried about the same disallow entries and the sitemap, but after reading this thread I am a bit more relaxed.
However, it's been 3 weeks since I launched my website and the pages are not yet indexed in Google.
When I go to Google Search Console I can see 78 undiscovered URLs. I then requested indexing of these URLs using the URL Inspection tool within Search Console, but there has been no effect after 5 days.
Can anyone help me with how to get all my URLs indexed?
I reached out to Shopify about this and they gave me a line-by-line explanation. It looks like the disallow rules are there for valid reasons, and in my case some schema information is missing on my website. I figured this out by running my missing pages through https://search.google.com/test/rich-results . I have had to hire someone from SEO-JSON-LD,Schema by Webrex Studio to fix this for me, as it's a lengthy process. I will update here if they are able to fix my problems successfully!
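For anyone wondering what "schema information" means here: the Rich Results test looks for structured data (usually JSON-LD) in the page source. Purely as an illustration of the idea, and not what the Webrex app actually installs, a product template can output a minimal Product snippet along these lines; treat the field choices as placeholders rather than a complete schema.

```
{%- comment -%}
  Illustrative only: a minimal Product JSON-LD block for a
  product template. Field choices are placeholders, not a
  complete schema.
{%- endcomment -%}
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": {{ product.title | json }},
  "description": {{ product.description | strip_html | json }},
  "offers": {
    "@type": "Offer",
    "priceCurrency": {{ shop.currency | json }},
    "price": {{ product.price | divided_by: 100.0 | json }},
    "availability": "https://schema.org/InStock"
  }
}
</script>
```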
Shopify's reply:
Here is a line-by-line explanation for you:
# we use Shopify as our ecommerce platform - This is just a comment in the file; it doesn't do anything.
User-agent: * - These rules are set for any visitor, and all bots should follow them (though a bot may not; it's not mandatory).
Disallow: /admin - This keeps admin pages from being crawled, because no bot would be able to view that page, and admin pages do not need to be indexed or ranked by search engine crawler bots.
Disallow: /cart - Cart pages without items in them donât need to be crawled, indexed, or ranked.
Disallow: /orders - Doesnât need to be crawled, indexed, or ranked.
Disallow: /checkout - Doesnât need to be crawled, indexed, or ranked.
Disallow: /SHOPID/checkouts - An area for recovering checkouts. Doesnât need to be crawled, indexed, or ranked.
Disallow: /SHOPID/orders - Order status pages. Donât need to be crawled/indexed/ranked.
Disallow: /carts - Same as above.
Disallow: /account - Customer accounts. These donât need to be crawled, and bots would usually not have customer accounts anyway.
Disallow: /collections/*+* - Any collection with tag filtering will use + signs to link tags, and since that's not an actual collection in itself we don't want it crawled like an actual collection. The asterisks (*) denote any text content, so it catches all the things.
Disallow: /collections/*%2B* - %2B is an encoded version of +, same as above.
Disallow: /collections/*%2b* - %2b is also an encoded version of +, same as above.
Disallow: /blogs/*+* - This blocks tags for blogs from being indexed, since they are not unique content (+ signs in handles are converted to - dashes in the URL).
Disallow: /blogs/*%2B* - Similar to above.
Disallow: /blogs/*%2b* - Similar to above.
Disallow: /design_theme_id - When editing a theme these URLs will be generated. Unnecessary.
Disallow: /preview_theme_id - Previewing an unpublished theme will include this in the initial URL. Not necessary.
Disallow: /preview_script_id - Used when canceling Script previews. Not needed otherwise.
Disallow: /gift_cards/ - Sent gift cards use this URL to display the code. Not needed otherwise.
Disallow: /policies/ - Where Refund, Privacy, and Terms of Service generated pages live. Not 'real' content and will likely be very, very similar across stores (not something you want crawled or indexed).
Disallow: /search - Searches made on a storefront are not usually something that need to be crawled.
Disallow: /apple-app-site-association - Generates data for use with Apple, not needed.
Sitemap: https://SHOPDOMAIN/sitemap.xml - Defines where to find the sitemap file (this is not disallowed).
# Google adsbot ignores robots.txt unless specifically named! - A friendly comment to let you know Google's adsbot ignores robots.txt unless specifically named. All the following lines refer to Google's bot that crawls for data related to their ads.
User-agent: adsbot-google - The actual instruction 'naming' the adsbot, so it knows to keep out.
Disallow: /checkout - Redirects to cart as above, not needed.
Disallow: /carts - Empty cart without a token as above, not needed.
Disallow: /orders - Same as above.
Disallow: /SHOPID/checkouts - Adsbot doesnât need to crawl abandoned checkouts.
Disallow: /SHOPID/orders - Adsbot doesnât need to look at order status pages.
Disallow: /gift_cards/ - Same as above.
Disallow: /design_theme_id - Same as above.
Disallow: /preview_theme_id - Same as above.
Disallow: /preview_script_id - Same as above.
User-agent: Nutch - This is a known web crawler. For more information, see http://nutch.apache.org/bot.html
Disallow: / - Nutch obeys robots.txt, and it is disallowed.
User-agent: MJ12bot - This is a web crawler for the Majestic business search engine.
Crawl-Delay: 10 - This asks the bot to wait 10 seconds between crawls, Mr. Bot. This instruction saves our bandwidth so the bot doesn't overwhelm storefronts.
User-agent: Pinterest - This refers to the Pinterest bot, looking for pins.
Crawl-delay: 1 - While it does have a rate limiter, we just want to make sure it's not making more than 1 crawl per second.
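Putting that reply back together, the default file it is describing should look roughly like this (SHOPID and SHOPDOMAIN are placeholders for your store's ID and domain, and your file may differ slightly as Shopify updates the defaults):

```
# we use Shopify as our ecommerce platform
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /SHOPID/checkouts
Disallow: /SHOPID/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /design_theme_id
Disallow: /preview_theme_id
Disallow: /preview_script_id
Disallow: /gift_cards/
Disallow: /policies/
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: https://SHOPDOMAIN/sitemap.xml

# Google adsbot ignores robots.txt unless specifically named!
User-agent: adsbot-google
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /SHOPID/checkouts
Disallow: /SHOPID/orders
Disallow: /gift_cards/
Disallow: /design_theme_id
Disallow: /preview_theme_id
Disallow: /preview_script_id

User-agent: Nutch
Disallow: /

User-agent: MJ12bot
Crawl-Delay: 10

User-agent: Pinterest
Crawl-delay: 1
```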
I THINK that Shopify does this to make users contact a specialist to fix the problem. So if Shopify doesn't change this behaviour, I will cancel my account next year. The reason I chose Shopify was that a small shop doesn't have the money to pay a consultant to solve problems that Shopify has automatically put into the file, so that normal business people have to contact a professional to fix the problem or change the code. I will never do that. Either I find the solution on the web and with AI work tools, or I will stop using the Shopify solution.
I was looking for my User-agent file, which I cannot find, because they have put it behind the closed system. I only want to change the crawl-delay from 10 to 5, but I cannot find it, and maybe that is because they want us to contact a specialist who has the right instructions to do this and then pay 1000 USD for 2 minutes of work. No way.
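For what it's worth, Shopify does now let you override the generated file from inside your theme by adding a robots.txt.liquid template (Online Store > Themes > Edit code > Add a new template > robots.txt), so no paid specialist should be needed for this. Below is only a rough sketch of lowering the MJ12bot crawl-delay, based on the documented robots Liquid object; double-check Shopify's robots.txt.liquid documentation and test the output before relying on it, since a mistake here can hide your whole store from crawlers.

```
{%- comment -%}
  templates/robots.txt.liquid
  Re-emits Shopify's default groups, but replaces the
  MJ12bot Crawl-delay of 10 with 5. Everything else is
  passed through unchanged.
{%- endcomment -%}
{%- for group in robots.default_groups -%}
{{ group.user_agent }}
{%- for rule in group.rules %}
{%- assign directive_down = rule.directive | downcase %}
{%- if group.user_agent.value == 'MJ12bot' and directive_down == 'crawl-delay' %}
Crawl-delay: 5
{%- else %}
{{ rule }}
{%- endif -%}
{%- endfor %}
{%- if group.sitemap != blank %}
{{ group.sitemap }}
{%- endif %}

{% endfor -%}
```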
I have found the same thing in my store too, but I am not sure if it is causing a problem for my store, because I started selling and then suddenly stopped selling. If this is the cause, I need help too.
Using "Disallow: /search" in our robots.txt has resulted in Search Console flagging "issues" with our page appearance. Specifically, about 20K pages are "Indexed, though blocked by robots.txt". Hopefully adding "noindex" to our /search page will clear that up.
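The caveat is that Google only sees a noindex tag on pages it is allowed to crawl, so /search would also need to be removed from the robots.txt disallows (e.g. via robots.txt.liquid) for the tag to take effect. A rough sketch of the tag itself, assuming a standard layout/theme.liquid with the usual head section (adjust for your theme), would be:

```
{%- comment -%}
  In layout/theme.liquid, inside <head>: ask search engines
  not to index on-site search result pages.
{%- endcomment -%}
{%- if request.page_type == 'search' -%}
  <meta name="robots" content="noindex, follow">
{%- endif -%}
```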