Solved

Robots.txt problems with disallow

orangegrove
Excursionist
18 3 0

Hi all, I can't figure out why my robots.txt file has all these Disallow rules in it. Can anyone help? I don't have funds for a Shopify expert.

My site orangegrove.co.nz is not showing up on Google, so I had a look around after ContentKing brought it to my attention.

I am using the Minimal theme, and there seem to be quite a few issues with the free themes, e.g. SCSS files and the message that SCSS support in themes is deprecated and the files should be converted. But that's another story.

 

Thx Wendy

 

 

# we use Shopify as our ecommerce platform

User-agent: *
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /27443134534/checkouts
Disallow: /27443134534/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: https://orangegrove.co.nz/sitemap.xml

User-agent: adsbot-google
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /27443134534/checkouts
Disallow: /27443134534/orders
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*

User-agent: Nutch
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /27443134534/checkouts
Disallow: /27443134534/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: https://orangegrove.co.nz/sitemap.xml

User-agent: AhrefsSiteAudit
Crawl-delay: 10
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /27443134534/checkouts
Disallow: /27443134534/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: https://orangegrove.co.nz/sitemap.xml

User-agent: MJ12bot
Crawl-delay: 10

User-agent: Pinterest
Crawl-delay: 1

 

Replies 10 (10)

drakedev
Shopify Partner
685 148 229

Hi,

this is the default robots.txt configuration for Shopify stores. It works well, and I wouldn't worry about it at the moment unless you have a very specific need.

Regarding the deprecation of SCSS, I also wouldn't worry for now. Shopify now suggests avoiding SCSS in new themes, but SCSS is still supported and will be for a long time.

If my answer was helpful click Like to say thanks
If the problem is solved remember to click Accept Solution
Shopify/Shopify Plus custom development: You can hire me for simple and/or complex tasks.
orangegrove
Excursionist
18 3 0

This is an accepted solution.

Thank you very much, Drakedev. I appreciate the clarity; it gives me peace of mind. Wendy

Shazad
New Member
10 0 0

Hi,

I was also worried about the same Disallow rules in my robots.txt, but after reading this thread I am a bit more relaxed.

However, it's been 3 weeks since I launched my website and the pages are not yet indexed in Google.

When I go to Google Search Console I can see 78 undiscovered URLs. I then requested re-indexing of these URLs using the URL Inspection tool within Search Console, but there has been no effect after 5 days.

Can anyone help me get all my URLs indexed?

Thank you in advance!

drakedev
Shopify Partner
685 148 229

Hello,

I see at least 468 URLs indexed by Google.

[Screenshot: Google search results for site:orangegrove.co.nz, 2021-09-19]

If you have more, you need to be a little more patient.

wendycain
Explorer
81 7 15

Thanks, but it was Shazad who tagged that post onto mine. Can you help him? Shazad, you need to provide your URL.

tinadochana
Tourist
4 0 1

Hi

I am actually having the same issue! Google Search Console shows only 4 pages have been crawled, and the rest are 'Crawled but not indexed'.

I checked how many pages are actually indexed using site:newagewalls.com, and it shows 3 pages. My robots.txt file looks the same as the one above.

Can someone help?

 

GianParodi
Visitor
2 0 0

I have also submitted my site to be indexed.

When I search for site:benedante.com, it tells me to try Google Search Console, without saying it has found any pages at all.

Does the disallow mean that it won't find any pages to index?
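
One way to explore that question is a small Python sketch that applies a handful of the Disallow patterns from the robots.txt quoted earlier in this thread to some sample paths. The sample paths (for example /products/example-product) are hypothetical, and the matcher is a simplification that assumes the usual wildcard convention: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match. It ignores Allow rules and precedence, which this file does not use.

```python
import re

# A few Disallow patterns copied from the "User-agent: *" group in this thread.
DISALLOW = [
    "/admin",
    "/cart",
    "/checkout",
    "/account",
    "/collections/*sort_by*",
    "/collections/*+*",
    "/policies/",
    "/search",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal and is matched as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical storefront paths, for illustration only.
    for path in ["/", "/products/example-product", "/collections/shirts",
                 "/collections/shirts/red+cotton", "/cart", "/search?q=hat"]:
        print(path, "->", "blocked" if is_disallowed(path) else "crawlable")
```

Under these assumptions the home page, product pages, and plain collection pages do not match any rule, so the Disallow lines by themselves would not stop those pages from being found.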

tinadochana
Tourist
4 0 1

I reached out to Shopify about this and they gave me a line-by-line explanation. Looks like the disallow is there for valid reasons and in my case, it looks like some schema information is missing on my website. I figured this out by running my missing pages through https://search.google.com/test/rich-results . I have had to hire someone from SEO-JSON-LD,Schema by Webrex Studio to fix this for me as it's a lengthy process. I will update here if they are able to fix my problems successfully for me!

Shopify's reply:

Here is a line by line explanation for you:
# we use Shopify as our ecommerce platform - This is just a comment on the file, it doesn't do anything.
User-agent: * - These rules are set for any visitor, and all bots should follow them (though a bot may not; it's not mandatory).
Disallow: /admin - This keeps admin pages from being crawled, because no bot would be able to view that page, and admin pages do not need to be indexed or ranked by search engine crawler bots.
Disallow: /cart - Cart pages without items in them don't need to be crawled, indexed, or ranked.
Disallow: /orders - Doesn’t need to be crawled, indexed, or ranked.
Disallow: /checkout - Doesn’t need to be crawled, indexed, or ranked.
Disallow: /SHOPID/checkouts - An area for recovering checkouts. Doesn’t need to be crawled, indexed, or ranked.
Disallow: /SHOPID/orders - Order status pages. Don't need to be crawled/indexed/ranked.
Disallow: /carts - Same as above.
Disallow: /account - Customer accounts. These don't need to be crawled, and bots would usually not have customer accounts anyway.
Disallow: /collections/*+* - Any collection with tag filtering will use + signs to link tags, and since that’s not an actual collection in itself we don’t want that crawled like an actual collection. The asterisks (*) denote any text content, so the rule catches every such URL.
Disallow: /collections/*%2B* - %2B is an encoded version of +, same as above.
Disallow: /collections/*%2b* - %2b is also an encoded version of +, same as above.
Disallow: /blogs/*+* - This blocks tags for blogs from being indexed, since they are not unique content (+ signs in handles are converted to - dashes in the URL).
Disallow: /blogs/*%2B* - Similar to above.
Disallow: /blogs/*%2b* - Similar to above.
Disallow: /*design_theme_id* - When editing a theme these URLs will be generated. Unnecessary.
Disallow: /*preview_theme_id* - Previewing an unpublished theme will include this in the initial URL. Not necessary.
Disallow: /*preview_script_id* - Used when canceling Script previews. Not needed otherwise.
Disallow: /gift_cards/ - Sent gift cards use this URL to display the code. Not needed otherwise.
Disallow: /policies/ - Where Refund, Privacy, and Terms of Service generated pages live. Not “real” content and will likely be very, very similar across stores (not something you want crawled or indexed).
Disallow: /search - Searches made on a storefront are not usually something that need to be crawled.
Disallow: /apple-app-site-association - Generates data for use with Apple, not needed.
Sitemap: https://SHOPDOMAIN/sitemap.xml - Defines where to find the sitemap file (this is not disallowed).
# Google adsbot ignores robots.txt unless specifically named! - A friendly comment to let you know Google’s adsbot ignores robots.txt unless specifically named. All the following lines refer to Google's bot that crawls for data related to their ads.
User-agent: adsbot-google - The actual instruction “naming” the adsbot, so it knows to keep out.
Disallow: /checkout - Redirects to cart as above, not needed.
Disallow: /carts - Empty cart without a token as above, not needed.
Disallow: /orders - Same as above.
Disallow: /SHOPID/checkouts - Adsbot doesn’t need to crawl abandoned checkouts.
Disallow: /SHOPID/orders - Adsbot doesn’t need to look at order status pages.
Disallow: /gift_cards/ - Same as above.
Disallow: /*design_theme_id* - Same as above.
Disallow: /*preview_theme_id* - Same as above.
Disallow: /*preview_script_id* - Same as above.
User-agent: Nutch - This is a known web crawler. For more information, see http://nutch.apache.org/bot.html
Disallow: / - Nutch obeys robots.txt, and it is disallowed.
User-agent: MJ12bot - This is a web crawler for the Majestic business search engine.
Crawl-Delay: 10 - This asks the bot to wait 10 seconds between crawls, Mr. Bot. This instruction saves our bandwidth so the bot doesn't overwhelm storefronts.
User-agent: Pinterest - This refers to the Pinterest bot, looking for pins.
Crawl-delay: 1 - While it does have a rate limiter, we just want to make sure it’s not making more than 1 crawl per second.
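
The way the file is split into per-bot groups can be illustrated with a rough Python sketch. It parses a shortened sample of the robots.txt from this thread into groups keyed by user agent, then looks up which rules a given bot would read. The `SAMPLE` text and the helper names `parse_groups` and `rules_for` are made up for illustration. The lookup is an exact, case-insensitive name match with a fallback to `*`, which is simpler than what real crawlers do, and as the explanation above notes, Google's adsbot does not fall back to `*` at all.

```python
# Shortened sample based on the robots.txt quoted in this thread.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /cart

User-agent: adsbot-google
Disallow: /checkout

User-agent: Nutch
Disallow: /

User-agent: MJ12bot
Crawl-delay: 10
"""


def parse_groups(text: str) -> dict:
    """Map each lowercased user-agent name to its list of (field, value) rules."""
    groups = {}
    current_agents = []
    reading_agents = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if not reading_agents:
                current_agents = []
                reading_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        else:
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def rules_for(bot: str, groups: dict) -> list:
    """Pick the group naming this bot, falling back to '*' if none does."""
    return groups.get(bot.lower(), groups.get("*", []))


if __name__ == "__main__":
    groups = parse_groups(SAMPLE)
    for bot in ["Googlebot", "adsbot-google", "Nutch", "MJ12bot"]:
        print(bot, rules_for(bot, groups))
```

In this sketch a general crawler such as Googlebot ends up with the `*` rules, while Nutch is shut out entirely and MJ12bot is only asked to slow down.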

GianParodi
Visitor
2 0 0

Thank you Tina, appreciate your reply.

wendycain
Explorer
81 7 15

Thank you so much