Can someone explain these lines in my robots.txt?

Tourist

I understand some of these lines, but not all. Can someone explain what the following lines are meant for?

User-agent: *
Disallow: /27142830/checkouts
Disallow: /27142830/orders (what's with this number 27142830? Isn't the "/orders" rule above enough to block Googlebot?)
Disallow: /collections/*+* (I assume this is for noindexing tag pages?)
Disallow: /collections/*%2B*
Disallow: /collections/*%2b* (what is this %2b stuff? What is it meant to noindex?)
Disallow: /blogs/*+* (I assume this is for noindexing blog tag pages?)
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b* (again, what is this %2b stuff?)

Thanks for the help.

Shopify Staff

Hi there!

Don here from Shopify. :)

I can hopefully share some info here to clarify what's going on in your robots.txt file.

The Disallow: /27142830/orders and Disallow: /27142830/checkouts entries block the order and checkout pages in your store, as these are unique to each customer visit and do not need to be crawled.

The number used here (27142830) refers to your store specifically and will be different for each Shopify store.
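On the question of why a general "/orders" rule isn't enough on its own: without wildcards, a robots.txt Disallow value is a case-sensitive prefix match against the start of the URL path. A minimal Python sketch of that behaviour, where the order URL is hypothetical and only meant to show a path that begins with the store number:

```python
def matches_prefix_rule(path: str, rule: str) -> bool:
    # A Disallow value without '*' or '$' matches any path that starts with it.
    return path.startswith(rule)


order_path = "/27142830/orders/abc123"  # hypothetical order URL for this store

print(matches_prefix_rule(order_path, "/orders"))           # False
print(matches_prefix_rule(order_path, "/27142830/orders"))  # True
```

So a plain /orders rule only covers paths that begin with /orders; the store-specific line is what covers paths that begin with the store number.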

Disallow: /collections/*+* does indeed refer to collections filtered with tags; these are not unique collections, so they don't need to be crawled.
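To make the wildcard concrete, here is a minimal Python sketch of Google-style robots.txt pattern matching, where '*' matches any run of characters and a trailing '$' anchors the end of the URL. It checks one Disallow pattern at a time (no Allow rules or precedence), and the collection and tag names in the URLs are hypothetical:

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' wherever '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    # re.match anchors only at the start, giving the usual robots.txt prefix match.
    return robots_pattern_to_regex(pattern).match(path) is not None


rule = "/collections/*+*"
for path in [
    "/collections/shirts",            # plain collection (hypothetical name)
    "/collections/shirts/red",        # filtered by one tag
    "/collections/shirts/red+large",  # filtered by two tags joined with '+'
]:
    print(path, "->", "blocked" if is_disallowed(path, rule) else "allowed")
```

Running it reports the first two paths as allowed and only the last one as blocked: a URL with a single tag contains no '+', so this particular rule only catches combinations of tags.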

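The '+' here matches URLs that combine several tags (Shopify joins multiple tags in a filtered collection URL with '+'), and the '*' is a wildcard that matches any sequence of characters. '%2B' is the URL-encoded (percent-encoded) form of '+', so it fills the same role as the '+' above; '%2b' is the same encoding written with a lowercase hex digit, and it is listed separately because robots.txt paths are matched case-sensitively.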
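The relationship between '+', '%2B' and '%2b' can be checked with Python's standard urllib.parse module. This is only a sketch with a hypothetical tag pair, and it assumes a crawler that compares robots.txt patterns as plain, case-sensitive strings; a crawler that normalizes percent-encoding before comparing would make the duplicate line redundant rather than wrong:

```python
from urllib.parse import quote, unquote

tags = "red+large"  # hypothetical: two tags joined with '+'

# quote() percent-encodes '+' because it is not an unreserved character;
# Python writes the hex digits in uppercase.
upper = quote(tags, safe="")
lower = upper.replace("%2B", "%2b")  # same encoding, lowercase hex digit

print(upper)                                     # red%2Blarge
print(lower)                                     # red%2blarge
print(unquote(upper) == unquote(lower) == tags)  # True: both decode back to '+'
print("%2B" in lower)                            # False: as plain strings they differ
```

Both spellings mean the same URL once decoded, but a literal pattern of "%2B" does not match the text "%2b", which is why the rules cover '+', '%2B' and '%2b' separately.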

The same is true for your blog tags: the '+', '%2B' and '%2b' patterns are there so that every version of a tag-filtered blog URL is excluded from crawling.

I trust that clears up the language you see there a little!

You can check out our guide to the robots.txt file here if you haven't already. :)

All the best!

Regards,
Don

Don | Social Care @ Shopify

New Member

Clarification on:

"Disallow: /collections/*+* does indeed refer to collections filtered with tags as these are not unique collections and as such don't need to be crawled."

-----------------------------

This is a good answer, but taken in the context of a live store it means something different.

Google says the following:

----------------------------

A robotted page can still be indexed if linked to from other sites...


You should not use robots.txt as a means to hide your web pages from Google Search results. This is because, if other pages point to your page with descriptive text, your page could still be indexed without visiting the page. If you want to block your page from search results, use another method such as password protection or a noindex directive.

So as I understand it, all this does is tell Google not to crawl these pages as part of its site crawl. Google will still index these filtered collections once Googlebot finds links to them with descriptive text on other sites, which is likely to happen as your site gains popularity.

This is what Google does in its index in these cases

The key takeaway here is that Google never actually crawls these pages. It still includes them in its index because it found links to them on other sites. So on Shopify you could not actually remove all of these pages from Google's index, because you would first have to allow Google to crawl them and then use a noindex directive. Not a big issue, but worth knowing anyway. The best workaround could be a clever JavaScript redirect if a link is causing issues for your brand.
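The reasoning in the paragraph above can be sketched as a simplified model in Python: if robots.txt blocks a URL, the crawler never downloads the page, so a noindex meta tag inside it is never read. This is an illustration under stated assumptions, not how Googlebot is implemented: it checks only Disallow patterns ('*' wildcards, no '$' anchors or Allow rules), reads only <meta name="robots"> tags, and the URLs and HTML are hypothetical:

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "meta" and (values.get("name") or "").lower() == "robots":
            self.directives.append((values.get("content") or "").lower())


def is_disallowed(path: str, pattern: str) -> bool:
    # '*' matches any run of characters; otherwise the rule is a prefix match.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path) is not None


def crawler_outcome(path: str, html: str, disallow_patterns: list[str]) -> str:
    if any(is_disallowed(path, p) for p in disallow_patterns):
        # The page is never fetched, so its noindex tag is never seen.
        return "not crawled: URL can still be indexed from external links"
    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in d for d in parser.directives):
        return "crawled: noindex seen, page dropped from search results"
    return "crawled: page can be indexed"


rules = ["/collections/*+*", "/blogs/*+*"]
page = '<html><head><meta name="robots" content="noindex"></head></html>'

print(crawler_outcome("/collections/shirts/red+large", page, rules))
print(crawler_outcome("/collections/shirts/red", page, rules))
```

The first call reports the tag-combination URL as never crawled even though its HTML carries noindex, which is the situation described in the post; the second shows noindex only taking effect on a page the crawler is allowed to fetch.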

https://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
