I understand some but not all of these. Can someone explain what the following lines are meant for?
User-agent: *
Disallow: /27142830/checkouts
Disallow: /27142830/orders
(what's with this number 27142830? Isn't "/orders" on its own enough to block Googlebot?)
Disallow: /collections/*+*
(I assume this is for noindexing tags?)
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
(what is this %2b stuff? What is this meant to noindex?)
Disallow: /blogs/*+*
(I assume this is for noindexing blog tag pages?)
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
(again, what is this %2b stuff?)
Thanks for the help.
Don here from Shopify. :)
I can hopefully share some info here to clarify what's going on in your robots.txt file.
The Disallow: /checkouts and Disallow: /orders entries block the checkout and order pages created in your store, as these are unique to each customer visit and don't need to be crawled.
The number used here refers to your store specifically and will be different for each Shopify store.
Disallow: /collections/*+* does indeed refer to collections filtered with tags, as these are not unique collections and as such don't need to be crawled.
The '*' is a wildcard that matches any text, and the '+' is the character used to join tags in a filtered collection URL; '%2B' is the URL-encoded (percent-encoded) form of '+', so it fills the same role as the '+' above.
The same is then true for your blog tags: the '+' and '%2B' patterns are there to exclude every possible form of those tag-filtered URLs from crawling.
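The relationship between '+' and '%2B' can be checked in a small Python sketch. The tag names below are made up for illustration, and `fnmatch`-style globbing is only a rough stand-in for how Googlebot matches the '*' wildcard in robots.txt, but it shows why each pattern is listed in both forms:

```python
from urllib.parse import quote, unquote
from fnmatch import fnmatchcase

# '+' percent-encodes to '%2B'; a crawler may request either form,
# which is why robots.txt lists the literal '+', '%2B', and lowercase '%2b'.
print(quote('+', safe=''))             # '%2B'
print(unquote('%2B'), unquote('%2b'))  # '+' '+'

# A tag-filtered collection URL and its percent-encoded twin
# (hypothetical tag names, for illustration only):
plain   = "/collections/summer+sale"
encoded = "/collections/summer%2Bsale"

# Glob matching as a rough stand-in for the robots.txt '*' wildcard:
# each URL form is caught by its corresponding pattern.
print(fnmatchcase(plain,   "/collections/*+*"))    # True
print(fnmatchcase(encoded, "/collections/*%2B*"))  # True
```

So the three Disallow lines per path are just the same rule spelled out for every way the '+' can appear in a requested URL.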
I trust that clears up the language you see there a little!
You can check out our guide to the robots.txt file here if you haven't already. :)
All the best!
Clarification on:
"Disallow: /collections/*+* does indeed refer to collections filtered with tags, as these are not unique collections and as such don't need to be crawled."
This is a good answer, but taken in the context of a live store it means something different.
A robotted page can still be indexed if linked to from other sites...
You should not use robots.txt as a means to hide your web pages from Google Search results. This is because, if other pages point to your page with descriptive text, your page could still be indexed without visiting the page. If you want to block your page from search results, use another method such as password protection or a noindex directive.
So in my mind, all this does is tell Google not to crawl these pages as part of its crawl of your site; Googlebot can still index all these filtered collections once it sees links to them with descriptive text on other sites, which will happen as your site gains popularity.
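For anyone wondering what the "noindex directive" mentioned in the Google quote looks like in practice, here's a minimal sketch of the robots meta tag placed in a page's head. Note the catch: this only works if the page is *not* blocked in robots.txt, because the crawler has to be able to fetch the page in order to see the tag.

```html
<!-- Keep a page out of search results with a robots meta tag.
     The page must stay crawlable (not blocked in robots.txt),
     or Googlebot never fetches it and never sees this tag. -->
<meta name="robots" content="noindex">
```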