I'm highly upset with the way that Shopify apparently designed their "collections." For months, I've been wondering why none of my primary collections (which stand in for the nonexistent categories) have been ranking at all in the SERPs, despite a decent showing by individual products, blogs, and pages.
Today I found this in Webmaster Tools:
This basically prevents Google from crawling any of your collections. This was done (I'm 99% positive) to keep sites on Shopify from having a massive amount of duplicate content due to Shopify's collection organization.
What this also means (I believe):
Collections cannot be crawled (and ranked in SERPs).
Any product that is linked to, externally or even internally, via a collection URL will not receive link juice, i.e. [http://site.com/collections/collection-name/products/product] will not pass link juice. The only way to pass link juice to a product would be via the direct link: http://site.com/products/product. Who is going to link, organically, to a product that they've found by browsing through tens or hundreds of products on a site, rather than one they found via a "collection" on the primary menu? No one.
IMHO, the correct way for the programmers to keep duplicate content from being an issue would have been to add rel="canonical" on http://site.com/collections/collection-name/products/product pointing to http://site.com/products/product. This is one of the things rel="canonical" was designed for.
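To make the rel="canonical" idea concrete, here is a minimal Python sketch of the mapping described above. It assumes the two URL shapes quoted in this post; the function names are hypothetical and this is not how Shopify or any theme actually implements it.

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed URL shape: /collections/<collection-handle>/products/<product-handle>
_COLLECTION_PRODUCT = re.compile(r"^/collections/[^/]+/products/([^/]+)/?$")


def canonical_product_url(url: str) -> str:
    """Map a collection-scoped product URL to the direct /products/<handle> URL."""
    parts = urlsplit(url)
    match = _COLLECTION_PRODUCT.match(parts.path)
    if match is None:
        # Not a collection-scoped product URL: leave it unchanged.
        return url
    direct_path = f"/products/{match.group(1)}"
    # Query string and fragment are dropped so every variant shares one canonical URL.
    return urlunsplit((parts.scheme, parts.netloc, direct_path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that would go in the page's <head>."""
    href = html.escape(canonical_product_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("http://site.com/collections/collection-name/products/product"))
# prints: <link rel="canonical" href="http://site.com/products/product">
```

With a tag like this on the collection-scoped page, both URLs can stay crawlable while search engines are asked to consolidate them onto the direct product URL.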
It appears that there is no robots.txt disallow for tags, but what professional e-commerce site uses tags as a navigation feature? Tags are old news.
If what I've typed above is accurate, I've probably lost thousands of dollars in time and lost sales.
I cannot switch platforms at this time -- too much time, articles, and money (paid for multiple years) invested onto the Shopify system.
If I'm way off base here, someone please correct me. I have an intermediate knowledge of SEO, so I'm not yet an expert.
I fail to understand why we don't have basic control over robots.txt via a simple interface such as the Yoast SEO or All in One SEO Pack plugins for WordPress, or even what is available via StudioPress's Genesis themes for WordPress.
This basically prevents Google from crawling any of your collections.
Are you sure about that? Can't say that I am up with the current standards for the robots.txt but to me it looks like that line is disallowing urls with a + in it -- not everything.
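A quick way to check that reading is to apply the wildcard rules that Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else (including `+`) is literal. The Python sketch below hand-rolls that matching. The rule `/collections/*+*` is an assumed stand-in for the line from Webmaster Tools, which is not reproduced in this thread, and the example paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a URL path against one robots.txt rule using '*' and '$' wildcards."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece (so '+' stays a plain plus sign),
    # then join the pieces with '.*' wherever the rule had a '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an unanchored rule acts as a prefix match.
    return re.match(regex, path) is not None


rule = "/collections/*+*"  # assumed rule, based on the *+* fragment discussed here
for path in (
    "/collections/shirts",
    "/collections/shirts/products/tee",
    "/collections/shirts/red+cotton",
):
    print(path, "blocked" if rule_matches(rule, path) else "allowed")
```

Under these assumptions only the path containing a literal `+` is blocked; plain collection and collection-scoped product paths are not matched by the rule.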
You may be right - I hope. I normally manage robots.txt externally via one of the SEO plugins for WordPress. I'm much more familiar with using .htaccess to block bad bots and redirect pages than I am with the internal workings of robots.txt. I took *+* to be a wildcard for any sub-folder, but it may mean any URL with a + character in it (I seem to recall a few of those popping up for unknown reasons when fixing Google-identified 404s).
In WordPress, it is sometimes advantageous to rank your categories and sometimes better to disallow them, depending on your theme and how your linking structure is organized. For example, a blog with full-text posts should probably disallow crawling of the categories, while one with only a few post excerpts showing on the home page, or a static home page, would likely do better with its categories indexed -- particularly if your theme and/or plugins allow text to be added to the category page and excerpts to be changed to a unique description. I always think tags are a bad idea, whether you've disallowed them or not; there is no reason to create duplicate content if you can avoid it.
It appears I jumped the gun and didn't research enough. Apparently the + revolves around the use of tags and keeping duplicate content from being indexed.
This is still bad news for anyone who uses tags (I don't) and expects organic links. If someone links to a product or blog post via a tagged (+) URL, it may not get the link juice - I'm not sure. I believe most themes have rel=canonical built into them, but if Google is told not to access the page, it may never see the canonical tag and so may not credit the link juice to the page.
It would be nice to have a category feature for at least the blog with option to index/no index in order to better organize it for readers. Tags should go the way of the dodo.
This is an accepted solution.
As of today, June 21st, 2021, we have launched the ability to edit the robots.txt file to give merchants more control over the information that is crawled by search engines. You can learn more about how to edit your robots.txt file through our community post here.
Due to the age of the topic, I will be locking this thread and marking it as resolved to help direct anyone that lands on this page. If you have any questions about the new feature, please do not hesitate to create a new post under our "Technical QA" board.