I am worried about the robots.txt file for my client's website, which you can see here:
Is this blocking the blog and collection pages from being indexed by search engines?
Hi Eavesy,
Yep. Your blog and collection pages are blocked from being crawled.
Is there any way to fix this, guys? I can pay if anyone thinks they can sort it.
Search engine spiders crawl your full website and temporarily store your pages for indexing. Generally speaking, most website owners are happy for search engines to crawl and index any page they want; however, there are cases where you don't want pages to be indexed.
For example, if you are developing a new website, it is generally best to prevent search engines from indexing it so that incomplete pages do not appear in search results. Website owners also sometimes need to stop search engines from indexing specific pages, for a variety of reasons. And yes, robots.txt is blocking your blogs and pages from being indexed in Google search results.
This is an accepted solution.
@Eavesy, the robots.txt file is not blocking collection and blog pages in general from being indexed on that site.
The screenshot given is a stock standard Shopify robots.txt file. It only blocks crawling of blog and collection URLs that contain a "+" (plus) character. The strings "%2B" and "%2b" are just the URL-encoded '+' symbol, so they mean the same thing.
While it is not officially documented by Shopify, I would assume that Shopify added this config to minimize their own server resource usage (at scale). It also helps a bit with unnecessary Googlebot crawl-budget consumption and infinite crawler traps. Most of the time (but not all the time) those faceted navigation URLs are not very useful to searchers, so it's usually a good thing that they are not crawled/indexed anyway.
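Here's an example of paths that would be allowed vs. blocked with the default Shopify robots.txt:
- Allowed: /collections/premium-roller-banner/upload
- Blocked: /collections/premium-roller-banner/upload+budget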
This doesn't seem to be an issue at all for your site, because tag faceted/filtered links are not even being used internally on collections or blogs.
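For anyone who wants to check a path by hand, here is a minimal Python sketch of the wildcard matching described above. It is only an illustration, not Shopify's or Google's actual implementation: the function names are made up, the /blogs/ rules are the ones quoted in this thread, the /collections/ rules are assumed to follow the same pattern, and Allow rules and rule precedence are ignored.

```python
import re

# Blog rules as quoted in this thread; the /collections/ rules are an
# assumption that the default file uses the same pattern for collections.
DISALLOW_RULES = [
    "/collections/*+*",
    "/collections/*%2B*",
    "/collections/*%2b*",
    "/blogs/*+*",
    "/blogs/*%2B*",
    "/blogs/*%2b*",
]


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' pins the end of
    the path; every other character is literal. Rules match as prefixes.
    """
    anchored_end = rule.endswith("$")
    body = rule[:-1] if anchored_end else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored_end else ""))


def is_blocked(path, rules=DISALLOW_RULES):
    """Return True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for path in (
        "/collections/premium-roller-banner/upload",
        "/collections/premium-roller-banner/upload+budget",
    ):
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

A plain collection or blog path contains no "+", "%2B" or "%2b", so none of these rules match it; only the tag-combination URLs are caught.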
I'm not sure what @SEO_Booster & @Propelguru are saying ¯\_(ツ)_/¯
It's not so much about resources, but the above is correct on there being no block to the collections or blogs. The others posting here are just wrong.
When the filtered-by-tag URLs are used, there's a chance that the tags could be shown in different orders (e.g. something+else or else+something) but still return and show the exact same content. There's an advantage to not indexing those, to reduce duplicate content risks. It doesn't stop your main collection being indexed.
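To make the ordering problem concrete, here is a small Python sketch. The helper names and the /collections/example path are hypothetical, and sorting the tags is just one way to show that the orderings are equivalent; it is not a description of what Shopify or any theme actually does.

```python
from itertools import permutations


def tag_filter_paths(collection_path, tags):
    """List every path that joins the same tags with '+' in a different order."""
    return [f"{collection_path}/{'+'.join(order)}" for order in permutations(tags)]


def canonical_tag_path(path):
    """Sort the '+'-joined tags in the last path segment so all orderings collapse to one."""
    base, _, last = path.rpartition("/")
    return f"{base}/{'+'.join(sorted(last.split('+')))}"


if __name__ == "__main__":
    paths = tag_filter_paths("/collections/example", ["something", "else"])
    for p in paths:
        print(p, "->", canonical_tag_path(p))
```

With two tags there are two crawlable URLs for the same content, with three tags there are six, and the count grows factorially from there, which is why keeping crawlers out of these combinations is usually harmless.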
Hello @KieranR,
Thank you for your answer! I am facing the same issue as the author of the question and I was advised to remove this from Robots.txt (screenshot attached):
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
However, after reading your reply and checking the pages with the Google Search Console robots.txt Tester, I see that this is not the reason our articles have not yet been indexed by Google. Any ideas what else it could be? None of our articles (from last year and this year) are indexed, even though it says "URL can be indexed". Here are our latest articles:
• https://exploroproducts.com/blogs/blog/everything-you-should-know-about-pre-employment-drug-testing
• https://exploroproducts.com/blogs/blog/how-to-pass-a-drug-test-in-2021
Thanks for any advice!
It's taken a while to respond, but they're indexed now. Often it comes down to just a few simple reasons:
In Shopify, 99% of the time it's reason 1 or 2.
@KieranR wrote: Here's an example of paths that would be allowed vs. blocked with the default Shopify robots.txt:
- Allowed: /collections/premium-roller-banner/upload
- Blocked: /collections/premium-roller-banner/upload+budget
Note that removing such rules would let crawlers fetch both /upload+budget and /budget+upload, which serve the same duplicated content.
I'm not sure there is any consistent canonical behavior across themes for that.
Thanks guys, will stop worrying now