Solved

Robots.txt question please help

Eavesy
Tourist
5 0 1

I am worried about the robots.txt file for my client website which you can see here:

Is this blocking the blog and collection pages from being indexed by search engines?

Accepted Solution (1)

KieranR
Shopify Partner
333 27 115

This is an accepted solution.

@Eavesy, the robots.txt file is not blocking all collection and blog pages from being indexed on that site.

The screenshot shows a stock-standard Shopify robots.txt file. It only prevents crawling of blog and collection URLs that contain a "+" (plus) character. The strings "%2B" and "%2b" are just a URL-encoded "+" symbol, so they mean the same thing.
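As a quick check of the encoding point, here is a minimal Python sketch using only the standard library's urllib.parse. The sample path reuses the collection example from this thread, written with an encoded plus purely for illustration.

```python
from urllib.parse import quote, unquote

# "%2B" and "%2b" both percent-decode to a literal plus sign.
upper = unquote("%2B")
lower = unquote("%2b")
print(upper, lower)  # + +

# Percent-encoding a plus sign produces the upper-case form.
encoded = quote("+", safe="")
print(encoded)  # %2B

# A filtered path written with an encoded plus decodes to the "+" form.
decoded_path = unquote("/collections/premium-roller-banner/upload%2Bbudget")
print(decoded_path)  # /collections/premium-roller-banner/upload+budget
```

This is why the default file carries three variants of each rule: a crawler may see the plus sign in any of the three spellings.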

While this isn't officially documented by Shopify, I would assume the reason Shopify added this config is to minimize their own server resource usage (at scale). It also helps a bit with unnecessary Googlebot crawl-budget consumption and infinite crawler traps. Most of the time (but not always) those faceted navigation URLs are not very useful to searchers, so it's usually a good thing that they are not crawled/indexed anyway.

Here's an example of paths that would be allowed vs. blocked with the default Shopify robots.txt: 

  • Allowed: /collections/premium-roller-banner/upload
  • Blocked: /collections/premium-roller-banner/upload+budget
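The allowed/blocked pair above can be reproduced with a rough Python sketch of robots.txt wildcard matching. The three collection rules in DISALLOW are an assumption modelled on the default file discussed in this thread, and the helper names are made up for illustration. Real crawlers also weigh Allow rules and rule specificity, which this sketch ignores.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated literally.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

# Assumed rules (not copied from a live file).
DISALLOW = [
    "/collections/*+*",
    "/collections/*%2B*",
    "/collections/*%2b*",
]

def is_blocked(path: str) -> bool:
    # re.match anchors at the start of the path, as robots.txt rules do.
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)

print(is_blocked("/collections/premium-roller-banner/upload"))         # False
print(is_blocked("/collections/premium-roller-banner/upload+budget"))  # True
```

Only the path containing "+" matches a rule, so the plain collection and single-tag pages stay crawlable.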

This doesn't seem to be an issue at all for your site, because tag faceted/filtered links are not even being used internally on collections or blogs.

I'm not sure what @SEO_Booster & @Propelguru are saying ¯\_(ツ)_/¯

Full time Shopify SEO guy, based in NZ. Sometimes freelance outside the 9-5.

Replies 9 (9)

SEO_Booster
Shopify Partner
42 2 10

Hi Eavesy,

Yep. Your blog and collection pages are being blocked from crawling.

 

banned
Eavesy
Tourist
5 0 1

Is there any way to fix this, guys? I can pay if anyone thinks they can sort it.

Propelguru
Trailblazer
313 7 44

Search engine spiders crawl your whole website and temporarily store its pages for indexing. Generally speaking, most website owners are happy for search engines to crawl and index any page they want. However, there are cases where you don't want pages to be indexed.

For example, if you are developing a new website, it is generally best to prevent search engines from indexing it so that incomplete pages do not appear in search results. Website owners also sometimes need to stop search engines from indexing specific pages, because there are many reasons not to want every page indexed. And yes, robots.txt is blocking your blogs and pages from being indexed in Google search results.

Jason
Shopify Expert
11190 225 2282

It's not so much about resources, but the above is correct on there being no block to the collections or blogs. The others posting here are just wrong.

When the filtered-by-tag URLs are used, there's a chance that the tags could appear in different orders (e.g. something+else or else+something) but still return and show the exact same content. There's an advantage to not indexing those, as it reduces duplicate content risks. It doesn't stop your main collection from being indexed.
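To make the tag-order point concrete, here is a small Python sketch that sorts the tags in a filtered URL so both orderings collapse to one form. The /collections/all prefix and the function name are hypothetical, and this is only an illustration of why the two URLs are duplicates, not something Shopify is known to do.

```python
def canonical_tag_path(path: str) -> str:
    """Sort the '+'-joined tags in the last path segment."""
    base, _, tags = path.rpartition("/")
    return base + "/" + "+".join(sorted(tags.split("+")))

# Two orderings of the same tag filter (hypothetical collection handle).
a = canonical_tag_path("/collections/all/something+else")
b = canonical_tag_path("/collections/all/else+something")

print(a == b)  # True
print(a)       # /collections/all/else+something
```

Every extra tag multiplies the number of orderings, so a crawler following these links could fetch many URLs that all show the same products.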

★ I jump on these forums in my free time to help and share some insights. Not looking to be hired, and not looking for work. http://freakdesign.com.au ★
Irena
Pathfinder
127 1 30

Hello @KieranR,

Thank you for your answer! I am facing the same issue as the author of the question, and I was advised to remove this from robots.txt (screenshot attached):

Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*

However, after reading your reply and checking pages with the Google Search Console robots.txt Tester, I see that this is not the reason our articles have not yet been indexed by Google. Any idea what else it could be? None of our articles (from last year and this year) are indexed, even though it says "URL can be indexed". Here are our latest articles:

• https://exploroproducts.com/blogs/blog/everything-you-should-know-about-pre-employment-drug-testing
• https://exploroproducts.com/blogs/blog/how-to-pass-a-drug-test-in-2021

Thanks for any advice!

KieranR
Shopify Partner
333 27 115

It's taken me a while to respond, but they're indexed now. Often it comes down to a few simple reasons:

 

    1. Google takes a while to crawl & index new pages on an existing site.
    2. If you have no domain authority, are a new website, or have given Google no reason to trust your site, then in my experience pages can take a LONG time to be indexed, from weeks to a couple of months.
    3. There may be technical reasons like robots blocking, no internal link/crawl path to the URL, the URL not being in the sitemap, JS issues, etc.

 

In Shopify, 99% of the time it's reason 1 or 2. 

Full time Shopify SEO guy, based in NZ. Sometimes freelance outside the 9-5.
PaulNewton
Shopify Partner
6274 573 1319

@KieranR wrote:

Here's an example of paths that would be allowed vs. blocked with the default Shopify robots.txt: 

  • Allowed: /collections/premium-roller-banner/upload
  • Blocked: /collections/premium-roller-banner/upload+budget

Note that removing such rules would let crawlers fetch both /upload+budget and /budget+upload, which serve the same duplicated content.

I'm not sure whether there is any consistent canonical behavior across themes for that.

Save time & money ,Ask Questions The Smart Way


Confused? Busy? Get the solution you need paull.newton+shopifyforum@gmail.com


Problem Solved? ✔Accept and Like solutions to help future merchants

Answers powered by coffee Buy Paul a Coffee for more answers or donate to eff.org


Eavesy
Tourist
5 0 1

Thanks guys, will stop worrying now