Impulse theme filters generate thousands of indexable URLs

Solved
MarcoMatty
Shopify Expert
16 1 5

Hello All
We are having an issue with Impulse: we realised that every time a filter is selected, the theme changes the URL, adding the name of the filter.

For example, if we are inside the Shirts collection
domain/collections/shirts
and we click on a filter, for example colour Red,
the link becomes
domain/collections/shirts/red

This is a problem for us, as all of those links get indexed by Google, which generates thousands of URLs.

Is there a way to solve the issue?

We were thinking about adding a noindex tag, but we don't know how to apply it only to the filtered URLs.

Accepted Solution (1)
PageFly-Victor
Shopify Partner
7865 1785 3017

This is an accepted solution.

Hi @MarcoMatty

Which tool did you use to check whether these URLs are indexable? Normally, URLs created from filters are already disallowed in the robots.txt file.


However, if you want to de-index all filtered URLs, you can edit your robots.txt file: simply add another Disallow rule for pages with URLs like domain/collections/shirts/variant.
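As a rough illustration, this is what an edited robots.txt.liquid template could look like. It is only a sketch, assuming your shop lets you add the robots.txt.liquid template in the theme code editor; the Disallow pattern /collections/*/* is an example aimed at tag-filtered URLs like /collections/shirts/red and should be checked against the URLs Impulse actually generates before you rely on it.

{% comment %}templates/robots.txt.liquid sketch: keep Shopify's default rules and append one extra Disallow for tag-filtered collection URLs{% endcomment %}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/*/*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}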

 

Let me know if my answer is of any help with a thumbs up or like. Cheers


Replies 3 (3)
amandao1
Tourist
7 0 2

This doesn't really solve the problem, since Googlebot still needs to "check" every URL to see whether it's allowed or not. I tried crawling our site after the filters were added: 200,000 URLs were found, and that was only about 1% of the site. I had to remove the filters again to be able to crawl the whole site.
Is there a way to actually noindex these pages, not only disallow them? Another webshop I worked for had a setup with a hashtag, for example: www.webpage.com/collection#green-xl
They set up a rule telling Google not to read anything after the hashtag. Is it possible to set up something like this?

As mentioned, by only disallowing the pages, Google will not be able to crawl the whole website, since it uses unnecessary crawl capacity checking all the disallowed pages only to discover that it's not allowed to read them.



KieranR
Shopify Partner
333 27 115

Is there a way to actually noindex these pages, not only disallow them?

You can write some Liquid that adds a noindex tag when '+' or '%2b' exists in the page URL. But I'm not sure I agree this is necessary: blocking via a robots.txt disallow should stop the crawling. Yes, the pages can still be indexed if they have sufficient backlinks or internal links, but you should really be choosing either noindex OR a robots.txt disallow, not both. You could throw some rel="nofollow" on the <a> links if necessary to prevent Google wasting crawl budget, if that is actually a problem (it usually isn't).
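For reference, here is a minimal sketch of that kind of Liquid condition for the <head> of theme.liquid. It is only an illustration: instead of checking the raw URL string it uses Shopify's current_tags object, which is set whenever a collection is filtered by tag, so it also catches single-tag URLs like /collections/shirts/red that contain no '+'.

{% comment %}Sketch for theme.liquid <head>: noindex any tag-filtered collection page while still letting crawlers follow its links{% endcomment %}
{% if template contains 'collection' and current_tags %}
  <meta name="robots" content="noindex, follow">
{% endif %}

If you go this route, keep those URLs crawlable (i.e. don't also disallow them in robots.txt), otherwise Google may never see the noindex tag — which is the "choose either noindex OR robots.txt disallow" point above.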

 

They set up a rule telling Google not to read anything after the hashtag. Is it possible to set up something like this?

I could imagine some custom JS fragment-based routing, though those pages would not be served by Shopify as separate HTTP 200 responses, so perhaps they were using another platform? It sounds the same as the '+'-based Disallow logic to me anyway, so I'm not sure what additional benefit it offers.

 

Google will not be able to crawl the whole website, since it uses unnecessary crawl capacity checking all the disallowed pages only to discover...

This doesn't seem right. Can you share a screenshot from GSC showing an example of disallowed URLs that have actually been crawled?

 

Full time Shopify SEO guy, based in NZ. Sometimes freelance outside the 9-5.