How can I stop unwanted URLs from being indexed in Google?

theDMM · March 15, 2024, 4:18pm

Hello,
I noticed many webpages starting with “pr_prod_strat=collection_fallback” and ending with “&pr_seq=uniform” are appearing on my google search consol to be indexed.

Here is an example string that is at the end of my product URLs:

?pr_prod_strat=use_description&pr_rec_id=f50a47009&pr_rec_pid=4703513280571&pr_ref_pid=4703091556411&pr_seq=uniform

This created nearly 80,000 new pages being discovered by Google, likely overwhelmed Google’s indexing system - Though these pages aren’t indexed (shown in the screenshot below), the sudden increase in ‘worthless’ URLs seems to have negatively impacted our site’s rankings. Google dislikes the effort of discovering pages it won’t ultimately index.

Can someone explain where these are coming from? And how to prevent them?
Will my method below work to properly noindex/nofollow pages like these?
Will adding the “” code in the head of any webpage prevent it from being indexed and followed?

I was thinking of adding the codes below (placement shown in the screenshots), will this solve the issue?:

{% if handle contains ‘seq=uniform’ %}

{% endif %}

{%- if canonical_url=“*seq=uniform” -%}

{%- endif -%}

And adding the code below to my robots.txt file:
{%- if group.user_agent.value == ‘’ -%}
{{ 'Disallow: /products/?pr_prod_strat=*’ }}
{%- endif -%}

tim_1 · March 16, 2024, 6:12am

These are URLs from Recomended/Complimentary products.

Using these links allows Shopify to collect some data on your visitors behaviour and make better recommendations.

It’s possible to modify your theme to not output these links, say look in your card-product.liquid snippet (or similar) and replace

{{ card_product.url }}

with

{{ card_product.url | split: "?" | first }}

This is the easiest, but you’re loosing “intelligence” in product recommendations.

The following should not work, because

a) neither handle nor canonical_url include any of these parameters

b) SE will not like having and URL and being unable to index it – ie those URL will still be in the list of your URL and consume your crawl budget.

{% if handle contains 'seq=uniform' %}

{% endif %}

{%- if canonical_url="*seq=uniform" -%}

{%- endif -%}

A bit better option is to modify robots.txt, however b) from above will still apply and. Have not tried this myself recently.

I thought the best option is to add rel=“nofollow” to these links (and then also disallow in robots), but then I’ve found out that rel=nofollow is now considered only a hint rather then command by Google now.

Relevant info here: https://www.sammyseo.com/death-of-internal-nofollow-use-js-to-stop-google-crawling/

I guess it’s actually possible to implement the JS “onclick” thing, but it’s theme-dependant.

Like instead of

...

output something like

...

theDMM · March 20, 2024, 6:56pm

@tim_1 Hi Tim, thanks for your detailed response. Based on the above information you provided, what would you recommend is the best way to handle this situation? We would like to keep the recommendations module functioning, but I am afraid Google finding but being unable to index the pages may affect our Google rankings as well.

tim_1 · March 21, 2024, 1:44am

I’d say:

Implement disallow in robots.txt;
Implement card-product modification based on the code provided above;

Wait for Google to expire the URLs they’ve already scavenged

theDMM · March 22, 2024, 6:10pm

@tim_1 Thanks for your response. By adding the code above, (1) will the product recommendation module lose its’ learnings and ability to recommend products to users? And (2) will this prevent UTMs from being tracked, or is that not the case since UTMs are external from these URLs being generated?

In the case that the product module loses its’ learnings, are there any other methods to help with this situation?

Thanks again for your help Tim!

tim_1 · March 23, 2024, 2:59am

No. The code above splits original suggested link for product page into two parts:

one is canonical URL without any parameters and it goes inside the <a href=“…” so crawler bot sees this canonical URL only.

second part, which is those tracking parameters – is stored in the data-tail attribute.

When visitor clicks the link, onclick JS code combines these two parts and makes browser to go to this original ULR complete with recommendations tracking data.

When visitor arrives on that page, Shopify can learn (based on these URL params) which one of the recommended products was most relevant to this visitor.

Say, original URL from Shopify recommendation engine is

/products/XXX?pr_prod_strat=use_description&pr_rec_id=f50a47009...

instead of the current


the code will output

```markup

Crawler bot will see the link to /products/XXX,

but after clicking the link visitors will arrive where Shopify has intended them to go.

Marine6 · May 15, 2024, 2:13pm

Hello,

I’m here to give feedback on this topic and the changes made.

I do not understand how Shopify can still leave in 2024 such problems when we know that it wants to be the leaders in Ecommerce and that today Google is increasingly attentive to these exploration budgets.

I made the modifications around April 10, 2024.

The modifications include:

The change of “{{ card_product.url }}” To “{{ card_product.url | split: “?” | first }}”

The modification of Robots.txt :
Disallow: ?pr_prod_strat=pr_seq=uniform
Disallow: /?q=’
Disallow: *.atom
Disallow: ?variant=

I’ve given google time to take these changes into account.

Today, I can report that these damn urls are still crawled by Google.

@tim_1 , would it be possible to get in touch? I’d like to call on your services to try and find a solution for my site.

Thanks for your help

tim_1 · May 15, 2024, 2:37pm

Yep, sure you can direct mail me via forum or e-mail at tairli@yahoo.com

Marine6 · May 23, 2024, 8:33am

Hello @tim_1 ,

I contacted you a week ago now on your email address and I’m currently without any feedback from you.

Did you receive my e-mail?

Thanks
Marine

Rapture · May 27, 2024, 2:01pm

Hey Marine, did you manage to find a solution ? I found couple of articles about it, but not sure if people reported back if they were successful or not. Thanks for your response

Marine6 · June 6, 2024, 10:04am

Hello @Rapture ,

No viable solution for the moment and for years.

Whether blocking by robots.txt and applying a small modification in the code, nothing will change, urls will always be crawled by google.

The only two solutions would be for @Shopify_77 to finally take this problem seriously and rule it out at the source of the code.

Otherwise, any e-merchant who wants to develop his store in seo will have to migrate the store from Shopify and see what the competitors are doing.

lariushub · June 25, 2024, 9:06am

Hi Connor,

I saw your problem and I can tell you that it happened to us too with a client… Unfortunately all the solutions I see below and find in other forums are not correct.. tested several times but without results.

They simply “block” the problem but do not solve it.

In our specific case the pages were also indexed, arrived to have over 200K indexed pages which drastically penalized the ranking.

We have now managed to reverse the range as per the attached file in the search console, with the goal of bringing the indexed pages to the actual ones of about 5K.

Unfortunately this Shopify problem is really something they should fix, because even talking to their support they don’t really know how they can avoid these “SEO disasters” that inexperienced users rightly don’t even realize they have.

We really tried them all, involving other SEO experts to be able to come up with a solution to de-index all the pages that are not needed and waste Crawl Budget, and then not to undo the efforts and SEO activities done to improve ranking.

We are finally coming out of this, if you still haven’t solved the problem and want to explore further let me know!

See you soon,

Mirko

SaltyFrance · July 3, 2024, 8:42am

Hello Mirko and everyone

As you report,

I’m currently facing an issue with Google indexing multiple product URLs on our site feelsalty.com, despite the directives specified in our robots.txt file and the correct use of canonical tags.

We have properly configured our robots.txt file to exclude the following URLs:

Disallow: /products/*?pr_prod_strat=collection_fallback
Proof here: https://feelsalty.com/robots.txt

Despite this, Google continues to index over 300 incorrect URLs that include the parameter pr_prod_strat=collection_fallback. The issue appeared suddenly, and I can’t find any explanation. We have also used canonical tags for each page to specify the correct canonical URL.

Could you please provide guidance on how to resolve this issue? Is there another method we should consider to prevent these URLs from being indexed by search engines?

Tenpinshop · July 8, 2024, 9:54am

@lariushub - would you be able to share the method you used? I’ve previously been advised not to change the shopify robots.txt file as it wouldn’t be receive futur updates, but we’ve over 300,000 of these and I can’t imagine it’s doing our website crawl much good!

jamnas · October 25, 2024, 1:55pm

Hi lariushub,

What solution did you use to o de-index all the pages that are not needed at the end?

theDMM · October 25, 2024, 4:43pm

Hello & @SaltyFrance I looked at your robots.txt file and it appears you may have removed the “pr_prod_strat…” info.

I wanted to find out if you put this snippet specifically under the Googlebot section. If I recall correctly, Shopify’s support documentation states that Google will ignore any robots.txt data not specifically assigned to the Googlebot.

Let me know if you have any additional information that would add value to this discussion.

@jamnas @lariushub @Tenpinshop Please let us know if you have come up with a solution that works as well. Thank you!

Trestlesouth · January 12, 2025, 11:59am

Hi Marine 6

Our store is also affected and can not find a solution. Do you know if it is an option to build a new theme and migrate, or do these problems persist and what do you know further to fix if we choose not to rebuild ?

Regards

Trestlesouth · January 12, 2025, 12:04pm

Hi Mirko We have been facing the same issues but cannot seem to get a solution. I have been advised by a New Agency to rebuild a New Shopify Store. and migrating over, is this a suggestion worth doing. Or do you have any Notes on how I could Fix the issues on my Shopify Store. Or Who I could get assistance from. I am in South Africa and havent as yet found help fixing. Regards Caron

theDMM · January 13, 2025, 10:11pm

Hi @Trestlesouth just wanted to chime in and let you know that creating a new Shopify store will most likely not fix the issue. This issue seems to be a native Shopify issue - what I mean by this being a native Shopify issue is that no matter which theme (code) you use to build your new site, it’s still using Shopify’s website builder with native backend code. The appended “pr_prod_strat=” in URLs is likely from Shopify’s built-in product recommendation mechanism and you cannot simply remove this mechanism. Instead, what I’d recommend is specifically targeting the Googlebot in the robots.txt file and telling it to ignore these types of URLs (more context in my comments above). I look forward to hearing what works for you.

Trestlesouth · January 14, 2025, 4:47am

Thank you Connor. A robots.txt block was done, however noticed now more pages being indexed.

I had 179 pages indexed now have hundreds.

I only have 150 pages on my store?

Topic		Replies	Views
Weird url adding in google search console Start a Business	35	175	December 27, 2024
Wie kommt diese URL zustande? (Fallback) Technische Fragen & Antworten shopify-theme	8	3	April 28, 2023
How can I prevent the new indexing bug from creating useless pages on Google? Technical Q&A	385	691	July 6, 2023
Unidentified string on the end of my product urls Shopify Discussion troubleshooting	1	26	October 19, 2023
Search Console - kanonischer Tag - URLS Endung &pr_seq=uniform Technische Fragen & Antworten seo	1	7	April 8, 2024

How can I stop unwanted URLs from being indexed in Google?

Related topics