How can I stop unwanted URLs from being indexed in Google?

connore
Excursionist
24 0 11

Hello, 
I noticed that many URLs with a query string starting with "pr_prod_strat=collection_fallback" and ending with "&pr_seq=uniform" are showing up in my Google Search Console as pages to be indexed.

 

Here is an example string that is at the end of my product URLs:

?pr_prod_strat=use_description&pr_rec_id=f50a47009&pr_rec_pid=4703513280571&pr_ref_pid=4703091556411&pr_seq=uniform

 

This has led Google to discover nearly 80,000 new URLs, which has likely overwhelmed Google's indexing system. Though these pages aren't indexed (shown in the screenshot below), the sudden increase in 'worthless' URLs seems to have negatively impacted our site's rankings. Google dislikes spending effort on discovering pages it won't ultimately index.

connore_2-1710519340167.png

 

1) Can someone explain where these are coming from? And how to prevent them?

2) Will my method below work to properly noindex/nofollow pages like these?

3) Will adding the "<meta name="robots" content="noindex, nofollow">" code in the head of any webpage prevent it from being indexed and followed?


I was thinking of adding the code below (placement shown in the screenshots). Will this solve the issue?

{% if handle contains 'seq=uniform' %}
<meta name="robots" content="noindex, nofollow">
{% endif %}

{%- if canonical_url="*seq=uniform" -%}
<meta name="robots" content="noindex, nofollow">
{%- endif -%}

 

And adding the code below to my robots.txt file:
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /products/*?pr_prod_strat=*' }}
{%- endif -%}
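
As a rough way to sanity-check a Disallow pattern like the one above before deploying it, here is a minimal JavaScript sketch of the wildcard matching Google documents for robots.txt: `*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match. The helper names and the `example-product` handle are assumptions made for illustration, and this is not Google's actual matcher.

```javascript
// Convert a robots.txt path pattern into a RegExp (illustrative helper).
// "*" becomes ".*", a trailing "$" anchors the end, other characters are literal.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// True when the path (including its query string) matches the Disallow pattern.
function isDisallowed(pattern, pathAndQuery) {
  return robotsPatternToRegExp(pattern).test(pathAndQuery);
}

const rule = "/products/*?pr_prod_strat=*";
// Hypothetical product handle combined with the query string from the question.
const tracked =
  "/products/example-product?pr_prod_strat=use_description&pr_rec_id=f50a47009" +
  "&pr_rec_pid=4703513280571&pr_ref_pid=4703091556411&pr_seq=uniform";
const clean = "/products/example-product";

console.log(isDisallowed(rule, tracked)); // true
console.log(isDisallowed(rule, clean)); // false
```

Under these matching rules the pattern only catches URLs whose path starts with /products/. If the theme links to recommended products under a collection path (for example /collections/all/products/...), a broader pattern such as /*?pr_prod_strat=* would be needed.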

 

connore_0-1710518617462.png

 

connore_1-1710518617461.png

 

 

Replies 7 (7)

tim
Shopify Expert
3459 279 1284

These are URLs from Recommended/Complementary products.

Using these links allows Shopify to collect some data on your visitors' behaviour and make better recommendations.

 

It's possible to modify your theme so that it does not output these links. For example, look in your card-product.liquid snippet (or similar) and replace

{{ card_product.url }}

with 

{{ card_product.url | split: "?" | first }}

This is the easiest option, but you're losing "intelligence" in product recommendations.

 

The following should not work, because

a) neither handle nor canonical_url includes any of these parameters

b) search engines will not like having a URL and being unable to index it, i.e. those URLs will still be in the list of your URLs and consume your crawl budget.

 

{% if handle contains 'seq=uniform' %}
<meta name="robots" content="noindex, nofollow">
{% endif %}

{%- if canonical_url="*seq=uniform" -%}
<meta name="robots" content="noindex, nofollow">
{%- endif -%}

 

 

A slightly better option is to modify robots.txt; however, b) from above will still apply. I have not tried this myself recently.

 

I thought the best option was to add rel="nofollow" to these links (and then also disallow them in robots.txt), but then I found out that Google now treats rel=nofollow as only a hint rather than a command.

 

Relevant info here: https://www.sammyseo.com/death-of-internal-nofollow-use-js-to-stop-google-crawling/ 

 

I guess it's actually possible to implement the JS "onclick" thing, but it's theme-dependent.

For example, instead of

<a href="{{ product.url }}"> ... </a>

output something like

<a
 {% if product.url contains "?" %}
   href="{{ product.url | split: '?' | first }}"
   data-tail="{{ product.url | split: '?' | last }}"
   onclick="window.location = this.href + '?' + this.dataset.tail; return false"
 {% else %}
   href="{{ product.url }}"
 {% endif %}
> ... </a>

 

If my post is helpful, consider liking it -- it will help others with similar problem to find a solution.
I can be reached via e-mail tairli@yahoo.com
connore

@tim Hi Tim, thanks for your detailed response. Based on the information you provided, what would you recommend as the best way to handle this situation? We would like to keep the recommendations module functioning, but I am afraid that Google finding these pages but being unable to index them may affect our Google rankings as well.

tim

I'd say:

1. Implement disallow in robots.txt;

2. Implement card-product modification based on the code provided above;

 

3. Wait for Google to expire the URLs it has already scavenged.

connore

@tim Thanks for your response. By adding the code above, (1) will the product recommendation module lose its learnings and its ability to recommend products to users? And (2) will this prevent UTMs from being tracked, or is that not the case since UTMs are separate from the URLs being generated here?

In the case that the product module loses its learnings, are there any other methods to help with this situation?

 

Thanks again for your help Tim! 

tim

No. The code above splits the original suggested link for the product page into two parts:

The first part is the canonical URL without any parameters. It goes inside the <a href="..." so the crawler bot sees only this canonical URL.

The second part, which is the tracking parameters, is stored in the data-tail attribute.

When a visitor clicks the link, the onclick JS code combines these two parts and makes the browser go to the original URL, complete with the recommendations tracking data.

When the visitor arrives on that page, Shopify can learn (based on these URL params) which one of the recommended products was most relevant to this visitor.

 

Say the original URL from the Shopify recommendation engine is

/products/XXX?pr_prod_strat=use_description&pr_rec_id=f50a47009...

 

instead of the current

<a href="/products/XXX?pr_prod_strat=use_description&pr_rec_id=f50a47009..."

 

the code will output 

<a 
 href="/products/XXX"
 data-tail="pr_prod_strat=use_description&pr_rec_id=f50a47009..."
 onclick="window.location=this.href + '?' + this.dataset.tail; return false"

A crawler bot will see the link to /products/XXX, but after clicking the link, visitors will arrive where Shopify intended them to go.
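
The split-and-rejoin logic described above can be sketched in plain JavaScript as follows. The function names, the `example-product` handle and the shortened query string are hypothetical. In the theme itself the split happens in Liquid and the rejoin happens in the onclick handler.

```javascript
// Split a recommendation URL into the clean part a crawler sees in href
// and the tracking tail kept in data-tail. Names are illustrative only.
function splitTrackedUrl(url) {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) {
    return { href: url, tail: "" };
  }
  return { href: url.slice(0, queryStart), tail: url.slice(queryStart + 1) };
}

// Rebuild the full URL on click, mirroring what the onclick handler does.
function rebuildTrackedUrl(href, tail) {
  return tail ? href + "?" + tail : href;
}

// Hypothetical example URL with a shortened set of tracking parameters.
const original =
  "/products/example-product?pr_prod_strat=use_description&pr_rec_id=f50a47009&pr_seq=uniform";
const parts = splitTrackedUrl(original);

console.log(parts.href); // /products/example-product
console.log(parts.tail); // pr_prod_strat=use_description&pr_rec_id=f50a47009&pr_seq=uniform
console.log(rebuildTrackedUrl(parts.href, parts.tail) === original); // true
```

Because the rejoin only happens in a click handler, a visit that does not run the handler (for instance a visitor with JavaScript disabled, or one using "open in new tab" from the context menu) would land on the clean URL without the tracking parameters.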

 

Marine6
Visitor
1 0 0

Hello,

I'm here to give feedback on this topic and the changes made.

I do not understand how Shopify can still leave such problems unresolved in 2024, when we know that it wants to be the leader in e-commerce and that Google is increasingly attentive to crawl budgets.

I made the modifications around April 10, 2024.

The modifications include:

Changing "{{ card_product.url }}" to "{{ card_product.url | split: "?" | first }}"

Modifying robots.txt:
Disallow: *?pr_prod_strat=*pr_seq=uniform
Disallow: /*?q=*
Disallow: *.atom
Disallow: *?variant=*

I've given Google time to take these changes into account.

Today, I can report that these damn URLs are still being crawled by Google.

@tim , would it be possible to get in touch? I'd like to call on your services to try and find a solution for my site.

 

Thanks for your help

tim

Yep, sure. You can message me directly via the forum or e-mail me at tairli@yahoo.com

 
