How can I stop unwanted URLs from being indexed in Google?

Topic summary

A Shopify store owner discovered nearly 80,000 unwanted URLs with tracking parameters (e.g., ?pr_prod_strat=collection_fallback&pr_seq=uniform) in Google Search Console. These URLs are crawled but not indexed, so they may consume crawl budget and hurt SEO rankings.

Root Cause:
These URLs originate from Shopify’s native product recommendation system, which appends tracking parameters to collect visitor behavior data for improved suggestions.

Proposed Solutions Discussed:

  • Modifying theme code (e.g., card-product.liquid) to strip parameters from crawler-visible links while preserving them for user clicks via JavaScript
  • Adding Disallow rules to robots.txt targeting these parameter patterns
  • Using noindex, nofollow meta tags (deemed ineffective since parameters don’t appear in handle or canonical_url variables)
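To make the first option concrete, here is a minimal sketch of the parameter-stripping idea, written in Python purely for illustration (a theme would do this in Liquid or JavaScript). It assumes that every recommendation tracking parameter starts with "pr_", which matches the two parameters named above but is not confirmed for all cases. The function name and the example.com URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: all recommendation tracking parameters share the "pr_" prefix.
TRACKING_PREFIX = "pr_"


def strip_tracking_params(url: str) -> str:
    """Return the URL without query parameters whose name starts with TRACKING_PREFIX."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(TRACKING_PREFIX)
    ]
    # urlunsplit omits the "?" entirely when no parameters remain.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Hypothetical product URL carrying the parameters from the thread plus a variant.
example = (
    "https://example.com/products/shirt"
    "?pr_prod_strat=collection_fallback&pr_seq=uniform&variant=123"
)
print(strip_tracking_params(example))
```

Unlike cutting the URL at the first "?", this keeps unrelated parameters such as variant, so it only removes what the recommendation system added.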

Community Feedback:
Multiple users report that standard fixes (robots.txt blocks, code modifications) fail to prevent Google from crawling these URLs. One user claims to have successfully de-indexed 200K+ pages but hasn’t shared the specific method publicly.

Current Status:
No definitive solution confirmed. The issue appears to be a persistent Shopify platform limitation affecting multiple stores. Rebuilding stores won’t resolve it since the tracking mechanism is built into Shopify’s core system. Some users note Google primarily penalizes crawl budget issues on sites with 1M+ pages, suggesting smaller stores may experience limited impact.

Summarized with AI on November 2. AI used: claude-sonnet-4-5-20250929.

Hello,

I’m here to give feedback on this topic and the changes made.

I do not understand how Shopify can still leave problems like this unresolved in 2024, when it wants to be the leader in ecommerce and Google is paying more and more attention to crawl budget.

I made the modifications around April 10, 2024.

The modifications include:

Changing “{{ card_product.url }}” to “{{ card_product.url | split: '?' | first }}”

The changes to robots.txt:
Disallow: ?pr_prod_strat=pr_seq=uniform
Disallow: /?q=
Disallow: *.atom
Disallow: ?variant=
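
One way to sanity-check rules like these is to test them against a sample URL path. The sketch below is a simplified matcher, not Google's actual parser: it assumes a rule is compared against the path from its first character, that "*" matches any run of characters, and that a trailing "$" anchors the end. How a real crawler treats a rule that starts with neither "/" nor "*" is not modeled here. The rule "/*?pr_prod_strat=" and the sample path are hypothetical, used only for comparison.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt matching: '*' is a wildcard, a trailing '$' anchors the end."""
    if not pattern:
        # An empty Disallow value blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches from the start of the path (prefix matching).
    return re.match(regex, path) is not None


# Hypothetical path carrying the tracking parameters from this thread.
url_path = "/products/shirt?pr_prod_strat=collection_fallback&pr_seq=uniform"

for rule in ("?pr_prod_strat=pr_seq=uniform", "/*?pr_prod_strat=", "/"):
    print(f"{rule!r}: {rule_matches(rule, url_path)}")
```

Under these assumptions the first rule, as posted, never matches, because the path starts with "/" and the literal text "pr_prod_strat=pr_seq" does not occur in the URL. A bare "/" matches every path on the site, so it is worth confirming that no such line is live in the published robots.txt.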

I’ve given Google time to take these changes into account.

Today, I can report that these damn URLs are still being crawled by Google.

@tim_1, would it be possible to get in touch? I’d like to hire you to help find a solution for my site.

Thanks for your help