URLs starting with ?variant and ?pr_prod_strat
What should I do? I have made 40 pages, but because of these links there are 247 pages reported as "Alternate page with proper canonical tag".
Should I block them in robots.txt? For example: Disallow: /*?pr_prod_strat=
There is most likely nothing you need to do and Google says as much in their documentation.
Google is just saying that they found the variant URLs (?variant=) and the recommended-product links (?pr_prod_strat=) but see that the regular product URL is the canonical (or preferred) URL. This is correct and how it should be. They are not indexing those URLs; that's the point of the "Alternate page with proper canonical tag" notification.
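To see the mechanism concretely: Shopify themes render a canonical link tag on every product page that points at the clean product URL, regardless of which parameters the page was reached with. In Dawn and most recent themes it comes from a line like this in layout/theme.liquid (the exact file varies by theme):

<link rel="canonical" href="{{ canonical_url }}">

So a parameterised URL like /products/example-product?variant=123 declares the bare /products/example-product as its canonical, which is exactly what GSC is reporting.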
Remember that Google Search Console is a technical tool used to help diagnose POTENTIAL issues across all pages Google sees. Think of these [non-] issues as a heads-up and most of the time, no action is required by you.
I'm sorry you feel that this is misinformation. The advice shared is directly from Google's documentation, which states:
Alternate page with proper canonical tag
This page is marked as an alternate of another page (that is, an AMP page with a desktop canonical, or a mobile version of a desktop canonical, or the desktop version of a mobile canonical). This page correctly points to the canonical page, which is indexed, so there is nothing you need to do. Alternate language pages are not detected by Search Console.
I’ve written a follow-up article that explains how to tell if Google is using the proper canonical URL and ignoring the parameters in the URL
As long as the canonical URL you want Google to use matches the Google-selected canonical, there’s no concern.
Crawl budget is only an issue if you have tens to hundreds of thousands of unique pages where the content changes often (once a week or more). That's rare for most Shopify stores. If you are that rarity, I recommend you read through Google's docs on crawl budgets for large sites. In there, you'll see they recommend keeping your sitemap up to date and making sure your canonicals are set properly, both of which Shopify does for you. You'll also want to keep 404 errors at bay and clean up long redirect chains.
Just because you wrote an article doesn't make you right. You are giving me general rules about how Google works, which I already know. Did you read the answer I gave above? I can tell you what is happening here:
Shopify uses these URLs for tracking. To Google they look like variant URLs, but they are dynamic tracking URLs. For a website with 2k products, Google will crawl 100k+ product pages: 100k pages of useless information.
Still don't see where the problem is? Because Google ends up in a loop of endless variants, it can't understand your website properly and it looks like low-value content. It is a HUGE problem.
@ilanadavis is 100% on the right track. Not every notice in GSC needs actioning; you need the right SEO context first.
@DariusWS I think you may be overstating the problem. It's possible that your site has crawl-budget issues stemming from these recommend URLs, but I'd say it's unlikely. With correct canonicals set up, they should be harmless and not causing any real problems.
But if you really want to force a fix regardless, you can remove internal link parameters from the recommend URLs. Like this -
Find product-card-grid or a similarly named snippet.
This will be in a theme file for the recommend block, containing:
href="{{ product.url }}"
Replace with
href="{{ product.url | split: "?" | first }}"
The split filter removes everything after the "?" character, removing ALL the URL parameters from the recommend block or product card.
Notes:
You won’t be able to see how recommendations are performing in Shopify, as the data will stop being collected.
If you can live with that, then you could set up alternative recommend-product tracking events in GA4 or similar to retain some visibility of recommendation performance (see the sketch below).
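A rough sketch of what that GA4 fallback could look like, as a small script in a theme snippet. It assumes gtag.js is already loaded on the storefront and that recommendation links carry a recognisable class (recommendation-link is a made-up name here, not a Shopify or Dawn class):

<script>
  // Hypothetical GA4 fallback: fire a select_item event when a recommended product is clicked.
  document.addEventListener('click', function (event) {
    var link = event.target.closest('a.recommendation-link');
    if (!link || typeof gtag !== 'function') return;
    gtag('event', 'select_item', {
      item_list_name: 'product_recommendations',
      items: [{ item_name: (link.textContent || '').trim() }]
    });
  });
</script>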
Advanced (possible) workaround for the broken Shopify recommend tracking:
If Shopify is collecting recommend events server-side, you could try a JS hack with dispatchEvent() and XMLHttpRequest() to trigger a separate HTTP request to the original parameterised URL on page unload. This isn't documented anywhere and I haven't tested it; it might not even work.
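An untested sketch of that idea, varied slightly: instead of firing on page unload, it pings the original parameterised URL in the background when a cleaned recommendation link is clicked. It assumes the theme also renders the original URL in a data-tracking-url attribute (a made-up attribute, shown in the comment), and there is no guarantee Shopify will count a background request as a recommendation click:

{%- comment -%} Assumed markup on the card link:
  href="{{ product.url | split: '?' | first }}" data-tracking-url="{{ product.url }}"
{%- endcomment -%}
<script>
  document.addEventListener('click', function (event) {
    var link = event.target.closest('a[data-tracking-url]');
    if (!link) return;
    var url = link.dataset.trackingUrl;
    if (window.fetch) {
      // keepalive lets the request survive the navigation away from the page
      fetch(url, { method: 'GET', keepalive: true });
    } else {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url, true);
      xhr.send();
    }
  });
</script>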
Still, there are some pieces of missing knowledge here. Reading this (and other similar topics) on Shopify's "technical" URLs (?pr_prod_strat= and, previously, other parameters), I have to admit that both communities still haven't developed common knowledge.
proper canonical - CHECKED.
crawled but not indexed in GSC - CHECKED.
wasting crawl budget (valid point from @DariusWS) - NOT CLEAR (budgets do exist, but no one knows what, if anything, applies to a particular website).
excluding "tech" URLs in robots.txt - NOT CLEAR (Shopify says don't touch robots.txt, while their default template is already loaded with disallowed parameters; see the sketch at the end of this post).
overall impact of "not indexed" in GSC - NOT CLEAR (after months, the number of these Shopify spam URLs is still not dropping off).
PS: this one looks bonkers to me. Why does Google need to waste its "checking/validating" resources on alternate canonical pages if we have already told it they're "duplicates"?
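For reference on the robots.txt point above: Shopify does allow overriding robots.txt through a templates/robots.txt.liquid file, so excluding a parameter is technically possible even though Shopify (and the replies above) advise leaving it alone. A minimal sketch following Shopify's documented customisation pattern, adding a Disallow rule for the recommendation parameter (not a recommendation, just what the change would look like):

{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}
  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /*?pr_prod_strat=*' }}
  {%- endif -%}
  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}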
Yeah, agreed. But if all you want to do is get rid of those URLs, stop them being discovered, have them dequeued from the crawl queues, and prevent any indexing, then you can make a change similar to the one suggested. Discovery and indexing of those pages should eventually drop off over a few weeks to months.
A finger-in-the-air test for crawl budget issues is: are all your products and categories getting crawled and indexed in a "reasonable" time?
The best way to figure out crawl budget issues is log file analysis. It's tricky to do web log analysis on Shopify, but not impossible: you really need to be at an enterprise level, with Cloudflare O2O, and then set up some edge log file analysis. Then make the change suggested above and look at the impact it has on Googlebot requests: see if they are actually dropping for the duplicate pages, and whether they are increasing for the "useful" segments of the site.
But yeah, it's really tricky to get to the bottom of these kinds of things, actually do the testing and fixes, and measure the impact, all within a comment.
I think the problem is that Shopify has created features (product recommendations and the 2.0 search results) that create crawl budget waste. Ideally they would change their data collection technique so that recommendation click events and similar are captured via JS events, e.g. on page unload, without needing a URL parameter that risks messing up SEO.