New Shopify store for POD (Print-on-Demand) products with 22 collections, 369 main products and 4,160 variants …
Result from Google Search Console: indexed: 401 … not indexed: 3.62K = 90%
9 main reasons named by Google:
- 1 page: duplicate without user-selected canonical
- 1 page: excluded by “noindex” tag
- 1 page: blocked due to other 4xx issue
- 14 pages: Discovered – currently not indexed
- 25 pages: blocked by robots.txt (see 3.1 and 3.2 below)
- 37 pages: redirect (note: we use a URL-shortener app to “redirect” to the very long URLs used in our advertising banners and on social media; that's fine for us)
- 109 pages: 404 not found (resolved: we found a shortened URL that forwarded to a deleted product page)
- 567 pages: Crawled – currently not indexed, source shown as “Google systems” (see point 4 below)
- 2,880 pages: Alternate page with proper canonical tag (see point 2 below)
We have studied all the material we could find to build our own understanding, e.g. basics about Google search indexing here in the Shopify Community and external links like this. Yet, we need some guidance on the following points 2 to 4.
2.) Let's start with the biggest number: about “Alternate page with proper canonical tag” we found a hint here. Can we assume that the 2,880 pages named above are not an issue? The high number is worrying, but as far as we understand, variants (e.g. different colours, hardware materials or sizes within one printable product) point to the main product page (369 in total) and are therefore not indexed, right?
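To illustrate our understanding (shop URL and product handle here are made up): a variant URL such as

https://example-shop.myshopify.com/products/canvas-tote?variant=41234567890123

renders, via the theme's {{ canonical_url }}, a canonical tag in its <head> that points back to the main product page:

<link rel="canonical" href="https://example-shop.myshopify.com/products/canvas-tote">

So every such variant URL would be reported as “Alternate page with proper canonical tag”, which would account for the 2,880 pages. Is that reading correct?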
3.) When indexing, we used the original robots.txt (robots.txt.liquid in the theme code editor) and also submitted the different XML sitemap pages to Google Search Console (overview here). The number of blocked pages (25) is small, but these pages are highly relevant for building trust with our new customers.
3.1) Yet, we have noticed that very relevant pages are blocked, and this was not initiated by us within robots.txt: e.g. the “policies” pages (shipping, refund, terms of service, data privacy, etc.) are highly important for users. A potential buyer who sees our advertising on Instagram might first google our brand name and policies; if nothing pops up, this user might lose interest.
In robots.txt we have only added the following exclusions:
User-agent: *
Disallow: /checkout
Disallow: /checkout/thank_you
Disallow: /account
Disallow: /account/login
Disallow: /account/register
Disallow: /cart
Disallow: /admin
Yet, if you look at the generated output, it says very strange things to any search crawler. Here is the robots.txt file.
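From the Shopify docs on robots.txt.liquid, the default rule groups can be filtered in the template. Would a sketch like the following be the safe way to unblock the policy pages? It assumes the default Disallow: /policies/ rule is what hides them; everything else stays at Shopify's defaults.

{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {%- comment -%} drop only the default rule that blocks the policy pages {%- endcomment -%}
    {%- unless rule.directive == 'Disallow' and rule.value == '/policies/' %}
      {{ rule }}
    {%- endunless -%}
  {%- endfor -%}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

Or would removing a default rule like this count as “self damaging”?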
3.2) Similar with our collection pages being blocked by robots.txt, even though they should not be (see 3.1).
We often use the term “new collection” in the info text of our promos on different social media (e.g. X/Twitter, Instagram, Pinterest, Reddit communities). If a user googles our brand name looking for collections, again nothing would pop up.
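If it helps: as far as we can see in the generated output linked above, Shopify's defaults include rules like these (please correct us if they differ):

Disallow: /collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*

As we read them, they should only block filtered/sorted collection URLs, not the plain /collections/<handle> pages. Could Search Console be reporting those filtered URLs here?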
4.) 567 pages: Crawled – currently not indexed … source “Google systems”. What does this mean? We have no idea at all, see screenshot.
Naturally, we'd like to avoid anything “self-damaging”, as we need traffic and conversions. For any hint or advice we thank you in advance.

