Does anybody know/understand why Google keeps indexing pages that Shopify blocks by default in robots.txt (quite rightly so)? In this instance it’s a cart page and an account page.
This error keeps creeping back regularly for me and I can’t figure out why Google decides to override the robots.txt info.
It’s probably links without nofollow attributes pointing to pages that are irrelevant for a crawler or search engine to access or display.
Missing attributes on links are sometimes caused by landing page designs, or by apps that someone slapped some direct links into.
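For illustration, a cart link with the attribute set might look like this (the href is just an example path):

```html
<!-- rel="nofollow" asks crawlers not to follow this link -->
<a href="/cart" rel="nofollow">View cart</a>
```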
Pages like account, profile, and cart don’t need to be fetched by Google, which is why you can block them in robots.txt.
The issue you’re facing can happen for various reasons; here are some:
Misconfiguration in the robots.txt file: The instructions provided in the file may be incorrect or not properly formatted.
Robots.txt file deletion: The robots.txt file may have been accidentally deleted or removed.
Robots.txt file not accessible: The robots.txt file may not be accessible to search engine crawlers if the server is not configured properly. This could be due to a misconfigured firewall, incorrect file permissions, or other server-side issues.
Third-party apps: Some third-party apps can affect the robots.txt file and cause the pages to be indexed despite being blocked by the file.
Sitemap: Even if a page is blocked by the robots.txt file, it may still be included in the website’s sitemap, which can lead to the page being indexed by search engines.
Links on other pages or dynamic URLs: Even though they are blocked by the store’s robots.txt file, some pages may still be discovered and indexed by search engines through links on other pages or dynamic URLs.
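For reference, the relevant part of a store’s robots.txt would contain directives along these lines (a sketch, assuming typical Shopify defaults; check your store’s actual /robots.txt, since Shopify generates it for you):

```text
User-agent: *
Disallow: /cart
Disallow: /account
```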
1-Google Can Index URLs from External Links: If external websites or internal links reference your cart and account pages, Google may index these URLs even if blocked by robots.txt.
2-Blocked Pages Are Indexed, Not Crawled: robots.txt prevents crawling, not indexing. If Google discovers the URL from other sources, it can index the page without crawling its content.
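The crawl-versus-index distinction can be checked programmatically. Here is a minimal sketch using Python’s standard-library robots.txt parser, with hypothetical rules and an example domain: a disallowed path only means crawlers shouldn’t fetch the page, not that Google won’t index the bare URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring Shopify's defaults for cart/account pages.
rules = [
    "User-agent: *",
    "Disallow: /cart",
    "Disallow: /account",
]

parser = RobotFileParser()
parser.parse(rules)

# Crawling the cart page is disallowed for all user agents...
print(parser.can_fetch("*", "https://example-store.com/cart"))         # False
# ...while an ordinary content page may be fetched.
print(parser.can_fetch("*", "https://example-store.com/pages/about"))  # True
```

Note that can_fetch only answers “may this URL be crawled?”; indexing is a separate decision Google makes, which is why a noindex meta tag (which requires the page to be crawlable) is the reliable way to keep a URL out of results.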
Fix/Guide:
1-Add a noindex Meta Tag:
Edit your Shopify theme to include <meta name="robots" content="noindex"> in the HTML head for cart and account pages. This explicitly tells Google not to index them.
2-Avoid Linking to Blocked Pages:
Ensure cart and account page links aren’t accessible to crawlers (e.g., use JavaScript-based rendering or client-side routing).
3-Use Google Search Console:
Submit the URLs under the Removals tool to request de-indexing.
4-Monitor and Test:
Use the URL Inspection Tool in Google Search Console to confirm the noindex tag is working.
These steps ensure better control over indexing behavior.
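As a sketch of step 1, the noindex tag can be added conditionally inside the head of the theme’s layout file (usually layout/theme.liquid). The template checks below assume Shopify’s standard cart and customers templates, so verify the names against your own theme:

```liquid
{%- comment -%} Assumes standard 'cart' and 'customers/*' template names {%- endcomment -%}
{%- if template contains 'cart' or template contains 'customers' -%}
  <meta name="robots" content="noindex">
{%- endif -%}
```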