I was just checking a client's website and noticed what I would class as an issue with the robots.txt file.
The client has a fully https site. I think that's the norm now.
However, a few files still work without https and do not redirect: the robots.txt file and the sitemap files.
This means there is a crawl path on the http site, from the robots.txt file to sitemaps that reference all the pages as http. Because those pages redirect, the sitemaps are effectively invalid.
Some crawlers check robots.txt files for sitemaps without analysing whether the rest of the site is forced to https. This will lead them down a merry trail of crawling and redirecting: a waste of their time and of Shopify's server resources.
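You can see the behaviour described above with a quick check of the status code. This is a hedged sketch, not anything Shopify-specific: the domain is a placeholder, and the classification just encodes the complaint above (a healthy http robots.txt should 301 to https; here it returns 200).

```shell
#!/bin/sh
# Classify a status code the way the issue above describes.
classify_status() {
  case "$1" in
    301|308) echo "redirects to https (good)" ;;
    200)     echo "served over plain http (crawl path stays open)" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}

# Fetch only the status code of the http robots.txt and classify it.
check_robots() {
  classify_status "$(curl -s -o /dev/null -w '%{http_code}' "http://$1/robots.txt")"
}

# Usage (placeholder domain):
# check_robots example-shop.myshopify.com
```

Running `check_robots` against a store showing this problem would print the "crawl path stays open" line instead of the redirect one.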
Maybe others have submitted http-based sitemaps and are confused by the many warnings they get.
I think the http version of the robots.txt file needs re-thinking. At least drop the sitemap reference from it.
And the http sitemaps should redirect to the correct ones on https.
The same is happening on my client's site. They switched to https, and even though the robots.txt states the correct sitemap, issuing a curl request shows the old sitemaps with http versions of products that no longer exist. This is creating a load of issues.
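To put a number on what that curl request shows, a small sketch like this counts how many `<loc>` entries in a sitemap still point at plain http. The domain in the usage line is a placeholder; the function just reads sitemap XML from stdin.

```shell
#!/bin/sh
# Count <loc> entries that still use plain http in sitemap XML read from stdin.
count_http_locs() {
  # Emit one line per "<loc>http://" match, count the lines, strip wc padding.
  grep -o '<loc>http://' | wc -l | tr -d ' \t'
}

# Usage (placeholder domain):
# curl -s http://example-shop.myshopify.com/sitemap.xml | count_http_locs
```

On an affected store, every entry in the stale sitemap would count, so the number should drop to zero once the sitemaps redirect or are regenerated as https.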
A second piece of information I can add: the client migrated the site from one Shopify install to another, and the old one is blocked from public access, but its robots.txt and sitemap.xml return a 404 instead of a disallow-all directive.
Anyway, reading your issue and Shopify's answer (the ability to 301 the http sitemap to https, framed as a new "feature"), I gather the same solution would solve both our problems, and anybody else's who is experiencing this.
If you found a solution, please share away!!
Correct: the robots.txt and sitemap files do not 301 based on Shopify settings, and there is no way to change this. You won't run into issues on Google's side if you 1) set the preferred http or https version in Google Search Console, 2) submit the sitemap under your preferred version, and 3) configure the www or non-www version to match your store's setup in GSC. Refer to https://www.digitaldarts.com.au/the-expert-guide-to-shopify-seo
Thank you, Joshua. Google may not report those issues (although we see them in the 404s it creates), but our bots can still access the old sitemaps because of the missing 301, despite everything being set up correctly the moment the client switched to SSL. Issue a curl request and you'll see.
This implies that any URL that has ever been accessible from anywhere (and this site was live without https and ranking on search engines before it) can be accessed by Googlebot. Here is the Google statement:
I am pushing for Shopify to implement this redirect at the Nginx level (I sent them the Nginx instructions that I hope they won't need), and last week I received an email that they were in the process of implementing it... wooohooo... finally... they are checking possible impacts before its rollout...
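For reference, the kind of Nginx rule being asked for is small. This is a sketch of a generic http-to-https redirect for these paths, not Shopify's actual configuration, and the domain is a placeholder:

```nginx
server {
    listen 80;
    server_name example-shop.myshopify.com;  # placeholder domain

    # 301 robots.txt and any sitemap file to the https version.
    location ~ ^/(robots\.txt|sitemap[^/]*\.xml)$ {
        return 301 https://$host$request_uri;
    }
}
```

A permanent 301 (rather than a 302) is what tells crawlers to drop the http versions and consolidate on https.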
I just cannot believe there are so many SEO pieces missing from the Shopify equation, or just plain wrong and working against you. Do they have an in-house SEO team? (If not, they badly need one; I mean technical SEO.)