We're using SEMrush for some SEO work with Shopify accounts and we're getting a lot of 429 errors, which I assume is Shopify throttling access to the robot.
SEMrush allows you to override the robot settings, but this requires access to robots.txt, which Shopify doesn't allow.
Is there any other way to throttle robot access to avoid these errors?
As far as I can tell, this feature doesn't exist in SEMrush.
I'm sure it used to, though it's been a long time since I've used it.
It also has nothing to do with Shopify Plus. This is purely about the rate at which bots request resources, so it's a core platform-level topic.
The solution right now would be having SEMrush let you control the crawl rate via the tool directly. There's no option in Shopify to let you whitelist a service like SEMrush or let you edit the robots.txt file. Give those guys a shout and see if this is on their roadmap.
Sorry guys, here's the response I received from a SEMrush product manager responsible for their Site Audit tool. Forgot to post it back in March...
Shopify: "If you go into your SEMrush.com settings, you can manually set a delay for the bot in SEMrush. When you do that it reduces the amount of 429 errors that will appear."
SEMrush: "Unfortunately, Shopify employee was probably misguided by another option. We allow ignoring or following the robots.txt crawl delay, but we don't allow to customize it and we never did have a way to do it. We contacted Shopify and asked for a change in their robots.txt file, but they refused to do it."
So it appears that Jason's suggestion of getting SEMrush to add a custom crawl delay feature to their audit tool is the only solution.
It's super annoying that, as Shopify customers, we can't add a simple crawl delay to the robots.txt file. I would love to hear a sound technical reason as to why they can't accommodate this request. And impact to server resources isn't a logical argument in my mind, as the crawler's going to be hitting the server either way.
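For anyone wondering what a crawl delay in robots.txt actually does on the crawler's side, here's a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents and the `ExampleBot` user agent are made up for illustration; they are not Shopify's actual file or SEMrush's actual bot name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 2

User-agent: *
Disallow: /admin
"""


def crawl_delay_for(user_agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network request
    return parser.crawl_delay(user_agent)


print(crawl_delay_for("ExampleBot"))    # the bot with its own Crawl-delay rule
print(crawl_delay_for("SomeOtherBot"))  # falls back to the "*" group, which sets none
```

A crawler that honors the directive would wait that many seconds between requests. Honoring it is up to the crawler, which is why the "ignore or follow" setting SEMrush mentions matters.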
Yet another reason why we won't be recommending the Shopify platform to our SEO clients. Far too many stupid restrictions like this.
Great news! I've discovered not one but TWO solutions to this problem...
1) SEMrush recently updated their Site Audit tool with a new setting that allows you to limit the crawl speed to 1 URL every 2 seconds. The Shopify SEO community has been begging for this for years, so I suspect they added it specifically for us.
I tested this out this morning (March 5th) on one of my clients' Shopify sites (~300 pages total) and it appears to have worked! As you can see from the crawl progress chart below, the hundreds of "issues" reported in earlier crawls vanished with the most recent crawl.
2) If the new SEMrush feature doesn't do the trick, I recommend checking out Sitebulb. It's a fantastic, albeit slightly more advanced, alternative to the SEMrush Site Audit tool. It costs $35/mo, but as someone who's used Screaming Frog, SEMrush, and Ahrefs to crawl thousands of sites, it's worth every penny.
Here's what the folks at Sitebulb have to say about crawling Shopify sites and their solution (source)...
"If you've tried crawling a Shopify site with over 500 internal URLs, you may have noticed that those rat **bleep** eventually stonewall you with 430 HTTP errors (TOO MANY REQUESTS). It only seems to kick in around the 500 URL mark, so if you have a 1000 page site, just delete half your content and you're sorted... Alternatively, you can crawl at 1 URL/second, which is slow enough to stop Shopify blocking you. In the pre-audit, Sitebulb will detect if it's a Shopify site, and give you a cheeky little message to tell you to slow down, lest you blow your load early."
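If you'd rather script your own crawl, the same "slow down" idea can be sketched in a few lines of Python. This is just a rough sketch under my own assumptions, not how SEMrush or Sitebulb implement it: `polite_crawl`, `fetch_status`, and the retry/backoff numbers are all hypothetical, and it treats both 429 and the 430 mentioned in the quote as "throttled".

```python
import time
import urllib.error
import urllib.request

# Status codes treated as throttling: 429 from this thread, 430 per the quote above.
THROTTLE_STATUSES = {429, 430}


def fetch_status(url):
    """Return the HTTP status code for url (error responses included)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def polite_crawl(urls, fetch=fetch_status, delay=1.0, max_retries=3, sleep=time.sleep):
    """Fetch urls one at a time, pausing `delay` seconds after each URL.

    Throttled URLs are retried up to max_retries times, doubling the wait
    before each retry. Returns a dict of {url: final_status}.
    """
    results = {}
    for url in urls:
        wait = delay
        for attempt in range(max_retries + 1):
            status = fetch(url)
            if status not in THROTTLE_STATUSES:
                break
            if attempt < max_retries:
                wait *= 2
                sleep(wait)  # back off before retrying the same URL
        results[url] = status
        sleep(delay)  # base rate limit: at most one URL per `delay` seconds
    return results
```

Calling `polite_crawl(page_urls, delay=2.0)` would roughly match the 1 URL every 2 seconds setting from option 1, and the default `delay=1.0` matches the 1 URL/second Sitebulb suggests.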
Hope this helps! Happy crawling :)
As of today, June 21st, 2021, we have launched the ability to edit the robots.txt file to give merchants more control over the information that is crawled by search engines. You can learn more about how to edit your robots.txt file through our community post here.
Due to the age of the topic, I will be locking this thread. If you have any questions about the new feature, please do not hesitate to create a new post under our "Technical QA" board.