We're using SEMrush for some SEO work with Shopify accounts and we're getting a lot of 429 errors, which I assume is Shopify throttling access to the robot.
SEMrush allows you to override the robot settings, but this requires access to robots.txt, which Shopify doesn't allow.
Is there any other way to throttle robot access to avoid these errors?
Hey Bo - that tool should (or at least did the last time I looked at it) allow you to set a rate limit from within the app. That won't need access to the robots file.
Hey Jason, that's what I expected as well, but the crawl delay settings only seem to support respecting robots.txt (see attached).
I'll reach out to SEMrush as well to see if a different setting is buried somewhere. Google wasn't much help.
While I wait for a response from SEMrush, it's probably good feedback anyway for Shopify to explicitly set the crawl delay they expect in their robots.txt.
Since we can't configure it ourselves, it would help if Shopify set a rate they considered reasonable to reduce these errors from legitimate audit tools.
An update: It seems like this isn't possible from either end. This is the response I received from SEMrush:
Apologies for the inconvenience. We do understand that Shopify users are not able to edit the contents of the robots.txt file and we hope to come up with a solution to this problem in the future. I wish I could help you more.
Would be great if Shopify limited their crawl rate in their own robots.txt. Without that setting, we have zero control over whether Shopify blocks a legitimate tool or not, since we often can't set the crawl limit at the robot's side.
I am having this same problem with SEMrush, 7,823 new '429' errors in my latest report I just received today! I need to override SEMrushBOT's default behavior by telling it to follow the "crawl-delay" directive in the robots.txt file. But I can't access the robots.txt file. What a conundrum.
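For anyone unsure what the "crawl-delay" directive does on the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `SemrushBot` user-agent token, and the 2-second value are all assumptions for illustration; this is not Shopify's actual file, and it is not how SEMrush's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only (not Shopify's real file).
ROBOTS_TXT = """\
User-agent: SemrushBot
Crawl-delay: 2

User-agent: *
Disallow: /admin
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # A bot with its own rule group gets that group's delay.
    print(crawl_delay_for(ROBOTS_TXT, "SemrushBot"))
    # A bot that only matches the wildcard group gets no delay in this example.
    print(crawl_delay_for(ROBOTS_TXT, "OtherBot"))
```

A crawler set to "respect robots.txt" would wait that many seconds between requests. If the file has no such directive, the crawler has nothing to respect, which is the situation described in this thread.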
Laura, I would recommend putting pressure on SEMrush to manually set a crawl delay. I have asked them to do so, but more voices will increase the chances of it making its way into an update.
As Shopify has access to the robot, the Council thought the bugs will analyze your site, I think the fault lies from where you have not properly configured
I'm new to Shopify, not to SEO. These 429 errors are a bummer; they render the SEMrush site audit tool virtually useless. There's another post on Shopify about this same issue (Why is every page of my store giving 429 errors in semrush.com?). On that post, a Shopify employee said "If you go into your SEMrush.com settings, you can manually set a delay for the bot in SEMrush. When you do that it reduces the amount of 429 errors that will appear." As far as I can tell, this feature doesn't exist in SEMrush. That employee also linked to this API Limit article, which has me concerned that the only way to fix this issue is to upgrade to a Shopify Plus account. Has anyone found a solution to this yet? Thanks guys!
"As far as I can tell, this feature doesn't exist in SEMrush."
I'm sure it used to, though it's been a long time since I've used it.
It also has nothing to do with Shopify Plus - this is purely about the rate at which bots request resources, so it is a core platform-level topic.
The solution right now would be having SEMrush let you control the crawl rate via the tool directly. There's no option in Shopify to let you whitelist a service like SEMrush or let you edit the robots.txt file. Give those guys a shout and see if this is on their roadmap.
Sorry guys, here's the response I received from a SEMrush product manager responsible for their Site Audit tool. Forgot to post it back in March...
Shopify: "If you go into your SEMrush.com settings, you can manually set a delay for the bot in SEMrush. When you do that it reduces the amount of 429 errors that will appear."
SEMrush: "Unfortunately, Shopify employee was probably misguided by another option. We allow ignoring or following the robots.txt crawl delay, but we don't allow to customize it and we never did have a way to do it. We contacted Shopify and asked for a change in their robots.txt file, but they refused to do it."
So it appears that Jason's suggestion of getting SEMrush to add a custom crawl delay feature to their audit tool is the only solution.
It's super annoying that, as Shopify customers, we can't add a simple crawl delay to the robots.txt file. I would love to hear a sound technical reason as to why they can't accommodate this request. And impact on server resources isn't a logical reason in my mind, as the crawler is going to be hitting the server either way.
Yet another reason why we won't be recommending the Shopify platform to our SEO clients. Far too many stupid restrictions like this.
Great news! I've discovered not one but TWO solutions to this problem...
1) SEMrush recently updated their Site Audit tool with a new setting that allows you to limit the crawl speed to 1 URL every 2 seconds. The Shopify SEO community has been begging for this for years, so I suspect they added it specifically for us.
I tested this out this morning (March 5th) on one of my clients' Shopify sites (~300 pages total) and it appears to have worked! As you can see from the crawl progress chart below, there used to be hundreds of "issues" that vanished with the most recent crawl.
2) If the new SEMrush feature doesn't do the trick, I recommend checking out Sitebulb. It's a fantastic, albeit slightly more advanced, alternative to the SEMrush Site Audit tool. It costs $35/mo, but as someone who's used Screaming Frog, SEMrush, and Ahrefs to crawl thousands of sites, it's worth every penny.
Here's what the folks at Sitebulb have to say about crawling Shopify sites and their solution (source)...
"If you've tried crawling a Shopify site with over 500 internal URLs, you may have noticed that those rat **bleep** eventually stonewall you with 430 HTTP errors (TOO MANY REQUESTS). It only seems to kick in around the 500 URL mark, so if you have a 1000 page site, just delete half your content and you're sorted... Alternatively, you can crawl at 1 URL/second, which is slow enough to stop Shopify blocking you. In the pre-audit, Sitebulb will detect if it's a Shopify site, and give you a cheeky little message to tell you to slow down, lest you blow your load early."
Hope this helps! Happy crawling 🙂
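For anyone running their own crawl script instead, the throttling both tools rely on (1 URL every 2 seconds in SEMrush, 1 URL/second in Sitebulb) can be sketched roughly like this. It is only an illustration under stated assumptions: `throttled_crawl` and the `fetch` callable are hypothetical names, the fetcher is assumed to return a status code plus an optional Retry-After value in seconds, and whether Shopify sends a Retry-After header on storefront pages is not confirmed anywhere in this thread.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical fetcher signature: URL -> (HTTP status, Retry-After seconds or None).
Fetch = Callable[[str], Tuple[int, Optional[float]]]


def throttled_crawl(
    urls: Iterable[str],
    fetch: Fetch,
    delay: float = 2.0,       # seconds between URLs (1 URL every 2 seconds)
    max_retries: int = 3,     # extra attempts after a 429
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Fetch URLs one at a time, pausing between requests and backing off on 429."""
    results: Dict[str, int] = {}
    for url in urls:
        status = 0
        for attempt in range(max_retries + 1):
            status, retry_after = fetch(url)
            if status != 429 or attempt == max_retries:
                break
            # Use the server's Retry-After value if given; otherwise double the wait each time.
            wait = retry_after if retry_after is not None else delay * 2 ** (attempt + 1)
            sleep(wait)
        results[url] = status
        # Fixed pause before the next URL keeps the overall request rate low.
        sleep(delay)
    return results


if __name__ == "__main__":
    # Fake fetcher so the sketch runs without touching any real site.
    def always_ok(url: str) -> Tuple[int, Optional[float]]:
        return 200, None

    print(throttled_crawl(["/products/a", "/products/b"], always_ok, delay=0.1))
```

At 1 URL every 2 seconds, a ~300-page site like the one tested above takes about 600 seconds, or roughly 10 minutes, to crawl.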
As of today, June 21st, 2021, we have launched the ability to edit the robots.txt file to give merchants more control over the information that is crawled by search engines. You can learn more about how to edit your robots.txt file through our community post here.
Due to the age of the topic, I will be locking this thread. If you have any questions about the new feature, please do not hesitate to create a new post under our "Technical QA" board.