429 Errors and Crawl Delay

Garrett_M
New Member

This really is a nightmare for SEMrush or any crawler that isn't Google. I can't believe there isn't a solution for this yet...

Jason
Shopify Expert

As far as I can tell, this feature doesn't exist in SEMrush. 

I'm sure it used to, though it's been a long time since I've used it.

It also has nothing to do with Shopify Plus; this is purely about the rate at which bots request resources, so it's a core platform-level topic.

The solution right now would be for SEMrush to let you control the crawl rate directly in the tool. There's no option in Shopify to whitelist a service like SEMrush or to edit the robots.txt file. Give those guys a shout and see if this is on their roadmap.

I jump on these forums to help and share some insights. Not looking to be hired, and not looking for work.

Don't hand out staff invites or give admin password to forum members unless absolutely needed. In most cases the help you need can be handled without that.


★ http://freakdesign.com.au ★

Sorry guys, here's the response I received from a SEMrush product manager responsible for their Site Audit tool. I forgot to post it back in March...

Shopify: "If you go into your SEMrush.com settings, you can manually set a delay for the bot in SEMrush. When you do that it reduces the amount of 429 errors that will appear."

SEMrush: "Unfortunately, the Shopify employee was probably misled by another option. We allow ignoring or following the robots.txt crawl delay, but we don't allow you to customize it, and we never had a way to do so. We contacted Shopify and asked for a change in their robots.txt file, but they refused to do it."

So it appears that Jason's suggestion of getting SEMrush to add a custom crawl delay feature to their audit tool is the only solution. 
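For anyone wondering what "following the robots.txt crawl delay" means on the crawler side: `Crawl-delay` is a nonstandard robots.txt directive that a crawler reads per user-agent group, and not every crawler honors it (Googlebot ignores it). Here's a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content is a made-up example, not Shopify's actual file, and the `SemrushBot` group is only an assumption about how such a rule might be written.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT Shopify's real file): a catch-all group plus a
# group asking SEMrush's bot to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin

User-agent: SemrushBot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Mark the rules as freshly loaded; crawl_delay() returns None for a
    # parser that has never been marked as read.
    parser.modified()
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "SemrushBot"))
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

A crawler set to "follow" the delay would then wait that many seconds between requests; one set to "ignore" it would skip this lookup entirely.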

Kevin Wallner // SEO & Analytics Expert
- Was my reply helpful? Click Like to let me know!
- Was your question answered? Click Accept as Solution
- Ready to accelerate your SEO growth? Let's chat!
JeffFidler2019
Shopify Partner

It's super annoying that, as Shopify customers, we can't add a simple crawl delay to the robots.txt file. I would love to hear a sound technical reason why they can't accommodate this request. Impact on server resources doesn't seem logical to me, since the crawlers are going to be hitting the server either way.

Yet another reason why we won't be recommending the Shopify platform to our SEO clients. Far too many stupid restrictions like this.


Great news! I've discovered not one but TWO solutions to this problem...

1) SEMrush recently updated their Site Audit tool with a new setting that allows you to limit the crawl speed to 1 URL every 2 seconds. The Shopify SEO community has been begging for this for years, so I suspect they added it specifically for us.

[Screenshot: SEMrush Site Audit crawler settings]

I tested this out this morning (March 5th) on one of my clients' Shopify sites (~300 pages total), and it appears to have worked! As you can see from the crawl progress chart below, hundreds of "issues" that used to show up vanished with the most recent crawl.

[Screenshot: SEMrush Site Audit progress history]
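The new SEMrush setting is essentially a fixed-interval rate limit. For anyone running their own crawler, here's a minimal sketch of the same idea: make each request start at least `interval` seconds after the previous one (2.0 matches "1 URL every 2 seconds"). `FixedIntervalLimiter` is a hypothetical helper, not SEMrush's code, and the URL paths in the demo are made up; the clock and sleep functions are injectable so the demo runs instantly.

```python
import time
from typing import Callable, Optional


class FixedIntervalLimiter:
    """Hypothetical helper: allow at most one request every `interval` seconds."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest start of next request

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # The request starts "now", so the one after it may start interval later.
        self._next_allowed = now + self.interval
        return delay


if __name__ == "__main__":
    # Simulated clock: sleeping just advances fake_now, so nothing really waits.
    fake_now = [0.0]
    limiter = FixedIntervalLimiter(
        2.0,
        clock=lambda: fake_now[0],
        sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
    )
    for url in ["/", "/collections/all", "/products/example"]:
        slept = limiter.wait()
        print(f"t={fake_now[0]:.1f}s slept={slept:.1f}s fetch {url}")
```

Tracking the next allowed start time (rather than sleeping a flat 2 seconds after every request) means time spent downloading a page counts toward the gap instead of slowing the crawl further.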

 

2) If the new SEMrush feature doesn't do the trick, I recommend checking out Sitebulb. It's a fantastic, albeit slightly more advanced, alternative to the SEMrush Site Audit tool. It costs $35/mo, but as someone who's used Screaming Frog, SEMrush, and Ahrefs to crawl thousands of sites, I think it's worth every penny.

Here's what the folks at Sitebulb have to say about crawling Shopify sites and their solution (source)...

"If you've tried crawling a Shopify site with over 500 internal URLs, you may have noticed that those rat **bleep** eventually stonewall you with 430 HTTP errors (TOO MANY REQUESTS). It only seems to kick in around the 500 URL mark, so if you have a 1000 page site, just delete half your content and you're sorted... Alternatively, you can crawl at 1 URL/second, which is slow enough to stop Shopify blocking you. In the pre-audit, Sitebulb will detect if it's a Shopify site, and give you a cheeky little message to tell you to slow down, lest you blow your load early."
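Slowing down is the main fix, but a crawler can also react sensibly when it does get throttled. Below is a minimal sketch of retrying on 429/430 responses with exponential backoff, honoring a `Retry-After` header if the server happens to send one. The function name, the injected `fetch` callable, and the default delays are hypothetical choices for illustration, not how Sitebulb or SEMrush actually behave.

```python
import time
from typing import Callable, Mapping, Tuple

# Assumed interface: fetch(url) returns (status_code, headers). Injected so the
# sketch doesn't depend on a particular HTTP library. A plain dict is
# case-sensitive, so this sketch looks up "Retry-After" exactly.
Fetch = Callable[[str], Tuple[int, Mapping[str, str]]]

# 429 is the standard "Too Many Requests"; 430 is what Sitebulb reports Shopify sending.
THROTTLE_STATUSES = {429, 430}


def fetch_with_backoff(url: str, fetch: Fetch,
                       sleep: Callable[[float], None] = time.sleep,
                       base_delay: float = 1.0, max_retries: int = 5) -> int:
    """Fetch url, backing off while the server signals 'too many requests'.

    Returns the final status code (still 429/430 if the retries ran out).
    """
    backoff = base_delay
    status, headers = fetch(url)
    for _ in range(max_retries):
        if status not in THROTTLE_STATUSES:
            break
        retry_after = headers.get("Retry-After", "")
        # Honor Retry-After when given in seconds; the HTTP-date form isn't
        # handled here and falls back to exponential backoff instead.
        if retry_after.strip().isdigit():
            wait = float(retry_after)
        else:
            wait = backoff
            backoff *= 2
        sleep(wait)
        status, headers = fetch(url)
    return status
```

Pairing this with a steady pace of 1 URL/second, as Sitebulb suggests, should mean the backoff rarely triggers; it's a safety net rather than a substitute for crawling slowly.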

 

Hope this helps! Happy crawling :)  

Trevor
Community Moderator

Hello!

As of today, June 21st, 2021, we have launched the ability to edit the robots.txt file to give merchants more control over the information that is crawled by search engines. You can learn more about how to edit your robots.txt file through our community post here.

Due to the age of the topic, I will be locking this thread. If you have any questions about the new feature, please do not hesitate to create a new post under our "Technical Q&A" board.


Trevor | Community Moderator @ Shopify 
 - Was my reply helpful? Click Like to let me know! 
 - Was your question answered? Mark it as an Accepted Solution
 - To learn more visit the Shopify Help Center or the Shopify Blog