429 Errors and Crawl Delay

New Member

This really is a nightmare for SEMrush or any crawler that isn't Google. I can't believe there isn't a solution for this yet...

Shopify Expert

As far as I can tell, this feature doesn't exist in SEMrush. 

I'm sure it used to, though it's been a long time since I've used it.

It also has nothing to do with Shopify Plus - this is purely about the rate at which bots request resources, so it's a core platform-level topic.

The solution right now would be having SEMrush let you control the crawl rate directly in the tool. There's no option in Shopify to whitelist a service like SEMrush or to edit the robots.txt file. Give those guys a shout and see if this is on their roadmap.

★ Winning Partner of the Build a Business competition. ★ http://freakdesign.com.au

Sorry guys, here's the response I received from a SEMrush product manager responsible for their Site Audit tool. Forgot to post it back in March... 

Shopify: "If you go into your SEMrush.com settings, you can manually set a delay for the bot in SEMrush. When you do that it reduces the amount of 429 errors that will appear."

SEMrush: "Unfortunately, Shopify employee was probably misguided by another option. We allow ignoring or following the robots.txt crawl delay, but we don't allow to customize it and we never did have a way to do it. We contacted Shopify and asked for a change in their robots.txt file, but they refused to do it." 

So it appears that Jason's suggestion of getting SEMrush to add a custom crawl delay feature to their audit tool is the only solution. 
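
For context, the robots.txt change being asked for here is a single Crawl-delay directive. If Shopify ever allowed merchants to edit the file, it would look something like the snippet below - the 2-second value is purely illustrative, and SemrushBot is the user agent SEMrush's crawler identifies itself with:

User-agent: SemrushBot
Crawl-delay: 2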

Shopify Partner

It's super annoying that, as Shopify customers, we can't add a simple crawl delay to the robots.txt file. I would love to hear a sound technical reason why they can't accommodate this request. The impact on server resources isn't a logical explanation in my mind, since the crawler is going to be hitting the server either way.

 

Yet another reason why we won't be recommending the Shopify platform to our SEO clients.  Far too many stupid restrictions like this.


Great news! I've discovered not one but TWO solutions to this problem...

 

1) SEMrush recently updated their Site Audit tool with a new setting that allows you to limit the crawl speed to 1 URL every 2 seconds. The Shopify SEO community has been begging for this for years, so I suspect they added it specifically for us. 

 

[Screenshot: SEMrush Site Audit crawler settings]
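
For anyone wondering what that setting boils down to, it's just spacing requests out so the store never sees a burst. Here's a rough Python sketch of the same idea (my own illustration, not SEMrush's code; the store URLs and user agent are placeholders):

import time
import requests  # third-party: pip install requests

urls = [
    "https://example-store.myshopify.com/",
    "https://example-store.myshopify.com/collections/all",
]

for url in urls:
    response = requests.get(url, headers={"User-Agent": "my-audit-bot"})
    print(url, response.status_code)
    time.sleep(2)  # pause 2 seconds between requests, i.e. ~1 URL every 2 seconds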

 

I tested this out this morning (March 5th) on one of my client's Shopify sites (~300 pages total), and it appears to have worked! As you can see from the crawl progress chart below, the hundreds of "issues" flagged in earlier crawls vanished with the most recent one.

 

[Screenshot: SEMrush Site Audit progress history]

 

2) If the new SEMrush feature doesn't do the trick, I recommend checking out Sitebulb. It's a fantastic, albeit slightly more advanced, alternative to the SEMrush Site Audit tool. It costs $35/mo, but as someone who's used Screaming Frog, SEMrush, and Ahrefs to crawl thousands of sites, it's worth every penny. 

 

Here's what the folks at Sitebulb have to say about crawling Shopify sites and their solution (source)...

 

"If you've tried crawling a Shopify site with over 500 internal URLs, you may have noticed that those rat **bleep** eventually stonewall you with 430 HTTP errors (TOO MANY REQUESTS). It only seems to kick in around the 500 URL mark, so if you have a 1000 page site, just delete half your content and you're sorted... Alternatively, you can crawl at 1 URL/second, which is slow enough to stop Shopify blocking you. In the pre-audit, Sitebulb will detect if it's a Shopify site, and give you a cheeky little message to tell you to slow down, lest you blow your load early."

 

Hope this helps! Happy crawling :)  
