How can I effectively webscrape data from my own e-commerce site?

Does anyone have an easy way to scrape data from their own Shopify site?

A developer reached out, and we’re trying to work out how to pass pricing information to their price comparison site. They say Shopify blocks their web crawler even though it crawls our site slowly.

They mentioned that they can provide an IP address or a user agent string to whitelist. I’m not sure how I would use this information, but if anyone can share some technical guidance, I can pass it on to the developer.

Hi there, get them to look at this: https://www.youtube.com/watch?v=jPjxWC7zV2s&t=795s

It’s an amazing way to use a little bit of Python to scrape whatever you need from any Shopify site and write the results to a CSV file.

I knew very little Python and managed to use this to get 5000+ products.
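
For anyone who wants to see roughly what that kind of script looks like, here is a minimal sketch. I am assuming it relies on the public /products.json endpoint that many Shopify storefronts expose; I have not confirmed that this is exactly what the video does. The store URL, the user agent string, the CSV columns, and the function names are all placeholders of mine.

```python
import csv
import json
import time
import urllib.request

# Columns chosen for illustration; add or remove fields as needed.
FIELDS = ["handle", "title", "variant", "sku", "price"]


def fetch_products(store_url, page, limit=250):
    """Fetch one page of products from /products.json (assumed endpoint)."""
    url = f"{store_url.rstrip('/')}/products.json?limit={limit}&page={page}"
    # Hypothetical user agent string, so the store can identify the script.
    req = urllib.request.Request(url, headers={"User-Agent": "my-price-export/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("products", [])


def flatten(products):
    """Turn the nested product JSON into one flat row per variant."""
    rows = []
    for product in products:
        for variant in product.get("variants", []):
            rows.append({
                "handle": product.get("handle", ""),
                "title": product.get("title", ""),
                "variant": variant.get("title", ""),
                "sku": variant.get("sku") or "",  # sku can be null in the JSON
                "price": variant.get("price", ""),
            })
    return rows


def export_store(store_url, path, delay=1.0):
    """Page through the store until an empty page comes back, then write a CSV."""
    rows, page = [], 1
    while True:
        products = fetch_products(store_url, page)
        if not products:
            break
        rows.extend(flatten(products))
        page += 1
        time.sleep(delay)  # pause between pages to keep the crawl slow
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

You would call it with something like export_store("https://your-store.example", "products.csv"). If the store does not serve /products.json, the first request will simply fail and this approach does not apply.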

Good luck

If it’s your site and you are looking to get product info, a far better solution is to create a custom app and use the Admin API (https://shopify.dev/docs/api/admin-graphql).

Scrapers will always be brittle solutions; the API is the officially supported way to get your product info. You would only need a scraper if it were someone else’s site.
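
To give a rough idea of the API route, here is a minimal sketch that pulls handles, SKUs, and prices through the Admin GraphQL API. It assumes you have created a custom app with permission to read products and copied its Admin API access token. The API version string, the page sizes, and the function names are my own placeholders, so check the docs linked above for the versions your store currently supports.

```python
import json
import urllib.request

API_VERSION = "2024-07"  # assumption: swap in a version your store supports

# Page sizes (25 products, 10 variants each) are arbitrary illustrative values.
PRODUCTS_QUERY = """
query ($cursor: String) {
  products(first: 25, after: $cursor) {
    pageInfo { hasNextPage endCursor }
    edges {
      node {
        handle
        title
        variants(first: 10) {
          edges { node { sku price } }
        }
      }
    }
  }
}
"""


def build_request(shop, token, cursor=None):
    """Build the POST request; shop looks like 'your-store.myshopify.com'."""
    url = f"https://{shop}/admin/api/{API_VERSION}/graphql.json"
    body = json.dumps({"query": PRODUCTS_QUERY, "variables": {"cursor": cursor}})
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST", headers={
        "Content-Type": "application/json",
        "X-Shopify-Access-Token": token,
    })


def extract_rows(payload):
    """Flatten one GraphQL response into (handle, sku, price) tuples."""
    rows = []
    for edge in payload["data"]["products"]["edges"]:
        product = edge["node"]
        for variant_edge in product["variants"]["edges"]:
            variant = variant_edge["node"]
            rows.append((product["handle"], variant["sku"], variant["price"]))
    return rows


def fetch_all_prices(shop, token):
    """Follow the cursor until the API reports there are no more pages."""
    cursor, rows = None, []
    while True:
        with urllib.request.urlopen(build_request(shop, token, cursor), timeout=30) as resp:
            payload = json.load(resp)
        if "errors" in payload:
            raise RuntimeError(payload["errors"])
        rows.extend(extract_rows(payload))
        page_info = payload["data"]["products"]["pageInfo"]
        if not page_info["hasNextPage"]:
            return rows
        cursor = page_info["endCursor"]
```

The rows that come back can be written to a CSV or sent to the comparison site in whatever format they ask for. Keep the access token private, since it grants access to your store data.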

Edit: I re-read your post, and it sounds like you are trying to get your info listed on another website that does the scraping. In that case they should really already have an app that you could just install, or a solution that works with a custom app.

Hi there, we are working on a solution for this, as we often have clients who stock the same products as another Shopify site and have the site owner’s permission to use the information, usually because that owner took it word for word from the main supplier anyway. If you can provide the URL of your store, I can send you a test file with information on 100 or so products. If that is of interest, we can then get the rest of your products for you.

Whenever I needed to pull data from my own Shopify site, I first tried their API, since it’s pretty straightforward and gives you clean access to product data, pricing, and all that. But I’ve hit roadblocks when working with third-party devs who only know scraping, and sometimes Shopify throws blocks even if you’re the site owner. Having a specific IP or user agent to whitelist sometimes works: if you can access your store’s hosting or firewall settings, you might be able to allow their crawler through. I found that talking to Shopify support directly can help too, especially when you explain the situation.

I once tried scraping with a dev friend, even tweaking headers and crawl delays, but Shopify would occasionally lock us out until we figured out the IP whitelisting. Later on, I learned about crawling services that make the process easier by handling things like CAPTCHAs and rotating proxies for you. If your developer keeps getting blocked, you could check out https://crawlbase.com/ where they offer web data extraction tools built for just these kinds of problems.
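
For the developer's side, here is a rough sketch of what "tweaking headers and crawl delays" can look like in Python: the crawler sends a fixed, identifiable user agent string (the value you would hand over for whitelisting) and waits before retrying when it is told to slow down. I am assuming the block shows up as an HTTP 429 response; the bot name, its info URL, and the timing values are made up for illustration.

```python
import time
import urllib.error
import urllib.request

# Hypothetical identity; the crawler's owner would publish the real one.
USER_AGENT = "ExamplePriceBot/1.0 (+https://example.com/bot-info)"


def build_request(url):
    """Attach the identifying user agent to every request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def wait_seconds(attempt, retry_after=None, base=2.0, cap=60.0):
    """Use a numeric Retry-After value if given, else back off exponentially."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After was a date or garbage; fall through to backoff
    return min(base * (2 ** attempt), cap)


def polite_get(url, max_attempts=4):
    """Fetch a URL, sleeping and retrying only when the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(build_request(url), timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise  # a different error, or out of retries
            time.sleep(wait_seconds(attempt, err.headers.get("Retry-After")))
```

This only makes the crawler easier to recognize and less aggressive. Whether a given user agent or IP actually gets let through is decided on the store or platform side.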

Just use this; you can export products from any Shopify site: https://shopifywebscraper.com/