
Shopify Store Scraping & Product Data

Most Shopify stores expose their entire catalog through a single public JSON endpoint. Here's how to use it responsibly, what to do when it's disabled, and what the data actually contains.

10 min read · products.json, sitemap fallback, mobile proxies · Last updated: April 2026

1. The /products.json Endpoint

By default, a Shopify store serves a public JSON endpoint at /products.json that returns the catalog in structured form, up to 250 products per page, with pagination for larger catalogs. It was originally designed to power app integrations, but it's publicly accessible on nearly every store.

Scraping it sits in the same legal gray area as any other publicly served JSON: respect each store's Terms of Service, don't overwhelm their origin, and use the data only for lawful purposes (research, price comparison, personal analytics).

Note: Store owners can disable this endpoint via the theme or an app. If it returns 404 or an empty products array, fall back to the sitemap approach in section 3.
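
A minimal sketch of that check, assuming a disabled endpoint shows up in one of the ways described above (a non-200 status or an empty products array) or as a non-JSON body. The helper name catalog_endpoint_usable is hypothetical; it takes the status code and body text as plain arguments so it can be tried without a network call.

```python
import json

def catalog_endpoint_usable(status_code, body_text):
    """Return True if a /products.json response looks usable.

    Hypothetical helper: any non-200 status, non-JSON body, or empty
    "products" array is treated as a signal to fall back to the sitemap.
    """
    if status_code != 200:
        return False
    try:
        data = json.loads(body_text)
    except (ValueError, TypeError):
        # Assumption: some stores may answer with an HTML page instead of JSON.
        return False
    if not isinstance(data, dict):
        return False
    return bool(data.get("products"))
```

With requests, this would be called as catalog_endpoint_usable(r.status_code, r.text) on the first page before committing to the full pagination loop.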

2. Python: Paginating the Catalog

The JSON endpoint supports limit (max 250) and page parameters. Paginate until the products array comes back empty:

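import time

import requests

# Browser-like headers; "Accept" asks for JSON explicitly.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Accept": "application/json",
}
# Placeholder proxy settings: replace USER, PASS, hostname and http_port with your own.
proxies = {
    "http": "http://USER:PASS@hostname:http_port",
    "https": "http://USER:PASS@hostname:http_port",
}

def scrape_shopify_products(store_url, delay=1.0):
    """Page through /products.json until an empty page or a non-200 response."""
    base = store_url.rstrip("/")  # avoid a double slash in the request URL
    all_products = []
    page = 1
    while True:
        url = f"{base}/products.json?limit=250&page={page}"
        r = requests.get(url, headers=headers, proxies=proxies, timeout=30)
        if r.status_code != 200:
            break  # endpoint disabled, request throttled, or no more pages served
        try:
            products = r.json().get("products", [])
        except ValueError:
            break  # body was not valid JSON
        if not products:
            break
        all_products.extend(products)
        page += 1
        time.sleep(delay)  # pause between pages so the origin isn't hammered
    return all_products

# usage
catalog = scrape_shopify_products("https://examplestore.com")
print(f"fetched {len(catalog)} products")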

Routing through a mobile proxy can help you stay under per-IP rate limits on large catalogs and reduces the chance of being flagged when you crawl many stores from the same origin.
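
Whichever route the requests take, pacing them is part of not overwhelming a store's origin. The sketch below retries with exponential backoff, assuming the store signals throttling with HTTP 429 (an assumption, not something the endpoint documents here). The name fetch_with_backoff and its parameters are hypothetical; fetch is any callable that returns an object with a status_code attribute.

```python
import time

def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff while it returns HTTP 429.

    Returns the first non-429 response, or the last response once
    max_attempts calls have been made.
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch(url)
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x ... base_delay seconds before the next try.
            sleep(base_delay * (2 ** attempt))
    return response
```

In the pagination loop from this section, the requests.get call could be wrapped as fetch_with_backoff(lambda u: requests.get(u, headers=headers, proxies=proxies, timeout=30), url).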

3. Fallback: sitemap.xml & Storefront

When /products.json is disabled, Shopify still publishes a sitemap that indexes every product URL. Start at /sitemap.xml and follow the sitemap_products_*.xml children:

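import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def product_urls_from_sitemap(store_url):
    """Collect product URLs from the sitemap_products_* child sitemaps."""
    # Reuses the proxies dict defined in section 2.
    base = store_url.rstrip("/")
    urls = []
    index = requests.get(f"{base}/sitemap.xml", proxies=proxies, timeout=30)
    root = ET.fromstring(index.content)
    for loc in root.findall(".//sm:loc", NS):
        # The index lists several child sitemaps; only the product ones matter here.
        if loc.text and "sitemap_products" in loc.text:
            sub = requests.get(loc.text.strip(), proxies=proxies, timeout=30)
            subroot = ET.fromstring(sub.content)
            for u in subroot.findall(".//sm:loc", NS):
                # Keep product pages only and skip any other entries.
                if u.text and "/products/" in u.text:
                    urls.append(u.text.strip())
    return urls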

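From each product URL, parse the storefront HTML. Many Shopify themes embed a full product object in a <script type="application/json"> tag whose id starts with ProductJson-, with largely the same fields as /products.json. This is theme-dependent, and field names and price formats can differ, so check the page source of the store you're working with.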
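
One way to pull that embedded object out of a page, sketched with the standard-library HTML parser. It assumes the theme uses the script tag described above; the class and function names are hypothetical, and pages built on other themes will simply return None.

```python
import json
from html.parser import HTMLParser

class ProductJsonParser(HTMLParser):
    """Capture the first <script type="application/json" id="ProductJson-..."> body."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []
        self.product = None

    def handle_starttag(self, tag, attrs):
        if tag != "script" or self.product is not None:
            return
        attr = dict(attrs)
        script_id = attr.get("id") or ""
        if attr.get("type") == "application/json" and script_id.startswith("ProductJson-"):
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            try:
                self.product = json.loads("".join(self._chunks))
            except ValueError:
                self.product = None  # tag was present but did not hold valid JSON

def extract_product_json(html):
    """Return the embedded product dict, or None if the page has no such tag."""
    parser = ProductJsonParser()
    parser.feed(html)
    parser.close()
    return parser.product
```

Feed it the response text of each URL returned by product_urls_from_sitemap and skip the pages where it returns None.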

4. What's Inside the JSON

Each product object in /products.json is rich. The fields you'll actually use:

id, handle, title: Stable identifiers; handle is the URL slug
vendor, product_type: Brand and category tag
tags: Array of free-form tags, great for clustering
variants[]: SKU, price, compare_at_price, weight, barcode, available
options[]: Size/color/style definitions
images[]: Full-resolution Shopify CDN URLs
body_html: Rich-text product description
created_at, updated_at, published_at: ISO timestamps; use updated_at to detect inventory/price changes

One field that's not exposed: exact inventory count. You get available: true/false per variant only.
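
To make the variant fields concrete, here is a small sketch that flattens one product into per-variant rows. It assumes prices arrive as decimal strings such as "19.99" and that compare_at_price may be null; the function name variant_rows and the on_sale flag are illustrative additions, not fields from the endpoint.

```python
from decimal import Decimal

def variant_rows(product):
    """Flatten a product dict into one row per variant."""
    rows = []
    for v in product.get("variants", []):
        price = Decimal(str(v["price"]))
        compare = v.get("compare_at_price")
        compare = Decimal(str(compare)) if compare else None
        rows.append({
            "handle": product.get("handle"),
            "sku": v.get("sku"),
            "price": price,
            "compare_at_price": compare,
            # Derived flag: a compare-at price above the current price.
            "on_sale": compare is not None and compare > price,
            # Only a boolean is exposed, not a stock count.
            "available": bool(v.get("available")),
        })
    return rows
```

Decimal is used instead of float so that prices compare exactly when snapshots are diffed later.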

5. Practical Use Cases

  • Competitor research. Snapshot a rival's catalog, monitor pricing and new SKU drops via updated_at.
  • Dropshipping product sourcing. Cross-reference popular Shopify products with supplier catalogs.
  • Market trend analysis. Track tag frequency across hundreds of stores to spot rising categories.
  • Brand monitoring. Detect unauthorized resellers by name-matching your vendor field on other stores.
  • Feed ingestion. Power product comparison sites without per-merchant API integrations.
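
Two of these use cases reduce to a few lines once catalogs are stored as lists of product dicts. The sketch below is illustrative only: diff_snapshots compares two snapshots of the same store by id and updated_at, and tag_counts tallies tags across any number of products. Both names are hypothetical, and the comma-splitting branch is a defensive assumption in case tags arrive as a single string.

```python
from collections import Counter

def diff_snapshots(old_products, new_products):
    """Compare two catalog snapshots and report added, changed and removed product ids."""
    old_by_id = {p["id"]: p for p in old_products}
    added, changed = [], []
    for p in new_products:
        previous = old_by_id.get(p["id"])
        if previous is None:
            added.append(p["id"])
        elif previous.get("updated_at") != p.get("updated_at"):
            changed.append(p["id"])
    new_ids = {p["id"] for p in new_products}
    removed = [pid for pid in old_by_id if pid not in new_ids]
    return {"added": added, "changed": changed, "removed": removed}

def tag_counts(products):
    """Count case-insensitive tag frequency across a list of products."""
    counts = Counter()
    for p in products:
        tags = p.get("tags") or []
        if isinstance(tags, str):
            # Defensive assumption: accept a comma-separated string as well as a list.
            tags = [t.strip() for t in tags.split(",") if t.strip()]
        counts.update(t.lower() for t in tags)
    return counts
```

A changed updated_at only says that something about the product changed; comparing variant prices between the two snapshots is what confirms a price move.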
