Dropshipping Product Research Automation

Finding winning products manually takes hours per niche. Automation with proxies scales this to thousands of products per hour across AliExpress, Amazon Best Sellers, and Shopify stores.

13 min read · AliExpress, Amazon, Shopify, mobile proxies · Last updated: April 2026

1. Sources Worth Scraping

Not all data sources give equal signal. The high-value ones:

Source	Signal	Path
AliExpress Best Sellers	Orders count, price, supplier rating	/w/wholesale-*.html
Amazon Movers & Shakers	24-hour rank change %, category	/gp/movers-and-shakers
Amazon Best Sellers	Daily rank, price, rating count	/Best-Sellers
Shopify stores	Catalog via /products.json, recent drops	/products.json
TikTok trending (TikTok Shop)	Product hashtags, view velocity	Public /shop/discover

2. AliExpress Anti-Bot

AliExpress ships Alibaba Group's in-house bot detection. The tells:

→ Slider captcha ("nc_acb") that triggers quickly on datacenter IPs, especially on search and category pages
→ IP reputation weighting with a strong penalty on Chinese-registered datacenter blocks and any ASN marked as hosting
→ JS-only price rendering in some regions, so headless Chrome or Playwright is sometimes needed
→ Region cookies (aep_usuc_f) that control currency and shipping display. Set them explicitly, or you will not be scraping what consumers in the target market actually see

US carrier mobile IPs pass the reputation check easily and, because Alibaba sees tons of legitimate US mobile traffic, rarely get slider-challenged.
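
Pinning the region cookie could look like the sketch below. The field layout inside aep_usuc_f (site, c_tp, region, b_locale) is an assumption based on values commonly seen in browser sessions, not a documented API, so check it against a real session before relying on it. region_cookies is a hypothetical helper name.

```python
def region_cookies(region="US", currency="USD", locale="en_US"):
    """Build a cookies dict that pins currency and ship-to region.

    The key/value layout of aep_usuc_f is an assumption; verify it
    against a cookie captured from a real browser session.
    """
    value = f"site=glo&c_tp={currency}&region={region}&b_locale={locale}"
    return {"aep_usuc_f": value}


# Example: cookies for a German storefront view.
print(region_cookies("DE", "EUR", "de_DE"))
```

The returned dict can be passed as the cookies= argument of requests.get alongside your own headers and proxy settings, one dict per target market.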

3. Python: Tracking Product Velocity

The signal that separates winners from everything else is velocity: how fast orders/reviews/rank are growing. Snapshot daily and diff the orders_count:

import requests, json, datetime, pathlib

headers = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15",
    "Accept-Language": "en-US,en;q=0.9",
}
proxies = {
    "http": "http://USER:PASS@hostname:http_port",
    "https": "http://USER:PASS@hostname:http_port",
}

STORE = pathlib.Path("velocity.json")

def snapshot(products):
    """products: [{product_id, orders_count, price}]"""
    today = datetime.date.today().isoformat()
    state = json.loads(STORE.read_text()) if STORE.exists() else {}
    for p in products:
        pid = str(p["product_id"])
        state.setdefault(pid, {})[today] = {
            "orders": p["orders_count"],
            "price": p["price"],
        }
    STORE.write_text(json.dumps(state, indent=2))

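def velocity(product_id, days=7):
    """Average new orders per day across the last `days` snapshots, or None."""
    if not STORE.exists():
        return None
    state = json.loads(STORE.read_text())
    history = state.get(str(product_id), {})
    dates = sorted(history.keys())[-days:]
    if len(dates) < 2:
        return None
    start = history[dates[0]]["orders"]
    end = history[dates[-1]]["orders"]
    # Divide by elapsed calendar days, not by snapshot count, so two
    # snapshots one day apart yield a one-day rate and gaps don't skew it.
    first = datetime.date.fromisoformat(dates[0])
    last = datetime.date.fromisoformat(dates[-1])
    return (end - start) / (last - first).days  # avg new orders per day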

Products with a rising 7-day velocity AND a stable-to-rising price are the candidates worth testing — that combo indicates demand outpacing supply, not a race to the bottom.
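
One way to turn that rule into a filter is sketched below. It assumes one snapshot per day in the per-product format written by snapshot(), treats "rising" as the later half of daily order gains averaging more than the earlier half, and uses an illustrative 2% price tolerance. is_candidate and price_tolerance are hypothetical names, and the thresholds are not tuned values.

```python
def is_candidate(history, price_tolerance=0.02):
    """history: {iso_date: {"orders": int, "price": float}} for one product."""
    dates = sorted(history)
    if len(dates) < 3:
        return False  # need at least two intervals to compare
    orders = [history[d]["orders"] for d in dates]
    # Order gains between consecutive snapshots (assumed to be daily).
    gains = [b - a for a, b in zip(orders, orders[1:])]
    mid = len(gains) // 2
    early, late = gains[:mid], gains[mid:]
    rising = sum(late) / len(late) > sum(early) / len(early)
    # "Stable-to-rising": the latest price is not meaningfully below the first.
    first_price = history[dates[0]]["price"]
    last_price = history[dates[-1]]["price"]
    stable_price = last_price >= first_price * (1 - price_tolerance)
    return rising and stable_price
```

Comparing halves of the window is a deliberately crude trend test; a regression slope over the gains would be less sensitive to a single noisy day.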

4. Price Arbitrage Detection

Classic dropshipping arbitrage: supplier price on AliExpress vs. retail price on Amazon/Shopify. The interesting ratio is usually:

margin_ratio = (retail_price - supplier_price - shipping) / retail_price

Products with a margin ratio of 0.4 or higher and rising velocity on both the retail and supplier sides are strong candidates. A high margin combined with falling supplier velocity usually points to a saturated product.
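
As a worked example of the ratio, here is a minimal sketch with illustrative prices (the dollar figures are made up for the arithmetic, not market data):

```python
def margin_ratio(retail_price, supplier_price, shipping):
    """Share of the retail price left after supplier cost and shipping."""
    if retail_price <= 0:
        raise ValueError("retail_price must be positive")
    return (retail_price - supplier_price - shipping) / retail_price


# $30 retail, $10 supplier cost, $5 shipping: (30 - 10 - 5) / 30
ratio = margin_ratio(30.0, 10.0, 5.0)
print(f"{ratio:.2f}")  # 0.50, which clears the 0.4 threshold
```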

Image similarity matching (perceptual hashes like pHash) is how you link an AliExpress listing to the same product on Amazon — titles and SKUs rarely match exactly, but hero images usually do.
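
The matching step can be sketched with an average hash, a simpler relative of pHash (which uses a DCT instead of a plain mean). The sketch assumes each hero image has already been decoded, converted to grayscale, and resized to an 8x8 grid by an image library of your choice; the max_distance of 10 bits is an illustrative threshold, not a tuned one.

```python
def average_hash(pixels):
    """pixels: 8x8 grid (list of rows) of grayscale values 0-255.

    Returns a 64-bit int with one bit per pixel: 1 if brighter than the mean.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def same_product(hash_a, hash_b, max_distance=10):
    """Treat two listings as the same product if their hashes are close."""
    return hamming(hash_a, hash_b) <= max_distance
```

Because the hash only records which pixels are brighter than the mean, small brightness or compression differences between the AliExpress and Amazon copies of an image leave most bits unchanged, while a different image flips many of them.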

5. Shopify Spy via /products.json

For any individual Shopify competitor, scrape /products.json daily and diff the result against the previous day's copy. New products that appear with a large number of variants typically signal a store testing a winner. Filter on created_at to see the freshest drops:

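def recent_drops(store_url, days=7):
    """Products created in the last `days` days, newest first (first page only)."""
    url = f"{store_url.rstrip('/')}/products.json?limit=250&page=1"
    r = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    r.raise_for_status()
    products = r.json().get("products", [])
    # Use an aware UTC cutoff: created_at carries a UTC offset (or "Z"), and
    # comparing aware and naive datetimes raises TypeError.
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=days)

    def created(p):
        return datetime.datetime.fromisoformat(p["created_at"].replace("Z", "+00:00"))

    fresh = [p for p in products if created(p) > cutoff]
    return sorted(fresh, key=created, reverse=True)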
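
The daily diff itself can be sketched as a comparison of two saved catalogs. This assumes you keep each day's "products" list from /products.json and that entries carry the id, title, and variants fields; diff_catalogs is a hypothetical helper name.

```python
def diff_catalogs(yesterday, today):
    """Each argument is the "products" list from one day's /products.json.

    Returns (id, title, variant_count) for products that are new today,
    ordered with the most variants first.
    """
    seen = {p["id"] for p in yesterday}
    new = [p for p in today if p["id"] not in seen]
    # Many variants on a brand-new product may indicate a store testing it.
    new.sort(key=lambda p: len(p.get("variants", [])), reverse=True)
    return [(p["id"], p.get("title", ""), len(p.get("variants", []))) for p in new]
```

Diffing by id rather than by title keeps renamed products from showing up as new.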

Public trending-store lists (Shopify Stars, Stores.watch) give you the store pool. Crawl that pool daily with a rotating mobile IP and a modest per-store request rate.
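
A modest per-store rate can be enforced with a small throttle like the sketch below. StoreThrottle and the 5-second default gap are illustrative choices, not recommendations from any platform; the clock and sleep parameters exist only so the class can be tested without real waiting.

```python
import time


class StoreThrottle:
    """Enforce a minimum gap between requests to the same store."""

    def __init__(self, min_gap=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.clock = clock
        self.sleep = sleep
        self.last = {}  # store -> time of its most recent request

    def wait(self, store):
        """Block until at least min_gap seconds have passed for this store."""
        now = self.clock()
        last = self.last.get(store)
        if last is not None:
            remaining = self.min_gap - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last[store] = now
```

Call throttle.wait(store_url) immediately before each request to that store; requests to different stores do not delay one another.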

6. Ethical & Legal Considerations

→ Don't clone listings. Copying titles, descriptions, and photos verbatim is typically copyright infringement. Use scraped data as a research signal, not as final listing content.
→ Respect robots.txt as intent. Even where scraping is legal (see hiQ v. LinkedIn), pay attention to what a site is asking of automated clients.
→ Pace yourself. Aggressive request rates cost small stores real money in bandwidth. Pace at a rate a human browsing the same store would generate.
→ Verify trademark & MAP before listing. A good arbitrage spread doesn't matter if the brand has an active MAP policy or an enforced trademark.
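
Checking a store's robots.txt before crawling can be sketched with Python's standard urllib.robotparser. The sample rules and the example-store.test hostname are made up for illustration; in practice you would download the store's real robots.txt once per crawl and pass its text in.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules only, not taken from any real store.
sample = """User-agent: *
Disallow: /checkout
"""
print(allowed(sample, "https://example-store.test/products.json"))
print(allowed(sample, "https://example-store.test/checkout"))
```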

