How Real Estate Data Is Collected (2026)

Property data powers iBuyers, brokerages, and proptech — but the official APIs have largely closed, the portals run aggressive anti-bot defenses, and the US market is split across hundreds of separate databases. Here's the real picture.

9 min read · Last updated: May 2026

Quick Answer

Real estate data — listings, price history, sold and rental comps — is collected from portals like Zillow, Redfin, and Realtor.com. Zillow retired its public API in 2021 and now runs strong anti-bot defenses, so collectors rely on the RESO/MLS standardization layer, licensed feeds, or careful public-page collection that respects each site's Terms.

  • Zillow's consumer Web Services API (ZWSID) was retired September 30, 2021
  • The US has 580+ separate MLSs, each with its own database and display rules
  • The main legal risk for public-page collection is contract/ToS liability, not the CFAA

Every iBuyer valuation, brokerage CMA, rent-estimate model, and investment screen depends on a fresh view of the market: what's listed, what it sold for, how long it sat, and how prices are trending. That demand drives large-scale collection of property data — and the major portals have made it progressively harder to get.

What's collected

The data behind proptech and investment analytics is consistent across portals:

  • Active listings: price, beds/baths, square footage, photos, description, status
  • Price history & days-on-market: list-price changes and how long a property has been available
  • Sold comps & valuations: closed-sale data and automated estimates like Zillow's Zestimate
  • Rental data: rents, availability, and rental comps for yield analysis

Buyers include iBuyers, brokerages, proptech apps, and real-estate investment analytics teams.
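
As a rough illustration of how the fields above fit together, here is a minimal Python sketch of a listing record with price history and a days-on-market calculation. The class and field names (`Listing`, `PriceEvent`, `listed_on`, and so on) are hypothetical and do not follow any portal's or MLS's schema, and the days-on-market rule shown (list date to sale date, or to an as-of date if unsold) is only one simple convention.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PriceEvent:
    on: date    # date this list price took effect
    price: int  # list price in whole dollars


@dataclass
class Listing:
    # Hypothetical field names for illustration, not a real schema.
    listing_id: str
    status: str  # e.g. "active", "pending", "sold"
    beds: float
    baths: float
    sqft: int
    listed_on: date
    price_history: list[PriceEvent] = field(default_factory=list)
    sold_on: Optional[date] = None
    sold_price: Optional[int] = None

    def current_price(self) -> Optional[int]:
        """Most recent list price, or None if no price was recorded."""
        if not self.price_history:
            return None
        return max(self.price_history, key=lambda e: e.on).price

    def days_on_market(self, as_of: date) -> int:
        """Days from listing to sale, or to `as_of` if still unsold."""
        end = self.sold_on or as_of
        return (end - self.listed_on).days

    def price_changes(self) -> list[int]:
        """Signed dollar change between consecutive list prices."""
        events = sorted(self.price_history, key=lambda e: e.on)
        return [b.price - a.price for a, b in zip(events, events[1:])]


home = Listing(
    listing_id="demo-1", status="active", beds=3, baths=2, sqft=1500,
    listed_on=date(2026, 1, 10),
    price_history=[
        PriceEvent(date(2026, 1, 10), 500_000),
        PriceEvent(date(2026, 2, 1), 485_000),
    ],
)
print(home.days_on_market(as_of=date(2026, 2, 9)), home.price_changes())
```

Days-on-market definitions differ between sources (for example, whether a relisting resets the counter), so a production model would need to follow the rule used by whichever feed supplies the data.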

The APIs have largely closed

For about fifteen years, Zillow offered a public consumer API — the ZWSID Web Services API, with endpoints like GetSearchResults and GetZestimate. Zillow retired that consumer API on September 30, 2021. Programmatic access now routes through Bridge Interactive, Zillow Group's RESO Web API program, which is aimed at MLS-affiliated brokerages and approved partners rather than the open public. Redfin offers no broad public listings API. The practical result: developers without partner access turn to MLS feeds or to collecting public pages directly.

Net effect: the easy, free, sanctioned path closed in 2021. What remains is partner/MLS access (gated) or public-page collection (subject to each site's Terms and anti-bot defenses).
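
For readers with partner access, the RESO Web API is built on OData, so a listing query is an HTTP GET with `$filter`, `$select`, and similar options. The sketch below only builds such a URL and sends nothing. The base URL is a placeholder, the function name is hypothetical, and although the field names are taken from the RESO Data Dictionary, the exact filter syntax for enumerated fields such as `StandardStatus` varies by server, so treat it as an assumption to check against the provider's metadata.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL; a real one comes from the MLS or vendor issuing credentials.
BASE_URL = "https://api.example-mls.test/reso/odata"


def build_property_query(city: str, max_price: int, top: int = 50) -> str:
    """Build an OData query URL for active listings in one city (no request is sent)."""
    safe_city = city.replace("'", "''")  # OData escapes a single quote by doubling it
    params = {
        "$filter": (
            f"StandardStatus eq 'Active' and City eq '{safe_city}' "
            f"and ListPrice le {max_price}"
        ),
        "$select": "ListingKey,ListPrice,BedroomsTotal,LivingArea,ModificationTimestamp",
        "$orderby": "ModificationTimestamp desc",
        "$top": str(top),
    }
    # quote (not quote_plus) encodes spaces as %20; "$" and "," stay readable.
    return f"{BASE_URL}/Property?{urlencode(params, quote_via=quote, safe='$,')}"


print(build_property_query("Austin", 500_000, top=10))
```

A real client would add the provider's authentication (commonly a bearer token) and page through results as the server directs. Those details are left out here because they differ by provider.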

The anti-bot reality

Zillow is widely reported to use PerimeterX, the bot-mitigation vendor that merged with HUMAN Security in 2022 and whose product is now often called HUMAN Bot Defender, alongside CAPTCHA and “verify you are a human” interstitials. These attributions circulate in scraping blogs and analytics write-ups rather than in official portal disclosures, so treat them as reported, not as confirmed by the sites themselves.

For how these systems classify automated traffic across IP, TLS, and behavioral layers, see how websites detect proxies.

MLS fragmentation: 580+ databases

US real estate isn't one database. It's more than 580 separate Multiple Listing Services, each with its own data, field conventions, and IDX/VOW display rules governing how listings can be shown and used. The RESO Web API, paired with the standard field names in the RESO Data Dictionary, is the standardization layer that, by 2026, normalizes much of this across MLSs. The fragmentation is exactly why regional access matters: completeness and display rules vary from market to market.
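
To make the normalization idea concrete, here is a minimal sketch that maps two MLSs' local field names onto shared names. The source identifiers (`mls_a`, `mls_b`) and their local field names are invented for illustration. The target names (`ListPrice`, `BedroomsTotal`, `LivingArea`, `OriginatingSystemName`) follow the RESO Data Dictionary. A real mapping would come from each MLS's own metadata.

```python
# Hypothetical per-MLS field maps: local field name -> RESO Data Dictionary name.
FIELD_MAPS = {
    "mls_a": {"LIST_PRICE": "ListPrice", "BEDS": "BedroomsTotal", "SQFT": "LivingArea"},
    "mls_b": {"AskingPrice": "ListPrice", "Bedrooms": "BedroomsTotal", "LivingSqFt": "LivingArea"},
}


def normalize(record: dict, source: str) -> tuple[dict, dict]:
    """Rename known fields to shared names; return (normalized, unmapped)."""
    mapping = FIELD_MAPS[source]
    normalized = {"OriginatingSystemName": source}
    unmapped = {}
    for name, value in record.items():
        if name in mapping:
            normalized[mapping[name]] = value
        else:
            # Keep unknown fields visible instead of silently dropping them.
            unmapped[name] = value
    return normalized, unmapped


row_a, extra_a = normalize({"LIST_PRICE": 450000, "BEDS": 3, "POOL_YN": "Y"}, "mls_a")
row_b, extra_b = normalize({"AskingPrice": 450000, "Bedrooms": 3}, "mls_b")
print(row_a, extra_a)
print(row_b, extra_b)
```

Returning the unmapped fields separately is a deliberate choice in this sketch: it shows where a given market's data goes beyond the shared vocabulary, which is where completeness differences between MLSs tend to appear.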

The legal reality

We did not find a headline real-estate scraping lawsuit on the scale of the airline or LinkedIn cases, though the absence of one in our research is not proof that none exists. The most-cited precedent for publicly accessible data is the Ninth Circuit's hiQ Labs v. LinkedIn: scraping public data is generally not a Computer Fraud and Abuse Act violation, but a site's Terms of Service can still create breach-of-contract and trespass-to-chattels liability.

The practical takeaway: for real-estate portals, the real exposure is contract/ToS, not hacking law. Read each portal's Terms, respect IDX/VOW rules where MLS data is involved, and prefer licensed/partner feeds for commercial products. For more, see “Is web scraping legal?”

Where mobile & geo IPs fit

Because listings and display rules are regional across 580+ MLSs, collecting an accurate national picture means requesting from the right markets. Geo-distributed mobile IPs let location-aware collection observe market-specific public displays and spread load so a single region isn't rate-blocked.

This is infrastructure for legitimate, geo-distributed research — it doesn't override a portal's Terms, IDX/VOW rules, or rate limits. Honor those, throttle your rate, and license data where a commercial product requires it.
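
As one way to picture "honor those, throttle your rate" in code, the sketch below combines a robots.txt check (using Python's standard `urllib.robotparser`) with a per-host minimum delay. The class name `PerHostThrottle`, the user-agent string, and the two-second interval are assumptions chosen for illustration. Note that robots.txt is a separate signal from a site's Terms of Service, and passing this check does not establish that collection is permitted.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class PerHostThrottle:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last: dict[str, float] = {}

    def wait(self, host: str) -> float:
        """Sleep if needed before the next request to `host`; return seconds slept."""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0 if last is None else max(0.0, self.min_interval_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay


robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "research-bot", "https://example.test/homes/123"))
print(allowed_by_robots(robots, "research-bot", "https://example.test/private/x"))
```

In use, a collector would call `throttle.wait(host)` before each request and skip any URL for which `allowed_by_robots` returns `False`. A fixed interval is the simplest policy. Backing off further on HTTP 429 or 503 responses would be a natural extension.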
