Scraping real estate listings with proxies

Confirm scope, split the crawl, pick a proxy type, then read the bandwidth meter — in that order.

Field notes | Setup checks | Updated 2026-06-12

A proxy is routing, not permission

If there is an MLS feed, IDX feed, partner API, or account export, use that first. If the source allows public pages, stay inside that scope. A proxy controls where the request exits, not whether you are authorized to make it.

The actual job is narrower than "download every listing." Keep a record of what changed: listing URL, price, status, address fields, agent or brokerage name, open house time, photo count, and the day you checked. The scope question must be settled before choosing a proxy type — the proxy type does not change the answer.

Split the crawl before arguing about scale

One job finds listing URLs for a city, ZIP code, or neighborhood. A second job checks the detail pages. A third step is a sanity check that catches wrong-region pages before they land in the export.

Listing pages repeat homes across search cards, map cards, saved filters, sponsored rows, and nearby-result blocks. If a 78704 test run reports 2,400 rows with only 318 unique URLs, that is not a sudden condo boom — it is duplicate cards getting counted as inventory. Dedupe on listing ID or URL before counting, before retrying, and before choosing how much bandwidth to buy.
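
A minimal sketch of that dedupe step, assuming each scraped card is a dict with hypothetical `listing_id` and `url` keys. Dropping the query string and fragment during URL normalization is also an assumption; keep them if the site carries listing identity there.

```python
from urllib.parse import urlsplit, urlunsplit


def listing_key(card: dict) -> str:
    """Prefer the listing ID; fall back to a normalized URL."""
    listing_id = card.get("listing_id")
    if listing_id:
        return f"id:{listing_id}"
    parts = urlsplit(card["url"])
    # Assumption: query string and fragment only carry tracking or view state.
    normalized = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), "", "")
    )
    return f"url:{normalized}"


def dedupe_cards(cards):
    """Keep the first card seen for each listing key, preserving order."""
    seen = set()
    unique = []
    for card in cards:
        key = listing_key(card)
        if key not in seen:
            seen.add(key)
            unique.append(card)
    return unique
```

Comparing `len(cards)` with `len(dedupe_cards(cards))` gives the row count and the inventory count side by side. A card with an ID and a card with only a URL are treated as different listings here, so fill in IDs where possible before relying on the number.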

Proxy type follows the crawl structure, not the reverse

Broad permitted discovery fits rotating residential traffic. A filter flow that depends on cookies needs a sticky session. A simple allowed feed or plain server-rendered HTML page can be tested with datacenter proxies if the target accepts them. Static ISP proxies fit allowlisted or account-bound workflows where the exit IP is known in advance.

On a Proxynade pool the routing lives in the username. The expanded form is the base username, a plan token (volume, premium, or datacenter), an optional country-<cc> token, and an optional lifetime-<minutes> token that sets the rotation window. For country-scoped residential traffic that gives rt97db6958d9-plan-volume-country-us-lifetime-30. Datacenter lines skip the lifetime token. The gateway is http://proxynade.net:2555, and the password stays the same whichever tokens you add.
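
The username format can be assembled in code rather than by hand. The sketch below follows the token order shown in the example username; the function names are hypothetical, and whether the gateway accepts tokens in any other order is not something this page states.

```python
from urllib.parse import quote

GATEWAY_HOST = "proxynade.net"
GATEWAY_PORT = 2555
PLANS = {"volume", "premium", "datacenter"}


def build_username(base, plan, country=None, lifetime_minutes=None):
    """Join base username, plan token, and optional country/lifetime tokens."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    if plan == "datacenter" and lifetime_minutes is not None:
        # Datacenter lines skip the lifetime token.
        raise ValueError("datacenter usernames take no lifetime token")
    parts = [base, "plan", plan]
    if country:
        parts += ["country", country.lower()]
    if lifetime_minutes is not None:
        parts += ["lifetime", str(int(lifetime_minutes))]
    return "-".join(parts)


def proxy_url(username, password):
    """Build an http:// proxy URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{GATEWAY_HOST}:{GATEWAY_PORT}"
```

For example, `build_username("rt97db6958d9", "volume", country="us", lifetime_minutes=30)` reproduces the username above. Most HTTP clients accept the resulting URL as a proxy setting, for instance the `proxies` argument in `requests`.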

Buying the expensive plan first just hides parser problems longer. Run a small sample before committing bandwidth.

The first test is boring on purpose

Fetch five pages: one search result page, one active listing, one pending or sold listing, one empty filter result, and one page with an odd layout. Austin 78704 condos is a fine sample set. Save the raw HTML before writing retry logic, because debugging against a page that changed yesterday wastes the session.
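
One way to keep those raw pages, sketched with the standard library only. The file naming scheme (page type, short URL hash, UTC timestamp) and the function name are assumptions, not a required layout.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def save_raw_html(html: str, url: str, page_type: str, out_dir: Path) -> Path:
    """Write the fetched HTML to disk untouched and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    fetched_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # A short hash keeps file names stable per URL without leaking long paths.
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    path = out_dir / f"{page_type}-{url_hash}-{fetched_at}.html"
    path.write_text(html, encoding="utf-8")
    return path
```

With the five sample pages on disk, the parser can be rerun against identical input while the selectors change.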

Scrapy handles server-rendered HTML and allowed feeds. Playwright is useful when the page renders late, but it pulls more asset weight through the proxy. Block images, fonts, map tiles, and extra scripts only after confirming the data fields still appear — there is usually one value hiding behind the script you assumed was safe to drop.
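
A hedged sketch of asset blocking with a field check afterwards. The blocking rule is plain Python; `install_blocking` wires it into a Playwright sync-API page through `page.route`. The blocked resource types and the required field names are assumptions to adjust per site, and scripts are left alone by default because a data value may depend on one.

```python
BLOCKED_RESOURCE_TYPES = frozenset({"image", "font", "media"})
REQUIRED_FIELDS = ("listing_id", "price", "status")  # hypothetical field names


def should_block(resource_type: str, blocked=BLOCKED_RESOURCE_TYPES) -> bool:
    """Return True when a request of this resource type should be aborted."""
    return resource_type in blocked


def install_blocking(page, blocked=BLOCKED_RESOURCE_TYPES) -> None:
    """Register a catch-all route handler on a Playwright page."""

    def handle(route):
        if should_block(route.request.resource_type, blocked):
            route.abort()
        else:
            route.continue_()

    page.route("**/*", handle)


def missing_fields(record: dict, required=REQUIRED_FIELDS) -> list:
    """List required fields that are absent or empty in a parsed record."""
    return [name for name in required if record.get(name) in (None, "")]
```

Parse the same listing once without blocking and once with it. If `missing_fields` returns anything new on the blocked run, one of the dropped assets was carrying data.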

The bandwidth meter counts more than the database does

Your stored row might be listing ID, URL, price, status, timestamp, market, and parser version. The proxy meter counted photos, map tiles, fonts, redirects, failed pages, scripts, retries, and partial responses to produce that row.

Ten thousand pages at 150 KB each is a 1.5 GB napkin estimate, not a billing number. A browser crawl that loads photos, map tiles, tracking scripts, and repeated error pages runs past that fast. If the Proxynade dashboard usage log jumps, check assets, redirects, retries, and error pages before buying more bandwidth. The dashboard shows host, outcome, latency, and byte totals per request, and usage logs export as CSV.
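
The napkin math and the log check can both be scripted. In the sketch below, the asset multiplier and retry rate are illustrative knobs, not measured values, and the CSV column names (`host`, `outcome`, `bytes`) are assumptions; match them to the header row of the actual export.

```python
import csv
import io
from collections import Counter


def estimate_gb(pages, avg_kb_per_page, asset_multiplier=1.0, retry_rate=0.0):
    """Rough transfer estimate in decimal GB (1 GB = 1,000,000 KB)."""
    total_kb = pages * avg_kb_per_page * asset_multiplier * (1 + retry_rate)
    return total_kb / 1_000_000


def bytes_by(csv_text, column):
    """Sum the assumed `bytes` column per value of `column`, largest first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[column]] += int(row["bytes"])
    return totals.most_common()
```

`estimate_gb(10_000, 150)` gives the 1.5 GB figure. Grouping an exported log with `bytes_by(text, "host")` or `bytes_by(text, "outcome")` shows whether the jump came from an asset host or from error responses.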

Export fields that survive a later clean

Store listing ID when it exists, saved URL, market, price, status, timestamp, parser version, and page type. Missing price, empty card, wrong region, and sudden photo-count changes belong in an error bucket rather than being quietly treated as "no listing found." The extra fields cost almost nothing at write time and save a full re-crawl later.
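
A sketch of such a row with explicit error flags. The field names, the error labels, the definition of an empty card, and the photo-count threshold of 10 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportRow:
    listing_id: Optional[str]
    url: str
    market: str            # market the page itself reported
    price: Optional[int]
    status: Optional[str]
    checked_at: str        # ISO date or timestamp of the check
    parser_version: str
    page_type: str
    photo_count: Optional[int] = None


def flag_errors(row, expected_market, previous_photo_count=None, photo_jump=10):
    """Return error labels for a row instead of silently dropping it."""
    errors = []
    if row.price is None:
        errors.append("missing_price")
    if row.listing_id is None and row.price is None and row.status is None:
        errors.append("empty_card")
    if row.market != expected_market:
        errors.append("wrong_region")
    if (
        previous_photo_count is not None
        and row.photo_count is not None
        and abs(row.photo_count - previous_photo_count) >= photo_jump
    ):
        errors.append("photo_count_jump")
    return errors
```

Rows with a non-empty error list go to the error bucket along with their saved URL and parser version, so they can be re-parsed later without another fetch.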

Real estate proxy FAQ

Which proxy type fits real estate listing scrapes? Rotating residential for broad permitted discovery, sticky residential for cookie-dependent filter flows, datacenter for simple allowed feeds, and static ISP for account-bound workflows.

Why do my listing counts look wrong? Listing pages repeat the same home in search cards, map cards, saved filters, sponsored rows, and nearby-result blocks. Dedupe on listing ID or URL before counting.

Why is my bandwidth bill higher than the page size suggests? The proxy meter counts photos, map tiles, fonts, redirects, failed pages, scripts, retries, and partial responses, not just the HTML behind the rows you stored.

How do I set a country exit for a specific market? Add country-<cc> to the Proxynade username, for example rt97db6958d9-plan-volume-country-us-lifetime-30. The pool then routes exits from that country, and the lifetime-30 token sets a 30-minute rotation window.

When does a sticky session make sense here? Any filter or search flow that sets a session cookie to hold market scope needs a sticky session. Add the lifetime-<minutes> token to the username to hold the same exit for that window.

Production checks

  • Confirm scope before choosing a proxy type.
  • Dedupe on listing ID or URL before counting or retrying.
  • Save raw HTML before writing retry logic.
  • Block assets only after confirming data fields still appear.
  • Store parser version with every export row.
  • Log proxy label alongside status code and byte count.
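
The last check can be as small as one structured log line per request. This is a sketch: the logger name, the key names, and the idea of a short proxy label (for example "volume-us-30", never the password) are assumptions.

```python
import json
import logging

logger = logging.getLogger("listing_crawl")


def log_fetch(proxy_label, url, status_code, byte_count):
    """Emit one JSON line per request and return it for reuse."""
    line = json.dumps(
        {
            "proxy": proxy_label,
            "url": url,
            "status": status_code,
            "bytes": byte_count,
        },
        sort_keys=True,
    )
    logger.info(line)
    return line
```

Lines in this shape can be compared against the dashboard's per-request byte totals when the two numbers disagree.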