Public Marketplace Data Collection: Proxy Setup and Byte Control

Scope before you connect

Public listing pages, category pages, seller storefront pages, and regional price checks are one job. Login pages, checkout flows, wishlists, and personalized recommendations are a separate job that falls outside the scope of this setup. Keep them in separate runs with separate routing labels, and stop if the target or provider policy marks a category off-limits.

Define the row schema before the first request: listing ID, seller ID, marketplace, market, price, stock state, timestamp, parser version, route label. Build the dedup key from the identifying fields in that schema, and run deduplication before the retry queue executes. Running it afterward turns a key bug into a billing problem, because every duplicate is retried through the proxy.
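A minimal sketch of that schema and a dedup pass, assuming the identifying fields are marketplace, market, and listing ID. The source does not say which fields form the key, so `dedup_key` is an illustrative choice, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListingRow:
    listing_id: str
    seller_id: str
    marketplace: str
    market: str
    price: str
    stock_state: str
    timestamp: str
    parser_version: str
    route_label: str

    def dedup_key(self) -> tuple:
        # Assumption: a listing is unique per marketplace and market.
        return (self.marketplace, self.market, self.listing_id)


def drop_duplicates(rows):
    """Keep the first row seen for each dedup key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        key = row.dedup_key()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Only the rows returned by `drop_duplicates` should reach the retry queue. If the same listing needs to be tracked over time, the timestamp belongs in the stored row, not in the key.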

Wire the connection string to match the storefront

The pool gateway is http://proxynade.net:2555 with username and password auth. The username carries routing options inline: base user, a plan token (volume, premium, or datacenter), an optional country code, and an optional rotation lifetime in minutes. For marketplace work, country routing should match the storefront being measured — price, availability, shipping copy, and seller order can all differ by region.

# Volume residential, US storefront, 10-minute sticky session
import requests

PROXY_HOST = "proxynade.net"
PROXY_PORT = 2555
PROXY_USER = "rt97db6958d9-plan-volume-country-us-lifetime-10"
PROXY_PASS = "your_password"

# Both schemes go through the same HTTP gateway.
proxy_url = f"http://{PROXY_USER}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
proxies = {"http": proxy_url, "https": proxy_url}

url = "https://example.com/listing/123"  # placeholder: a public listing URL
response = requests.get(url, proxies=proxies, timeout=30)

The proxies dict above follows the format described in Python Requests: proxies. A generic residential rotation without a country token is fine for a broad public sweep. Add country-<cc> when the job needs a specific regional storefront. Drop lifetime-<minutes> when the job is per-listing stateless; keep it when a multi-step flow needs to stay on one exit. Datacenter lines skip the lifetime token entirely.
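The token rules can be collected into one helper so every job builds its username the same way. This is a sketch: the function name and argument names are hypothetical, and the token order is inferred from the single example username shown earlier, not from provider documentation.

```python
from typing import Optional

PLANS = {"volume", "premium", "datacenter"}


def build_proxy_username(
    base_user: str,
    plan: str,
    country: Optional[str] = None,
    lifetime_minutes: Optional[int] = None,
) -> str:
    """Assemble base user, plan, optional country, optional lifetime."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    parts = [base_user, "plan", plan]
    if country:
        parts += ["country", country.lower()]
    # Datacenter lines skip the lifetime token entirely.
    if lifetime_minutes is not None and plan != "datacenter":
        parts += ["lifetime", str(lifetime_minutes)]
    return "-".join(parts)
```

For example, `build_proxy_username("baseuser", "volume", country="us")` produces a stateless US line, and adding `lifetime_minutes=10` turns it into a sticky session.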

Read status codes before adding routes

Three codes do most of the diagnostic work on a marketplace run.

Code	Source	Next check
`407`	Proxy — authentication failed (MDN: 407 Proxy Authentication Required)	Verify username format, password, and account balance. Fix the line before retrying.
`403`	Target — request refused	Check headers, pacing, and session age. The proxy connected; the target blocked.
`429`	Target — rate limit	Pacing or retry shape is wrong. Slow the queue; don't add more exits as a first move.

A run that returns mostly 403 and 429 does not need more proxy volume. It needs different request pacing or headers. Adding routes to a misconfigured run scales the cost, not the yield.
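The table can be turned into a small triage step that runs before any retry is scheduled. The sketch below is one possible shape: the action labels are hypothetical, and the capped exponential backoff is an assumed way to "slow the queue", not something the provider prescribes.

```python
def triage_status(status_code: int) -> str:
    """Map a response code to the next check from the table above."""
    if status_code == 407:
        # Proxy-side: verify username format, password, and balance.
        return "fix_proxy_credentials"
    if status_code == 403:
        # Target-side: check headers, pacing, and session age.
        return "review_headers_and_session"
    if status_code == 429:
        # Target-side rate limit: slow down before adding exits.
        return "slow_queue"
    if 200 <= status_code < 300:
        return "ok"
    return "inspect"


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 120.0) -> float:
    """Capped exponential delay for the given zero-based retry attempt."""
    return min(cap, base * (2 ** attempt))
```

A 407 should stop the run rather than retry, since the same credentials will fail again on every request.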

The proxy meter counts more than your parser does

The dashboard network logs show host, outcome, latency, and byte totals per request. Those numbers will always exceed your scraper's local byte counter. The proxy meter sees redirect chains, blocked HTML pages, discarded response bodies, image carousels, and retries. Your parser counts only what it kept. Both numbers are accurate; they measure different things.

One run cost more than expected because UTM parameters were not stripped before the dedup key was built. The scraper treated the same public listing as a new item on every tracking-string variant, and the retry queue fired on each one. The proxy logs showed what looked like a pacing problem. It was a bad key. The canonicalizer ran after the retry queue — the wrong order, obvious in retrospect.
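A sketch of the corrected order, using only the standard library: canonicalize first, then decide what enters the queue. Stripping `utm_` parameters comes from the incident above; lowercasing the host, sorting the remaining parameters, and dropping the fragment are additional assumptions that may not suit every marketplace.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)


def canonicalize(url: str) -> str:
    """Return the URL without tracking parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # assumption: parameter order does not change the page
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def enqueue_unique(urls, seen):
    """Canonicalize each URL and queue it only if the key is new."""
    queue = []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            queue.append(key)
    return queue
```

With this in place, two tracking variants of the same listing collapse to one queue entry, so a retry can fire at most once per listing instead of once per variant.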

Compare provider bytes against accepted unique listings, not against raw request count. A run that hits ten thousand requests can collect almost nothing new if the dedup key is wrong.
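The comparison is a single division, shown here with made-up numbers purely for illustration. The function name is hypothetical; the provider byte total is assumed to come from the dashboard, and the unique count from your own dedup step.

```python
def bytes_per_unique_listing(provider_bytes: int, accepted_unique: int) -> float:
    """Provider-metered bytes divided by accepted unique listings."""
    if accepted_unique == 0:
        # Nothing new was collected, so every byte was waste.
        return float("inf")
    return provider_bytes / accepted_unique


# Illustrative figures only: 10,000 requests, 500 MB metered, 400 unique rows.
per_request = 500_000_000 / 10_000
per_listing = bytes_per_unique_listing(500_000_000, 400)
```

In this invented case the per-request average is 50,000 bytes, which looks unremarkable, while the per-listing figure is 1,250,000 bytes. Only the second number exposes the duplicate problem.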

Block waste assets only after a clean sample

Blocking tracking pixels, analytics scripts, social widgets, and non-product media reduces bytes. Do it after confirming that listing fields still parse correctly without those resources — some marketplaces render price or stock state via JavaScript that depends on assets you might block prematurely.

The right order: get a clean sample of the fields you need, then identify which asset categories contribute bytes without contributing data, then block them and recheck the sample.
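The recheck step can be automated as a comparison between the clean sample and a sample collected with blocking enabled. This is a sketch under assumptions: both samples are dicts keyed by listing ID, each value is a dict of parsed fields, and the function name is hypothetical.

```python
def fields_lost_after_blocking(baseline, blocked, required_fields):
    """Return {listing_id: [fields]} that parsed before blocking but not after."""
    lost = {}
    for listing_id, base_row in baseline.items():
        blocked_row = blocked.get(listing_id, {})
        missing = [
            field
            for field in required_fields
            if base_row.get(field) not in (None, "")
            and blocked_row.get(field) in (None, "")
        ]
        if missing:
            lost[listing_id] = missing
    return lost
```

An empty result suggests the blocked asset categories were not feeding the parser. Any non-empty result means a block rule should be rolled back before the full run.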

What to log beside each request

Log the route label with every request. When the dashboard shows a spike in bytes or latency on a specific host, the route label in your own logs lets you trace it to the exact job, parser version, and country token that caused it. Without the label, the dashboard number is hard to act on.

The minimum useful log row: listing ID, status code, route label, bytes transferred, latency in milliseconds, timestamp, parser version. The dashboard can export usage logs as CSV if a run produces data you need to reconcile offline.
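A minimal sketch of that log row written as CSV, so local logs can be lined up against a dashboard export. The column names are assumptions chosen to mirror the list above; the dashboard's own export columns are not documented here and may differ.

```python
import csv

LOG_FIELDS = [
    "listing_id",
    "status_code",
    "route_label",
    "bytes_transferred",
    "latency_ms",
    "timestamp",
    "parser_version",
]


def write_log_rows(rows, stream):
    """Write request log rows (dicts keyed by LOG_FIELDS) as CSV with a header."""
    writer = csv.DictWriter(stream, fieldnames=LOG_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

Pass any writable text stream, for example a file opened with `newline=""`. Note that `bytes_transferred` here is whatever the scraper measured locally, so it will read lower than the proxy meter for the reasons described earlier.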

Marketplace proxy FAQ

Why does my local byte counter read lower than the proxy dashboard? The proxy meter counts everything transferred: redirect chains, blocked HTML pages, discarded response bodies, image carousels, and retries. Your scraper counts only what the parser kept. Both numbers are real; they measure different things.

When should I add a country code to the proxy username? When the marketplace shows different prices, availability, or seller ordering by region. A generic residential rotation is fine for a broad sweep; add country-<cc> to the username when the data needs to reflect a specific storefront.

What does a 407 mean on a marketplace run? 407 is a proxy authentication error, not a target block. Check the username format, password, and account balance before blaming the marketplace.

What is the right metric for proxy efficiency on marketplace collection? Provider bytes divided by accepted unique listings. A run with many duplicates can show high request volume while collecting almost nothing new.

Pre-run checklist

Dedup key defined; canonicalizer runs before retry queue.
Country token matches the target storefront.
Route label logged with every request.
First clean sample confirmed before blocking assets.
Efficiency measured in bytes per unique listing, not request count.
