Google SERP Collection: Proxy Controls, Retries, and Byte Costs

The row count is not the request count

The job was 220 keywords across four markets, desktop only, with one parser version and a daily capture window. The first report logged 880 requests sent (220 keywords × 4 markets). That number was not the one that mattered.

The useful number was 641 accepted SERP rows. The rest was a mix of empty result containers, soft challenges, parser misses, and retries that landed after the window closed. The app counter made the run look cleaner than it was. The provider meter counted every byte regardless of whether the HTML later passed validation.

Log the right fields from the start

The row schema that proved useful: keyword, market, language, device, proxy_line, status, parser_version, result_count, accepted, reject_reason, billed_bytes.
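As a rough illustration, the schema above could be pinned down as a typed record so that every row carries the same fields. The field names come from the list above; the types, the validate helper, and the example values are assumptions made for this sketch, not part of the original job.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SerpRow:
    # Field names follow the schema listed above; the types are assumptions.
    keyword: str
    market: str                   # e.g. "us", "gb", "de", "ca"
    language: str
    device: str                   # "desktop" for this job
    proxy_line: str               # which plan or route carried the request
    status: int                   # HTTP status code
    parser_version: str           # e.g. "serp_v7"
    result_count: int
    accepted: bool
    reject_reason: Optional[str]  # None when the row was accepted
    billed_bytes: int


def validate(row: SerpRow) -> SerpRow:
    """Hypothetical consistency check: a rejected row must say why."""
    if not row.accepted and not row.reject_reason:
        raise ValueError("rejected row is missing reject_reason")
    if row.accepted and row.reject_reason:
        raise ValueError("accepted row should not carry reject_reason")
    return row


# Illustrative values only; not taken from a real run.
example = validate(SerpRow(
    keyword="running shoes", market="us", language="en", device="desktop",
    proxy_line="volume", status=200, parser_version="serp_v7",
    result_count=0, accepted=False, reject_reason="empty_container",
    billed_bytes=48213,
))
```

Calling asdict(example) yields a plain dict in schema order, which is convenient for writing rows to CSV or a spreadsheet tab.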

The parser_version column was added after one Google layout change caused the parser to return zero organic results for a keyword group that clearly had results in the screenshot. That miss burned an afternoon. Tagging each row with the parser version makes it trivial to replay the same HTML through a patched parser without re-fetching.

Country routing controls the market, not the full environment

A country token in the proxy username (country-us, country-gb, country-de, country-ca) routes exits through that geography. It does not freeze language, cookies, device signals, location hints, or Google's own A/B testing. Treat one SERP capture as a sample. Movement across repeated captures is more informative than one clean page.

The expanded username on the Proxynade gateway follows this shape: base user + plan token (volume, premium, or datacenter) + optional country-<cc> + optional lifetime-<minutes> rotation window. Example: rt97db6958d9-plan-volume-country-us-lifetime-10. Datacenter lines skip the lifetime token. Credentials are passed to the gateway via the standard Proxy-Authorization header.
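The username shape described above can be assembled with a small helper. This is a minimal sketch: the function names are hypothetical, the token order simply mirrors the example string, and the header helper assumes the gateway accepts the HTTP Basic scheme, which the text does not state explicitly.

```python
import base64
from typing import Optional

VALID_PLANS = {"volume", "premium", "datacenter"}


def build_proxy_username(base_user: str, plan: str,
                         country: Optional[str] = None,
                         lifetime_minutes: Optional[int] = None) -> str:
    """Join base user, plan token, and optional tokens with hyphens."""
    if plan not in VALID_PLANS:
        raise ValueError(f"unknown plan: {plan}")
    parts = [base_user, "plan", plan]
    if country:
        parts += ["country", country.lower()]
    # Datacenter lines skip the lifetime token, per the description above.
    if lifetime_minutes is not None and plan != "datacenter":
        parts += ["lifetime", str(lifetime_minutes)]
    return "-".join(parts)


def proxy_authorization_header(username: str, password: str) -> str:
    """Value for Proxy-Authorization, ASSUMING the Basic scheme is used."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


username = build_proxy_username("rt97db6958d9", "volume", "us", 10)
```

With the inputs shown, the helper reproduces the example username from the text. Most HTTP clients build the header themselves when given a proxy URL with credentials, so the second helper is mainly useful for debugging what actually goes over the wire.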

Plan choice follows the rejection log

Volume Residential at $0.89/GB is the usual starting point for multi-market SERP work. Premium Residential at $5.00/GB makes sense only if the logs show fewer challenges or more accepted rows than a volume run on the same keyword set. Paying more for the same rejected HTML is not an improvement. Datacenter is worth a test on simple monitoring jobs where challenge rates in the logs are low.

Run a small sample on each plan tier against your actual keyword set and compare accepted rows per GB billed. That number is the real cost per result, and it changes by target, market, and time of day.
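The comparison can be reduced to two small functions. This is a sketch under stated assumptions: the per-GB prices are the ones quoted above, the meter is assumed to bill in decimal gigabytes (10^9 bytes), and the sample byte and row counts in the usage lines are made up for illustration.

```python
from typing import Optional

# Prices quoted in the text above.
PRICE_PER_GB = {"volume": 0.89, "premium": 5.00}

# ASSUMPTION: the provider bills decimal gigabytes; check the actual meter.
BYTES_PER_GB = 1_000_000_000


def accepted_rows_per_gb(billed_bytes: int, accepted_rows: int) -> float:
    """Accepted rows obtained per billed GB."""
    return accepted_rows / (billed_bytes / BYTES_PER_GB)


def cost_per_accepted_row(billed_bytes: int, accepted_rows: int,
                          price_per_gb: float) -> Optional[float]:
    """Dollar cost of one accepted row; None if nothing was accepted."""
    if accepted_rows == 0:
        return None
    return billed_bytes / BYTES_PER_GB * price_per_gb / accepted_rows


# Hypothetical sample-run numbers, for illustration only.
volume_cost = cost_per_accepted_row(500_000_000, 50, PRICE_PER_GB["volume"])
premium_cost = cost_per_accepted_row(400_000_000, 55, PRICE_PER_GB["premium"])
```

Comparing volume_cost and premium_cost on the same keyword sample shows whether the higher tier buys enough extra accepted rows to justify its price.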

Challenge response: rate first, workers second

When CAPTCHAs, 429s, tiny bodies, or empty containers appear, lower the request rate before adding workers. More concurrency into a throttle just increases rejected bytes and billed cost with no improvement to accepted rows.
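One way to encode "rate first" is a pacing delay that grows when a challenge is seen and relaxes slowly when requests succeed, while the worker count stays fixed. The class below is a minimal sketch; its name and all of its default values are assumptions, not settings from the original job.

```python
class Pacer:
    """Tracks the delay between requests for a single worker."""

    def __init__(self, base_delay: float = 2.0, max_delay: float = 60.0,
                 backoff: float = 2.0, recover: float = 0.9):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff = backoff    # multiplier applied after a challenge
        self.recover = recover    # multiplier applied after a clean response
        self.delay = base_delay

    def record(self, challenged: bool) -> float:
        """Update the delay from the last outcome and return the new value."""
        if challenged:
            # Slow down sharply, but never beyond max_delay.
            self.delay = min(self.delay * self.backoff, self.max_delay)
        else:
            # Speed back up gradually, but never below base_delay.
            self.delay = max(self.delay * self.recover, self.base_delay)
        return self.delay


pacer = Pacer()
next_delay = pacer.record(challenged=True)  # caller sleeps this long before the next request
```

The caller decides what counts as a challenge (CAPTCHA page, 429, tiny body, empty container) and sleeps for the returned delay before sending the next request.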

Put all rejection types (challenge pages, 429s, empty containers, parser failures) into the same rejection tab and look for the pattern before changing anything. If every proxy plan gets the same tiny body, the parser or the HTTP client is probably wrong. If only one plan fails, then the proxy choice is the variable to test.
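The pattern check described above can be sketched as a tally keyed by plan and rejection reason. The row keys reuse the logged field names; the function names, the reason labels, and the three-way verdict are assumptions for illustration and only a first-pass heuristic.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rejection_table(rows: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count rejected rows per (proxy_line, reject_reason) pair."""
    table: Counter = Counter()
    for row in rows:
        if not row["accepted"]:
            table[(row["proxy_line"], row["reject_reason"])] += 1
    return dict(table)


def likely_variable(rows: Iterable[dict], reason: str) -> str:
    """Rough guess at what to test next for one rejection reason."""
    rows = list(rows)
    all_plans = {r["proxy_line"] for r in rows}
    failing = {r["proxy_line"] for r in rows
               if not r["accepted"] and r["reject_reason"] == reason}
    if len(all_plans) > 1 and failing == all_plans:
        return "parser_or_client"   # every plan fails the same way
    if len(all_plans) > 1 and len(failing) == 1:
        return "proxy_plan"         # only one plan shows this failure
    return "unclear"


# Hypothetical rows for illustration.
sample_rows = [
    {"proxy_line": "volume", "accepted": False, "reject_reason": "tiny_body"},
    {"proxy_line": "premium", "accepted": False, "reject_reason": "tiny_body"},
    {"proxy_line": "volume", "accepted": True, "reject_reason": None},
    {"proxy_line": "premium", "accepted": False, "reject_reason": "challenge"},
]
verdict = likely_variable(sample_rows, "tiny_body")
```

The verdict is only a pointer toward the next test, not a diagnosis; a small sample can make one plan look worse by chance.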

Budget from provider-metered GB per accepted row

A retry that ends in a challenge page still crosses the gateway. A rejected HTML body still costs traffic. Consent pages, redirect chains, and font files can accumulate before the parser discards the row. The app-level request counter misses all of it.

The Proxynade dashboard shows host, outcome, latency, and byte totals per request in the network log. Usage logs export as CSV. After each run, pull the CSV and divide total billed bytes by accepted rows. That ratio is what the next capacity estimate should use, not the nominal bytes-per-SERP figure from the HTML alone.
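The reconciliation step can be scripted against the exported file. The sketch below assumes the CSV has a header row and a byte-count column named bytes; the real export's column names are not given here, so both the column name and the sample data are hypothetical and should be adjusted to match the actual file.

```python
import csv
import io
from typing import TextIO


def billed_bytes_per_accepted_row(csv_file: TextIO, accepted_rows: int,
                                  bytes_column: str = "bytes") -> float:
    """Sum the byte column over every logged request, then divide by accepted rows."""
    if accepted_rows <= 0:
        raise ValueError("accepted_rows must be positive")
    total = sum(int(record[bytes_column]) for record in csv.DictReader(csv_file))
    return total / accepted_rows


# Hypothetical export: three metered requests, only two of which became rows.
sample_csv = io.StringIO(
    "host,outcome,latency_ms,bytes\n"
    "www.google.com,ok,812,140000\n"
    "www.google.com,ok,790,130000\n"
    "www.google.com,challenge,455,30000\n"
)
ratio = billed_bytes_per_accepted_row(sample_csv, accepted_rows=2)
```

Note that the challenge request contributes bytes to the numerator but nothing to the denominator, which is exactly the gap between the provider meter and the app-level counter.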

The job comment that prevents the next bad report

The note beside the job now reads: keywords=220; markets=4; accepted=641; parser=serp_v7; retries=2; stop_on_challenge=true. It is not tidy. It is enough to tell whether the next bad report came from Google, the route, the parser, or someone raising concurrency because the run looked slow.
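If the note is kept in this key=value; form, it can be read back mechanically when comparing runs. The parser below is a small sketch; the function name is hypothetical and all values are left as strings rather than guessing their types.

```python
from typing import Dict


def parse_job_note(note: str) -> Dict[str, str]:
    """Split a 'key=value; key=value' note into a dict of strings."""
    fields: Dict[str, str] = {}
    for part in note.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


note = ("keywords=220; markets=4; accepted=641; parser=serp_v7; "
        "retries=2; stop_on_challenge=true")
job = parse_job_note(note)
expected_rows = int(job["keywords"]) * int(job["markets"])
```

Comparing int(job["accepted"]) against expected_rows gives the acceptance gap for the run without opening the full log.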

SERP collection checks

Log parser_version with every row so failed parses are replayable.
Budget from billed GB per accepted SERP row, not from request count.
Lower rate before adding concurrency when challenges appear.
Use country-<cc> in the username for market routing; verify with the CSV log.
Export usage CSV after each run and reconcile with the app-level counter.
Test plan tiers on a small keyword sample before scaling spend.
