Bare-metal notes

Bare metal for scraping: when dedicated hardware changes the numbers

A dedicated box will not save a bad route. It can stop the host from confusing the test.

Playwright density | Proxy bandwidth | Updated 2026-06-12

Bare metal just means no hypervisor neighbor

Bare metal scraping means the scraper runs on a physical server. CPU, RAM, disk, and the NIC are not time-shared with random neighbors through a hypervisor. That matters less than sellers imply — until it matters a lot.

Small HTTP jobs can sit on a cheap VM indefinitely. A cron job hitting a public JSON endpoint does not need a dedicated server. The problem starts with browser work: Playwright, Puppeteer, screenshots, page scripts, local profile writes, redirects, cookies, and target pages that keep loading assets after the parser already has what it needs.

That kind of scraper makes every weak layer look like a proxy problem. A page times out, so the proxy gets blamed. Chrome exits, so the residential pool gets blamed. The app saves fewer rows, so someone buys a more expensive route. Sometimes the route is bad. Sometimes the host was wobbling under browser load.

Test one route first. If that result is bad, scaling up only hides which part failed and makes the provider meter harder to explain.

What a run looks like when the host stops being the variable

The following is a scratch note from a retail search crawl. It is not a lab benchmark. It is the kind of note that stops a later argument about whether the proxy pool changed.

retail search crawl, late night
same code branch, same residential plan
target names removed

vps run
  18 workers felt okay
  24 workers: launch times started walking upward
  panel CPU: 58-71 percent, looked harmless
  chrome exits: enough to poison the queue
  proxy meter kept climbing while saved rows barely moved

bare box run
  28 workers stayed boring
  rss cap needed tuning, not surprising
  queue age moved with target weight
  fewer "proxy timeout" rows after browser recycling was fixed

annoying part:
  app said ~3 GB
  provider meter said closer to 8 GB
  app only counted accepted bodies, not browser junk

The exact worker count is not the point. Different sites, RAM, Chrome flags, screenshots, and page weight will move that number. The useful part is that the dedicated server made failure easier to read. Once the host stopped changing shape underneath the scraper, the remaining work was ordinary tuning: cap RSS, recycle browsers earlier, split render queues from parse queues, stop retrying pages that were already challenge pages.
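The tuning steps in that paragraph can be sketched as two small policy functions. This is a minimal sketch, not the crawl's actual code: the names BrowserStats, should_recycle, and should_retry, the outcome labels, and both limits are hypothetical and would need tuning per target, RAM, and Chrome flags.

```python
from dataclasses import dataclass


@dataclass
class BrowserStats:
    pages_served: int  # pages handled since this browser was launched
    rss_bytes: int     # resident set size of the browser process tree


# Hypothetical limits, not measured values.
MAX_PAGES_PER_BROWSER = 200
RSS_CAP_BYTES = 1_500 * 1024 * 1024  # 1.5 GiB


def should_recycle(stats: BrowserStats,
                   max_pages: int = MAX_PAGES_PER_BROWSER,
                   rss_cap: int = RSS_CAP_BYTES) -> bool:
    """Recycle while the browser is still healthy, not after it exits."""
    return stats.pages_served >= max_pages or stats.rss_bytes >= rss_cap


def should_retry(outcome: str, attempts: int, max_attempts: int = 3) -> bool:
    """Retry transient failures only; a challenge page is an answer, not an error."""
    if outcome == "challenge":
        return False  # retrying only moves more metered bytes
    if outcome in ("timeout", "browser_exit"):
        return attempts < max_attempts
    return False
```

A worker would call should_recycle between pages and should_retry before requeueing a failed page. Keeping both decisions in plain functions makes them easy to log next to proxy spend.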

Virtualization hides signals a browser scraper needs

Virtualization is not the villain. It is another layer that hides useful signals. CPU steal is the obvious one. Disk wait is another. Virtual networking can add small latency spikes that look exactly like a weak route. Memory pressure is worse: Chromium can die before the guest OS produces a clean out-of-memory story.
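CPU steal and disk wait can be read from inside a Linux guest. The sketch below assumes the standard /proc/stat layout, where the aggregate cpu line lists user, nice, system, idle, iowait, irq, softirq, and steal in that order. The function names are illustrative, and the result is only as honest as what the hypervisor reports to the guest.

```python
def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line from /proc/stat into named counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    names = ["user", "nice", "system", "idle",
             "iowait", "irq", "softirq", "steal"]
    values = [int(v) for v in fields[1:1 + len(names)]]
    return dict(zip(names, values))


def steal_and_iowait_pct(before: dict, after: dict) -> tuple:
    """Return (steal %, iowait %) of CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return (0.0, 0.0)
    return (100.0 * deltas["steal"] / total,
            100.0 * deltas["iowait"] / total)


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Sampling read_cpu_line a few seconds apart during a run, then logging both percentages beside browser exits, shows whether launch times climb together with steal or independently of it.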

A Playwright scraping server is not just fetching HTML. It is running a browser runtime, writing profiles, executing JavaScript, carrying cookies, pulling fonts and images unless blocked, and leaving cleanup work behind. A few seconds of host jitter can turn into retries, and retries turn into real proxy spend.

Dedicated server scraping removes the hypervisor as a source of that hidden jitter. It does not guarantee better completion rates against a given target. It gives you a machine whose behavior is stable enough to compare against proxy logs.

The bandwidth gap between dashboards is real

Scraper dashboards often count useful output: saved rows, accepted JSON, stored HTML. Proxy providers meter what crossed the proxy. Those are not the same number.

A blocked 200 page still moved bytes. A redirect chain moved bytes. Fonts, images, scripts, websocket reconnects, TLS overhead, retries, and discarded challenge pages moved bytes. If the parser throws away the page, the app may record nothing. The provider meter does not care — it saw the traffic.
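One way to narrow the gap is to have the app tally every response it sees, not only the ones the parser keeps. The following is a minimal sketch with a hypothetical ByteLedger class and made-up outcome labels. It still undercounts what the provider meters, because request bytes and TLS overhead never reach this layer.

```python
from collections import Counter


class ByteLedger:
    """Tally response bytes by outcome so discarded traffic stays visible."""

    def __init__(self) -> None:
        self.bytes_by_outcome = Counter()

    def record(self, outcome: str, body_bytes: int, header_bytes: int = 0) -> None:
        # Count everything that came back, whether or not the parser kept it.
        self.bytes_by_outcome[outcome] += body_bytes + header_bytes

    def accepted(self) -> int:
        return self.bytes_by_outcome["accepted"]

    def total(self) -> int:
        return sum(self.bytes_by_outcome.values())

    def discarded_share(self) -> float:
        """Fraction of observed bytes that produced no saved output."""
        total = self.total()
        return 0.0 if total == 0 else 1.0 - self.accepted() / total
```

Calling record from the browser's response handler with labels such as "accepted", "challenge", "redirect", and "asset" gives a per-outcome breakdown that can be set against the provider meter.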

That gap gets expensive on residential traffic. Volume Residential at $0.89/GB can look cheap until browser waste eats the balance. Premium Residential at $5.00/GB can be worth it on a harder target, but only if the logs show better accepted output per metered GB. If the app says 3 GB and the provider says 8 GB, trust the bill and find the discarded traffic.
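The arithmetic behind that comparison is short. The sketch below uses the two prices and the 3 GB versus 8 GB figures quoted above. The premium run (4 GB metered for the same 3 GB accepted) is a hypothetical input for illustration, not a measurement.

```python
def cost_per_accepted_gb(price_per_gb: float,
                         metered_gb: float,
                         accepted_gb: float) -> float:
    """Dollars spent per GB of output the app actually kept."""
    if accepted_gb <= 0:
        raise ValueError("accepted_gb must be positive")
    return price_per_gb * metered_gb / accepted_gb


VOLUME_PRICE = 0.89   # $/GB, Volume Residential
PREMIUM_PRICE = 5.00  # $/GB, Premium Residential

# App counted 3 GB accepted, provider metered 8 GB.
volume_cost = cost_per_accepted_gb(VOLUME_PRICE, metered_gb=8.0, accepted_gb=3.0)

# Hypothetical premium run: less waste, same accepted output.
premium_cost = cost_per_accepted_gb(PREMIUM_PRICE, metered_gb=4.0, accepted_gb=3.0)

print(f"volume:  ${volume_cost:.2f} per accepted GB")   # $2.37
print(f"premium: ${premium_cost:.2f} per accepted GB")  # $6.67
```

On these inputs the cheaper plan still wins despite discarding more than half of its traffic. At these prices the premium plan only comes out ahead when the volume plan's accepted share of metered bytes falls below roughly 0.178 (0.89 / 5.00) times the premium plan's share.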

Host problems and proxy problems need separate tests

Separate the host from the provider before changing plans. If a proxy route is noisy enough that retries dominate the run, moving the worker from a VPS to a dedicated server will not make that route clean. If a datacenter range is blocked by the target, the CPU model is irrelevant.

The reverse is also true. A good residential pool looks terrible when the browser host is unhealthy. Crashed contexts, duplicate queue pulls, bad restart rules, and unbounded retries burn proxy bandwidth without producing more rows. Prefer boring comparisons: same proxies on two hosts, same host with two worker caps, same browser profile before and after a recycling change.

Signs the host is worth removing from the equation

Consider bare metal when completion rate stops improving as workers are added, browser exits rise during normal load, queue age drifts upward over long runs, or proxy spend grows faster than saved output. None of those signs prove the VPS is guilty. They make shared infrastructure worth removing from the test.
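The first of those signs, completion rate flattening as workers are added, can be checked from a handful of runs. This is a rough sketch: the function names, the threshold, and the sample figures in the usage comment are all hypothetical.

```python
def marginal_gains(samples):
    """samples: list of (workers, completed_per_hour), sorted by workers.

    Returns (workers, extra completions per added worker) for each step.
    """
    gains = []
    for (w0, c0), (w1, c1) in zip(samples, samples[1:]):
        gains.append((w1, (c1 - c0) / (w1 - w0)))
    return gains


def plateau_at(samples, min_gain_per_worker):
    """First worker count where adding workers stopped paying, else None."""
    for workers, gain in marginal_gains(samples):
        if gain < min_gain_per_worker:
            return workers
    return None


# Hypothetical figures, for illustration only:
# runs = [(12, 1200), (18, 1750), (24, 1800), (28, 1780)]
# plateau_at(runs, min_gain_per_worker=20)
```

A plateau found this way does not say why the gain stopped. It only marks the worker count at which host metrics and proxy logs deserve a side-by-side look.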

For Proxynade, the practical setup is straightforward: keep the proxy plan choice separate from the server choice. Use datacenter routes where speed and unit cost matter. Use residential routes where the target requires it. Put browser-heavy workloads on hardware that can hold steady worker limits. Then watch host metrics next to proxy-metered bytes — not in two dashboards opened after the damage is done.
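Watching host metrics next to proxy-metered bytes mostly means putting both on the same time axis. A minimal sketch, assuming both sources can be exported as timestamped samples. The tuple shapes and the function names are assumptions, not any provider's export format.

```python
from collections import defaultdict


def bucket(ts: float, width_s: int = 60) -> int:
    """Floor a Unix timestamp to the start of its time bucket."""
    return int(ts // width_s) * width_s


def join_by_bucket(host_samples, proxy_samples, width_s: int = 60) -> dict:
    """Align host RSS and proxy bytes per time bucket.

    host_samples:  iterable of (timestamp, rss_bytes)
    proxy_samples: iterable of (timestamp, metered_bytes)
    """
    rows = defaultdict(lambda: {"max_rss": 0, "metered_bytes": 0})
    for ts, rss in host_samples:
        key = bucket(ts, width_s)
        rows[key]["max_rss"] = max(rows[key]["max_rss"], rss)
    for ts, nbytes in proxy_samples:
        rows[bucket(ts, width_s)]["metered_bytes"] += nbytes
    return dict(sorted(rows.items()))
```

Each row then answers one question: in the minute when memory peaked, how many bytes did the proxy bill? The same join extends to queue age, browser exits, or challenge counts by adding fields.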

The best reason to use bare metal is not that it sounds stronger than a VPS. It is that a stable host makes the boring numbers line up: browser count, RSS, disk wait, queue age, challenge pages, retries, accepted rows, and metered bytes. Once those numbers tell the same story, scraping problems stop looking mystical.