Hardening the open CORS proxy — allowlists, SSRF guards, and the bypass I almost left behind
A little over a year ago I wrote up a serverless CORS proxy on
Vercel — a single function that takes ?url=<target>,
fetches it server-side, and hands the response back to the browser with the CORS
headers the upstream API never sent. It solved a real, recurring problem across my
nonprofit side projects, and that post ended on an honest caveat:
“The proxy is currently open without authentication, making it accessible to anyone. For production applications… deploying a private instance with additional security measures is recommended.”
That throwaway line was the whole story. “Open without authentication” is a polite
way of saying open relay: the proxy would forward any ?url= to any host
and stamp Access-Control-Allow-Origin: * on the way back. This post is how I
closed that hole — and the diagram below is the data flow I ended up with.
What “open” actually meant
Two properties made the original proxy a liability, not just a convenience:
- Any target. ?url=https://anything was forwarded verbatim. That turns the proxy into an anonymizing relay for whoever finds it and, worse, into an SSRF vector: a request for http://169.254.169.254/… (the cloud metadata endpoint) or http://10.0.0.5/… (an internal host) is made from Vercel’s network, not the attacker’s. The proxy was a confused deputy.
- ACAO: * unconditionally. Every response advertised that any web origin could read it. The ALLOWED_ORIGINS env var I’d documented in the README was never actually read by the code.
None of this was hypothetical. When I later pulled the production runtime logs, a
scanner was already hammering ?url=<random host> looking for exactly this kind
of open relay.
The shape I wanted
Before touching code I drew the target: a request should pass a short stack of
gates before any upstream fetch happens, each able to reject early, and the
response should carry a scoped CORS header — the matched origin, never *. Every
figure on this blog is compiled from TikZ at build time, so the diagram below is
the actual source, not a screenshot:
Read it left to right: a browser request enters, passes the origin and target
gates, the rate limiter, and the cache, and only then reaches resilientFetch,
which talks to the one or two upstreams the proxy is actually for. A failed gate
drops straight down to a 403/429 instead of leaking a generic 500. The
greyed-out path along the bottom is the part I almost missed — more on that below.
Layer 1 — the lockdown (allowlists + SSRF guard)
The first change made the proxy closed by default:
- Target allowlist. ALLOWED_TARGETS defaults to exactly the two hosts my apps proxy: projects.propublica.org (the IRS-990 / Nonprofit Explorer API) and collectionapi.metmuseum.org (the Met’s collection API). Anything else gets a clean 403 Target host is not allowed, not a fetch.
- Origin allowlist. ALLOWED_ORIGINS defaults to noprofits.org and its subdomains (plus localhost for dev), and the response now echoes the matched origin rather than *. The env var the old README promised is finally wired in.
- HTTPS-only + an always-on SSRF guard that blocks localhost, RFC-1918 private ranges, link-local, and the cloud-metadata addresses even if someone overrides the target allowlist. Defense that an env var can’t switch off.
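As a rough illustration of the target gate, here is a minimal sketch. The function names, the 400 for an unparseable URL, the "blocked" error strings, and the exact list of blocked ranges are illustrative assumptions rather than the proxy's actual code; only the default hosts and the "Target host is not allowed" message come from the description above.

```javascript
// Hypothetical sketch of the target gate; not the proxy's real implementation.
const DEFAULT_TARGETS = ['projects.propublica.org', 'collectionapi.metmuseum.org'];

// Literal IPv4 addresses in "this network", loopback, RFC 1918, or link-local
// space (link-local covers the 169.254.169.254 metadata address).
function isBlockedIPv4(host) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isBlockedHost(host) {
  const h = host.toLowerCase();
  if (h === 'localhost' || h.endsWith('.localhost')) return true;
  if (h === 'metadata.google.internal') return true; // assumption: one known metadata hostname
  if (h.startsWith('[')) return true; // IPv6 literals rejected wholesale (stricter than needed)
  return isBlockedIPv4(h);
}

// Returns { ok: true, url } or { ok: false, status, error }.
function checkTarget(rawUrl, allowedTargets = DEFAULT_TARGETS) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, status: 400, error: 'Invalid url parameter' };
  }
  if (url.protocol !== 'https:') {
    return { ok: false, status: 403, error: 'Only https targets are allowed' };
  }
  // The SSRF guard runs before the allowlist, so "*" cannot switch it off.
  if (isBlockedHost(url.hostname)) {
    return { ok: false, status: 403, error: 'Target host is blocked' };
  }
  const open = allowedTargets.includes('*');
  if (!open && !allowedTargets.includes(url.hostname)) {
    return { ok: false, status: 403, error: 'Target host is not allowed' };
  }
  return { ok: true, url };
}
```

A string check like this only sees literal addresses. A hostname that resolves to a private IP would need a check at resolution time, which this sketch leaves out.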
Setting ALLOWED_TARGETS=* / ALLOWED_ORIGINS=* restores the old open-relay
behavior for anyone who forks it and actually wants a general proxy — but you have
to opt into that now, loudly.
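The origin side, including the explicit * opt-in, could look something like the sketch below. The helper name corsHeadersFor is hypothetical, and two behaviors are assumptions: rejecting requests that carry no Origin header, and adding Vary: Origin (the usual companion to an echoed origin so caches don't mix responses).

```javascript
// Hypothetical helper: returns the CORS headers to send, or null if the origin is rejected.
// allowedOrigins holds bare domains; each entry also matches its subdomains.
function corsHeadersFor(originHeader, allowedOrigins = ['noprofits.org', 'localhost']) {
  // Explicit opt-in to the old open behavior.
  if (allowedOrigins.includes('*')) {
    return { 'Access-Control-Allow-Origin': '*' };
  }
  if (!originHeader) return null; // assumption: no Origin header means no CORS grant

  let origin;
  try {
    origin = new URL(originHeader);
  } catch {
    return null; // covers malformed values and the literal string "null"
  }

  const host = origin.hostname;
  // Match on a dot boundary so "evilnoprofits.org" does not pass as a subdomain.
  const matched = allowedOrigins.some((d) => host === d || host.endsWith('.' + d));
  if (!matched) return null;

  // Echo the matched origin, never "*".
  return { 'Access-Control-Allow-Origin': origin.origin, Vary: 'Origin' };
}
```

Matching on the parsed hostname rather than on the raw header string is what keeps lookalike domains such as noprofits.org.evil.com out.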
Layer 2 — resilience
With the security boundary in place I added the reliability layer the diagram’s last two boxes represent, because a proxy that’s a single point of failure for several apps should fail gracefully:
- resilientFetch: a per-attempt timeout (an AbortController at 8 s) with retry and exponential backoff + jitter for network blips, 429s, and 5xxs, all bounded by a 9 s overall budget that stays under Vercel’s 10 s function limit. Other 4xxs are never retried; a 429 with a small Retry-After is honored. Timeouts surface as a 504 and other upstream failures as a 502, instead of a misleading 500.
- An LRU + TTL cache for deterministic 2xx GETs, tagged with X-Proxy-Cache: HIT|MISS. That’s the green short-circuit in the diagram: a hit never touches the upstream.
- A fixed-window per-IP rate limiter (60/min, with X-RateLimit-Remaining), the 429 rail.
- A /api/health endpoint that reports whether the allowlists are locked.
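To make the retry behavior concrete, here is a minimal sketch of a resilientFetch-style wrapper. Only the 8 s and 9 s figures and the 504/502 mapping come from the list above. The attempt count, the base delay, the injectable fetchImpl and sleep options, and the choice to hand back the upstream's own final 429 or 5xx response are assumptions made for illustration.

```javascript
// Sketch only: maxAttempts, baseDelayMs, and the injectable fetchImpl/sleep are assumptions.
async function resilientFetch(url, opts = {}) {
  const {
    fetchImpl = fetch,        // injectable so the logic can be exercised offline
    attemptTimeoutMs = 8000,  // per-attempt timeout
    budgetMs = 9000,          // overall budget across all attempts
    maxAttempts = 3,
    baseDelayMs = 200,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = opts;

  const deadline = Date.now() + budgetMs;
  let lastResponse = null;
  let lastError = null;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break;

    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), Math.min(attemptTimeoutMs, remaining));
    try {
      const res = await fetchImpl(url, { signal: controller.signal });
      // Anything other than 429 or 5xx is final, including every other 4xx.
      if (res.status !== 429 && res.status < 500) return res;
      lastResponse = res;
      lastError = null;
    } catch (err) {
      lastError = err;
      lastResponse = null;
    } finally {
      clearTimeout(timer);
    }

    // Exponential backoff plus jitter; a numeric Retry-After on a 429 takes precedence.
    let delay = baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
    if (lastResponse && lastResponse.status === 429) {
      const retryAfter = Number(lastResponse.headers.get('retry-after'));
      if (Number.isFinite(retryAfter) && retryAfter > 0) delay = retryAfter * 1000;
    }
    // Stop if this was the last attempt or the wait would not fit in the budget.
    if (attempt === maxAttempts - 1 || Date.now() + delay >= deadline) break;
    await sleep(delay);
  }

  if (lastResponse) return lastResponse; // assumption: pass the upstream's 429/5xx through
  const timedOut = lastError && lastError.name === 'AbortError';
  const error = new Error(timedOut ? 'Upstream timed out' : 'Upstream request failed');
  error.status = timedOut ? 504 : 502; // the handler can use this as the response status
  throw error;
}
```

A Retry-After that does not fit in the remaining budget ends the loop instead of sleeping past the function limit, which is one reading of honoring only a "small" value.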
The cache and limiter live per warm serverless instance — zero extra infrastructure. A globally-shared version would need something like Vercel KV; per-instance was the right trade for this traffic level.
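Both per-instance structures fit in a few lines of plain JavaScript. The sketch below is one way to build them, not the proxy's actual code: the class names, the default cache size and TTL, and the injectable clock are assumptions, while the 60-per-minute limit comes from the list above.

```javascript
// Hypothetical LRU + TTL cache built on Map, which iterates in insertion order.
class TtlLruCache {
  constructor({ maxEntries = 100, ttlMs = 60000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.map = new Map();
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.map.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Hypothetical fixed-window limiter: one counter per IP, reset when the window rolls over.
// Old entries are never pruned here; a real instance would want some cleanup.
class FixedWindowLimiter {
  constructor({ limit = 60, windowMs = 60000, now = Date.now } = {}) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
    this.windows = new Map();
  }

  take(ip) {
    const t = this.now();
    let w = this.windows.get(ip);
    if (!w || t - w.start >= this.windowMs) {
      w = { start: t, count: 0 };
      this.windows.set(ip, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= this.limit, // false maps to a 429
      remaining: Math.max(0, this.limit - w.count), // value for X-RateLimit-Remaining
    };
  }
}
```

Because both objects live in module scope of a warm instance, a cold start simply begins with an empty cache and fresh counters.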
Layer 3 — the bypass I almost left behind
Here’s the part worth the price of admission. I’d hardened api/proxy.js, shipped
it, verified the allowlist returned 403 for a bad host. Done, I thought.
It wasn’t. The hardening went in as a fresh file, but the original repo still
contained sibling endpoints from the proxy’s debugging days —
api/debug-proxy.js, api/env-test.js, an in-memory logging stack, a dashboard.
And Vercel file-routes every api/*.js automatically. debug-proxy.js was an
older, unguarded copy of the proxy — ACAO: *, no allowlist, no SSRF check — and
it was live in production, sitting right next to the locked-down one.
I caught it by probing the deployed endpoints rather than trusting the diff:
GET /api/debug-proxy?url=https://example.com -> 200, ACAO:* (open relay!)
GET /api/proxy?url=https://example.com -> 403 (correctly blocked)
The hardened front door was bolted; the side door was wide open. The fix was to
delete the entire dead surface — the debug and test endpoints, the logging stack,
the dashboard — and strip the few logging calls proxy.js still made into it.
That’s the crossed-out path along the bottom of the diagram. Re-probing afterward,
both debug-proxy and env-test returned 404, while the real proxy kept
returning 403 for bad hosts and 200 for allowed ones.
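That probe is easy to script so it can run after every deploy. The sketch below is a hypothetical helper (probeRoutes, findOpenRelays, and the injectable fetchImpl are names made up here): it requests a target that should be refused from each route and flags any route that answers 200 or advertises Access-Control-Allow-Origin: *.

```javascript
// Hypothetical post-deploy probe: ask each route to proxy a host that must be refused.
async function probeRoutes(baseUrl, routes, blockedTarget, fetchImpl = fetch) {
  const results = [];
  for (const route of routes) {
    const url = `${baseUrl}${route}?url=${encodeURIComponent(blockedTarget)}`;
    const res = await fetchImpl(url);
    results.push({
      route,
      status: res.status,
      acao: res.headers.get('access-control-allow-origin'),
    });
  }
  return results;
}

// A route is suspicious if it served the blocked target or sent a wildcard CORS header.
function findOpenRelays(results) {
  return results.filter((r) => r.status === 200 || r.acao === '*');
}
```

Run against the deployment with a route list such as ['/api/proxy', '/api/debug-proxy', '/api/env-test'] and https://example.com as the blocked target, an empty findOpenRelays result is the outcome to expect once the dead endpoints are gone.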
The lesson: when you harden something by writing a new clean version rather than editing the vulnerable file in place, go hunting for the old siblings. A framework that turns every file in a directory into a public route will happily keep serving the one you forgot.
Verifying against a real consumer
The satisfying part came from the production logs. Within minutes of the lockdown,
that background scanner’s ?url=<random host> probes were all returning 403 —
the open-relay abuse, shut. And when I loaded one of the apps that actually depends
on the proxy (the grants visualizer), its requests
sailed through: 200s and 304s against projects.propublica.org, each carrying
Access-Control-Allow-Origin: https://grants.noprofits.org — the scoped header, not
*. The allowed path works; everything else is turned away at a gate.
The proxy is still open source, still a single small function, still free to fork. It’s just no longer an open door with my name on it.