How grants.noprofits.org works: tracing federal money through a live graph
The tool at grants.noprofits.org answers one question: when the federal
government hands money to an organization, where does it come from, and where
does it go from there? You type an org — a hospital, a university, a state health
department — and it pulls that organization’s federal awards live from
USAspending.gov, walks outward a couple of hops,
and draws the result as a force-directed money-flow graph. Each recipient node is
then enriched with its IRS Form 990 financials from
ProPublica, so the inspector
can show what fraction of an organization’s revenue is federal grant money. The
real point isn’t the pretty graph — it’s making taxpayer money legible: you can
see that a sub-agency you’ve never heard of is the actual funder, and that a
nonprofit you have heard of runs largely on federal dollars.
This post is a tour of how it’s built. The source of truth is the
noprofits-org/grants repo, and most
of the interesting decisions are recorded there as (#NN) comments tied to a real
bug or a wrong first attempt. Those are the parts worth reading, so that’s what
I’ve pulled out here.
Architecture at a glance
There’s no build step and no server. index.html loads ES modules directly in
the browser:
index.html # the live tool
├── live-main.js # app controller: search, render, inspector, state
├── usaspending.js # USAspending.gov client → {grants, charities, connected}
├── flow-graph.js # D3 force-directed renderer (the visualization engine)
├── propublica.js # IRS-990 enrichment (taxpayer rings + inspector)
└── live.css # design tokens
The separation that matters is between the rendering engine and the data
adapter. flow-graph.js is deliberately data-shape agnostic: it renders any
directed, weighted, entity→entity flow graph. It knows nothing about federal
grants. usaspending.js is one adapter that happens to feed it federal data; you
could write another adapter against a different source and the renderer wouldn’t
change. The contract between them is a single object shape that
usaspending.js emits and everything downstream reads:
- grants — the edges: { filer_ein, grant_ein, grant_amt, tax_year }
- charities — the nodes: { filer_ein, filer_name, receipt_amt, govt_amt, … }, where the filer_ein field is an id with a type prefix (A: for an agency, R: for a recipient)
- connected — the set of node ids that survived trimming and are actually rendered
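To make the contract concrete, here is a sketch of one tiny graph in that shape. Every number is a placeholder, not real award data. Modeling connected as a Map of node id to BFS depth is an assumption (the trimming code later reads connected.get(id) as a depth), and checkContract is a hypothetical helper, not repo code: it shows the kind of check a second adapter could run before handing its output to the renderer.

```javascript
// Placeholder values only, not real award data.
// Assumption: `connected` is a Map of node id -> BFS depth.
const graph = {
  grants: [
    { filer_ein: 'A:national institutes of health', grant_ein: 'R:fred hutch', grant_amt: 1000, tax_year: 2023 },
  ],
  charities: [
    { filer_ein: 'A:national institutes of health', filer_name: 'National Institutes of Health', receipt_amt: 0, govt_amt: 0 },
    { filer_ein: 'R:fred hutch', filer_name: 'Fred Hutch', receipt_amt: 1000, govt_amt: 1000 },
  ],
  connected: new Map([['R:fred hutch', 0], ['A:national institutes of health', 1]]),
};

// Hypothetical sanity check: every node id carries an A:/R: prefix and
// every edge endpoint refers to a known node.
function checkContract({ grants, charities }) {
  const ids = new Set(charities.map(c => c.filer_ein));
  const problems = [];
  for (const c of charities) {
    if (!/^[AR]:/.test(c.filer_ein)) problems.push(`bad prefix: ${c.filer_ein}`);
  }
  for (const g of grants) {
    if (!ids.has(g.filer_ein)) problems.push(`unknown source: ${g.filer_ein}`);
    if (!ids.has(g.grant_ein)) problems.push(`unknown target: ${g.grant_ein}`);
  }
  return problems;
}

console.log(checkContract(graph)); // []
```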
The field names (filer_ein, tax_year) are a tell: this shape predates the
live tool — it came from an earlier 990-based version where the nodes really were
990 filers with EINs. The live adapter keeps the shape and overloads it, which is
why an “EIN” here is actually a normalized agency or recipient name. Reusing the
contract meant the renderer didn’t have to be rewritten when the data source
changed.
Pulling the graph from USAspending
This is the heart of the tool. usaspending.js builds the graph with a bounded
breadth-first search (BFS) outward from the org you searched for. The root goes on
the frontier at depth 0; each hop expands the frontier by querying USAspending for
each node’s awards, registers the new endpoints, and stops at a configured depth:
let frontier = [{ ...root, depth: 0 }];
const expanded = new Set();
while (frontier.length > 0) {
  const expandable = frontier.filter(n => n.depth < depth && !expanded.has(n.id));
  if (expandable.length === 0) break;
  const batches = await Promise.all(expandable.map(async node => {
    expanded.add(node.id);
    return node.kind === 'recipient'
      ? await this.recipientEdges(node.name, years)
      : await this.agencyEdges(node.name, years, perAgencyFanout, node.tier || 'toptier');
  }));
  // …register endpoints, aggregate edge amounts, build the next frontier…
}
Each level’s expansions fire concurrently with Promise.all. A recipient node
gets expanded by asking “who funded this org?”; an agency node by asking “who did
this agency fund?”. Everything past that point is about making the resulting graph
correct and readable, and each of those is a decision with a story.
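Here is a self-contained sketch of the same bounded BFS that runs outside the app. fetchEdges is a hypothetical stand-in for recipientEdges and agencyEdges, assumed to resolve to raw award rows of the form { from, to, amount }. The elided registration step is filled in with one plausible approach, not the repo's actual code: rows are summed per pair within one expansion, and when the same pair shows up again from its other endpoint, the first expansion's total is kept so the money isn't counted twice.

```javascript
// Self-contained sketch of a bounded BFS over a funding graph.
// fetchEdges(node) is hypothetical: it resolves to rows { from, to, amount },
// where from/to are { id, kind } nodes. Recipients are asked for their
// funders, agencies for their grantees.
async function boundedBfs(root, maxDepth, fetchEdges) {
  const depthOf = new Map([[root.id, 0]]); // node id -> BFS depth
  const nodes = new Map([[root.id, root]]);
  const edgeTotals = new Map();            // "from->to" -> summed amount
  const expanded = new Set();
  let frontier = [{ ...root, depth: 0 }];

  while (frontier.length > 0) {
    const expandable = frontier.filter(n => n.depth < maxDepth && !expanded.has(n.id));
    if (expandable.length === 0) break;
    // All expansions at this level run concurrently.
    const batches = await Promise.all(expandable.map(async node => {
      expanded.add(node.id);
      return { node, rows: await fetchEdges(node) };
    }));
    const next = [];
    for (const { node, rows } of batches) {
      const local = new Map(); // this expansion's per-pair totals
      for (const row of rows) {
        const key = `${row.from.id}->${row.to.id}`;
        local.set(key, (local.get(key) || 0) + row.amount);
        for (const end of [row.from, row.to]) {
          if (!depthOf.has(end.id)) { // register each endpoint once
            depthOf.set(end.id, node.depth + 1);
            nodes.set(end.id, end);
            next.push({ ...end, depth: node.depth + 1 });
          }
        }
      }
      // A pair is seen again when its other endpoint is expanded; keep the
      // first expansion's total so the same money isn't counted twice.
      for (const [key, amount] of local) {
        if (!edgeTotals.has(key)) edgeTotals.set(key, amount);
      }
    }
    frontier = next;
  }
  return { nodes, depthOf, edgeTotals };
}
```

Called on a recipient root with maxDepth 2, it expands the root (its funders land at depth 1), then expands those agencies (their other grantees land at depth 2), and stops there, because depth-2 nodes are never expanded.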
Group by the sub-agency, not the department (#30)
The first instinct is to group a recipient’s inbound awards by the awarding agency. That’s wrong, and it’s wrong in a way that destroys the whole point of the tool. If you collapse everything to the top-tier department, every grant from the National Institutes of Health, the Health Resources and Services Administration, and the Centers for Medicare & Medicaid Services becomes one undifferentiated “HHS” arrow. But “HHS” isn’t who funded you — NIH did. The sub-agency is the funder:
// A recipient's inbound awards, grouped by the funding SUB-agency (NIH,
// HRSA, CMS…). Grouping by the top-tier "Awarding Agency" collapses every
// HHS sub-agency into a single "HHS" inflow, which misrepresents the funding
// picture (see #30); the sub-agency is the real funder. Awards with no
// sub-agency fall back to the top-tier department.
const sub = r['Awarding Sub Agency'];
const agency = sub || r['Awarding Agency'];
const tier = sub ? 'subtier' : 'toptier';
Notice the tier that rides along. USAspending’s filter API distinguishes
subtier from toptier agencies, and you have to query a node against the tier it
was minted at. If you mint “HRSA” as a sub-tier node and then, on the next BFS hop,
query it against the top-tier filter, USAspending matches nothing and the agency’s
fan-out silently vanishes. So the tier is stored on the node and threaded back into
the next query in agencyEdges.
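A sketch of what threading the tier looks like in the request itself. buildAwardFilters and fiscalYearRange are hypothetical helpers, not repo code. The filter keys (agencies entries with type, tier, and name; award_type_codes; time_period) follow USAspending's public spending_by_award documentation as I understand it, so check them against the current API before relying on this.

```javascript
// A federal fiscal year N runs from Oct 1 of year N-1 through Sep 30 of year N.
function fiscalYearRange(fy) {
  return { start_date: `${fy - 1}-10-01`, end_date: `${fy}-09-30` };
}

// Hypothetical request-filter builder for an agency node's fan-out query.
function buildAwardFilters(agencyNode, years) {
  return {
    agencies: [{
      type: 'awarding',
      // Must match the tier the node was minted at, or nothing matches.
      tier: agencyNode.tier || 'toptier',
      name: agencyNode.name,
    }],
    award_type_codes: ['02', '03', '04', '05'], // the grants group
    time_period: years.map(fiscalYearRange),
  };
}
```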
One award group per query, or you get a 422
The query filters on award_type_codes, and there’s a constraint hiding in that
list:
// USAspending splits award types into groups (grants, loans, contracts, ...)
// and ONE spending_by_award query may only use codes from a single group, else
// it 422s. The "grants" group is exactly these four — the natural identity for
// this app. (Block Grant, Formula Grant, Project Grant, Cooperative Agreement.)
const AWARD_TYPE_CODES = ['02', '03', '04', '05'];
You can’t mix grant codes and contract codes in one spending_by_award call —
USAspending returns a 422. Codes 02–05 are the entire grants group, which is
exactly what this tool is about, so the constraint and the scope happen to line up.
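If the scope ever widens beyond grants, a pre-flight check can turn the 422 into a local error with a readable message. This is a hypothetical guard, not repo code, and the group table is an assumption: only the grants row is confirmed by the comment above, while the loan and contract rows reflect USAspending's published award-type groupings and should be checked against its docs.

```javascript
// Assumed subset of USAspending's award-type groups (verify before use).
const AWARD_GROUPS = {
  grants: ['02', '03', '04', '05'],
  loans: ['07', '08'],
  contracts: ['A', 'B', 'C', 'D'],
};

// Returns the single group the codes belong to, or throws before the
// request is ever sent.
function groupOf(codes) {
  const groups = new Set(codes.map(code => {
    const hit = Object.keys(AWARD_GROUPS).find(g => AWARD_GROUPS[g].includes(code));
    if (!hit) throw new Error(`unknown award type code: ${code}`);
    return hit;
  }));
  if (groups.size !== 1) {
    throw new Error(`codes span ${groups.size} groups: ${[...groups].join(', ')}`);
  }
  return [...groups][0];
}

console.log(groupOf(['02', '03', '04', '05'])); // grants
```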
Normalize names so one org is one node (#15)
Award data is messy. The same recipient shows up as "FRED HUTCH " in one record
and "Fred Hutch" in another, and if you key nodes on the raw string you get two
nodes for one organization and the graph fragments. Node ids key on a normalized
name instead:
// Node ids key on a NORMALIZED name (case-folded, whitespace-collapsed) so
// the same entity under different spacing/casing — "FRED HUTCH " vs "Fred
// Hutch" — resolves to ONE node instead of fragmenting (#15). The raw name
// still rides on every edge's sourceName/targetName for display. A stable
// id (recipient UEI) would also disambiguate genuinely distinct same-named
// orgs, but that needs the root resolved to a UEI first — deferred.
normKey(name) { return (name || '').trim().toLowerCase().replace(/\s+/g, ' '); }
agencyId(name) { return 'A:' + this.normKey(name); }
recipientId(name) { return 'R:' + this.normKey(name); }
The normalized string is only the id; the original display name still travels on every edge, so the graph reads “Fred Hutch” even though both spellings collapse to the same node. The comment also flags the limitation honestly: keying on a name, case-folded or not, merges genuinely distinct orgs that happen to share one. The real fix is a stable identifier (the recipient’s Unique Entity Identifier, or UEI, as reported by USAspending), but that requires resolving the root to a UEI first — deferred, and noted as such.
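The helpers work as plain functions, which makes the #15 behavior easy to check. This standalone version drops the class context but keeps the same logic:

```javascript
// Same normalization as the class methods above, as standalone functions.
const normKey = name => (name || '').trim().toLowerCase().replace(/\s+/g, ' ');
const agencyId = name => 'A:' + normKey(name);
const recipientId = name => 'R:' + normKey(name);

console.log(recipientId('FRED HUTCH ') === recipientId('Fred Hutch')); // true
console.log(recipientId('Fred   Hutch')); // R:fred hutch
console.log(agencyId('Fred Hutch') === recipientId('Fred Hutch')); // false
```

The A:/R: prefix also keeps an agency and a recipient that share a name from colliding. What normKey does not merge is punctuation: "Fred Hutch, Inc." and "Fred Hutch" still produce different ids.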
Trim by proximity first, then by dollars (#37)
A two-hop BFS over federal data produces far more nodes than you want on screen, so
the graph trims to a maxOrgs budget. The obvious ranking — keep the biggest
dollar nodes — turned out to be exactly wrong, and the comment explaining why is my
favorite in the repo:
// Trim to maxOrgs, always keeping the root, ranking by PROXIMITY first
// then dollar volume (#37). Volume-only ranking dropped the root's own
// direct funders (e.g. HRSA at depth 1) in favour of billion-dollar
// nodes several hops away, and trimmed small connector nodes — which,
// with the both-endpoints-survive edge rule, stranded whole far clusters
// (a floating CMS->states blob) with no path back to the root. Keeping
// shallower nodes first means a direct funder is never cut for a distant
// one, and the keep-set stays a prefix-by-depth of the BFS.
let keep = new Set(meta.keys());
if (keep.size > maxOrgs) {
  const ranked = Array.from(meta.entries())
    .filter(([id]) => id !== root.id)
    .sort((a, b) => {
      const da = connected.get(a[0]) ?? Infinity, db = connected.get(b[0]) ?? Infinity;
      return da - db || (b[1].inflow + b[1].outflow) - (a[1].inflow + a[1].outflow);
    })
    .slice(0, Math.max(0, maxOrgs - 1))
    .map(([id]) => id);
  keep = new Set([root.id, ...ranked]);
}
connected.get(id) is the node’s BFS depth. The sort compares depth first
(da - db) and only breaks ties by total dollar flow. So a depth-1 direct funder
of the org you searched always outranks a depth-2 giant. Ranking by dollars alone
literally dropped the root’s own funders — HRSA, sitting right next to the org you
asked about — in favor of a billion-dollar node two hops out that you didn’t care
about.
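A standalone version of the same ranking, with made-up ids and dollar figures, reproduces the failure #37 describes. trimByProximity is a hypothetical name; depth and flow are assumed Maps standing in for connected and meta.

```javascript
// depth: id -> BFS depth; flow: id -> { inflow, outflow } (assumed shapes).
function trimByProximity(rootId, depth, flow, maxOrgs) {
  const ids = [...flow.keys()];
  if (ids.length <= maxOrgs) return new Set(ids);
  const total = id => flow.get(id).inflow + flow.get(id).outflow;
  const ranked = ids
    .filter(id => id !== rootId)
    .sort((a, b) => {
      const da = depth.get(a) ?? Infinity, db = depth.get(b) ?? Infinity;
      return da - db || total(b) - total(a); // shallower first, then bigger
    })
    .slice(0, Math.max(0, maxOrgs - 1)); // one slot is reserved for the root
  return new Set([rootId, ...ranked]);
}

// Placeholder numbers, not real award data.
const depth = new Map([['R:clinic', 0], ['A:hrsa', 1], ['R:bigco', 2], ['R:big2', 2]]);
const flow = new Map([
  ['R:clinic', { inflow: 3e6, outflow: 0 }],
  ['A:hrsa', { inflow: 0, outflow: 2e6 }],
  ['R:bigco', { inflow: 5e9, outflow: 0 }],
  ['R:big2', { inflow: 4e9, outflow: 0 }],
]);
console.log([...trimByProximity('R:clinic', depth, flow, 3)]); // [ 'R:clinic', 'A:hrsa', 'R:bigco' ]
```

With a budget of three, a volume-only sort would keep the two multi-billion recipients and cut A:hrsa, the root's direct funder; ranking by depth first keeps it.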
Never render a floating blob (#37)
Trimming creates a second hazard. An edge survives only if both its endpoints survive the trim. Cut a small connector node and you can sever the only path between a distant cluster and the root, leaving an island — a “floating CMS→states blob” with no visible connection to anything. The fix is to prune the kept set down to the connected component that actually contains the root:
// Prune to the root's connected component so no disconnected cluster is
// ever rendered (#37). Undirected walk — a recipient root reaches its
// funder agencies by traversing edges backward.
const adj = new Map();
for (const id of keep) adj.set(id, []);
for (const { src, tgt } of keptEdges) { adj.get(src).push(tgt); adj.get(tgt).push(src); }
const reachable = new Set([root.id]);
const queue = [root.id];
while (queue.length) {
  for (const nb of (adj.get(queue.shift()) || [])) {
    if (!reachable.has(nb)) { reachable.add(nb); queue.push(nb); }
  }
}
keep = reachable;
The walk is deliberately undirected. Money flows agency→recipient, but the org you searched for is usually a recipient, and it reaches its funders by traversing those edges backward. A directed walk from a recipient root would reach nothing. Anything not reachable from the root is dropped before render, so an island is never drawn.
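A standalone version of the prune, run on a made-up graph that contains an island. The function name is hypothetical. Two small departures from the snippet above: edges whose endpoints were not both kept are skipped explicitly (the both-endpoints rule), and the queue is walked with a head index instead of shift() so the BFS stays linear on large graphs.

```javascript
// Keep only nodes reachable from the root, walking edges in both directions.
function pruneToRootComponent(rootId, keep, keptEdges) {
  const adj = new Map();
  for (const id of keep) adj.set(id, []);
  for (const { src, tgt } of keptEdges) {
    if (!adj.has(src) || !adj.has(tgt)) continue; // an endpoint was trimmed
    adj.get(src).push(tgt); // record both directions: the walk is undirected
    adj.get(tgt).push(src);
  }
  const reachable = new Set([rootId]);
  const queue = [rootId];
  for (let head = 0; head < queue.length; head++) {
    for (const nb of adj.get(queue[head]) || []) {
      if (!reachable.has(nb)) { reachable.add(nb); queue.push(nb); }
    }
  }
  return reachable;
}

// A:cms -> R:state has no path back to the root: a floating blob.
const keepIds = new Set(['R:clinic', 'A:hrsa', 'A:cms', 'R:state']);
const edges = [{ src: 'A:hrsa', tgt: 'R:clinic' }, { src: 'A:cms', tgt: 'R:state' }];
console.log([...pruneToRootComponent('R:clinic', keepIds, edges)]); // [ 'R:clinic', 'A:hrsa' ]
```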
Enriching with 990 data and taxpayer rings
Once the graph is on screen, live-main.js enriches every recipient in the
background with ProPublica’s 990 data. This is what powers the inspector’s
financials and the “taxpayer ring” that flags orgs heavily dependent on federal
money. Two disciplines govern this whole layer: never attach the wrong 990, and
never let an enrichment failure break the graph.
The match has to be right, or it’s misinformation (#23)
ProPublica’s search is fuzzy. Search “Fred Hutch” and you’ll get the real
organization plus a scattering of similarly-named or chapter/affiliate orgs. If you
naively take the first result, you can attach the wrong organization’s financials —
and in a tool whose entire purpose is showing where taxpayer money goes, a wrong
990 isn’t a cosmetic bug, it’s misinformation. So bestMatch gates candidates
before ranking them:
// Gate (per candidate, both STOP-stripped): keep only candidates that
// (a) overlap ≥60% of the smaller token set, AND
// (b) contain the query's MOST DISTINCTIVE token (longest, as a rare-token
// proxy) — this kills cross-domain false accepts that share only a common
// word, which the bare 60% ratio let through.
const Q = new Set(toks(query));
if (!Q.size) return null;
const distinctive = [...Q].reduce((a, b) => (b.length > a.length ? b : a));
The 60%-overlap test alone wasn’t enough: two names sharing only a common word (“Foundation”, “Health”) could clear it. The distinctive-token requirement — the query’s longest token, used as a cheap proxy for “rarest” — has to appear in the candidate, which kills those cross-domain false accepts. Among survivors, the ranking prefers an exact token-set match, then the most shared tokens, then the fewest extra tokens — that last tiebreak favors the parent organization over a “…of Anytown Chapter” variant that would otherwise outrank it. The matched name is returned alongside the EIN so the inspector can flag when it differs from the recipient’s name on the graph.
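A runnable sketch of the gate and the ranking together. The tokenizer and stop-word list are hypothetical (the repo's toks and STOP are not shown), candidates are assumed to carry ProPublica-style name and ein fields, the returned { ein, matchedName } shape is illustrative, and the organization names are fictional.

```javascript
// Hypothetical stop words; the repo's actual STOP set is not shown.
const STOP = new Set(['the', 'of', 'and', 'for', 'inc']);
const toks = s => (s || '').toLowerCase().split(/[^a-z0-9]+/).filter(t => t && !STOP.has(t));

function bestMatch(query, candidates) {
  const Q = new Set(toks(query));
  if (!Q.size) return null;
  // Longest query token as a cheap proxy for the rarest one.
  const distinctive = [...Q].reduce((a, b) => (b.length > a.length ? b : a));
  const survivors = [];
  for (const c of candidates) {
    const C = new Set(toks(c.name));
    if (!C.size) continue;
    let shared = 0;
    for (const t of Q) if (C.has(t)) shared++;
    // Gate: >= 60% of the smaller token set AND the distinctive token.
    if (shared / Math.min(Q.size, C.size) < 0.6 || !C.has(distinctive)) continue;
    const exact = shared === Q.size && shared === C.size;
    survivors.push({ c, exact, shared, extra: C.size - shared });
  }
  if (!survivors.length) return null;
  // Exact token-set match first, then most shared tokens, then fewest extras.
  survivors.sort((a, b) => (b.exact - a.exact) || (b.shared - a.shared) || (a.extra - b.extra));
  const { c } = survivors[0];
  return { ein: c.ein, matchedName: c.name };
}

const candidates = [
  { name: 'Anytown Riverside Medical Center Auxiliary Chapter', ein: 2 },
  { name: 'Anytown Riverside Medical Center', ein: 1 },
  { name: 'Riverside Community Foundation', ein: 3 },
];
console.log(bestMatch('Anytown Riverside Medical Center', candidates));
// { ein: 1, matchedName: 'Anytown Riverside Medical Center' }
```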
The CORS proxy and best-effort-null
ProPublica’s API sends no Cross-Origin Resource Sharing (CORS) headers, so the browser can’t call it directly. Every request goes through the org’s shared Vercel proxy. The more important discipline is what happens on failure:
async resolveEin(name) {
  const key = name.trim().toLowerCase();
  if (this.einCache.has(key)) return this.einCache.get(key);
  let d;
  try {
    d = await this.proxied(`${BASE}/search.json?q=${encodeURIComponent(name)}`);
  } catch (e) {
    console.warn('propublica search failed', name, e);
    return null; // transient — do not cache
  }
  const result = bestMatch(name, d.organizations || []);
  this.einCache.set(key, result);
  return result;
}
Every enrichment is best-effort and resolves to null on any failure — an agency
isn’t a nonprofit and has no 990, and a network blip is just a null too. The
graph is already drawn; enrichment only ever adds to it, so a failed 990 lookup
leaves the node exactly as it was rather than breaking the render. There’s a subtle
caching rule in there too: a deterministic result (a real match, or a clean
“nothing matched”) gets cached, but a transient failure returns null without
caching, so a later interaction retries instead of being stuck on a momentary blip.
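The caching rule is easy to get subtly wrong, so here is a sketch that isolates it. EinResolver is a hypothetical class; search and match are injected stand-ins for the proxied ProPublica call and bestMatch, which lets the rule be exercised without a network.

```javascript
// Cache deterministic outcomes (a match or a clean null); never cache
// transient failures, so the next call retries.
class EinResolver {
  constructor(search, match) {
    this.search = search; // async (name) => { organizations: [...] }
    this.match = match;   // (name, organizations) => result or null
    this.einCache = new Map();
  }

  async resolveEin(name) {
    const key = name.trim().toLowerCase();
    // Map.has rather than a truthiness check: a cached null is a hit.
    if (this.einCache.has(key)) return this.einCache.get(key);
    let d;
    try {
      d = await this.search(name);
    } catch (e) {
      console.warn('search failed', name, e.message);
      return null; // transient: not cached
    }
    const result = this.match(name, (d && d.organizations) || []);
    this.einCache.set(key, result); // a real match or "nothing matched"
    return result;
  }
}
```

The one detail that matters is Map.has: a truthiness check on the cached value would treat a cached null as a miss and re-query every time for orgs that simply have no 990.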
Don’t divide multi-year money by one year’s revenue (#28)
The taxpayer ratio is the number the whole tool builds toward: what fraction of an organization’s revenue is federal grant money? The naive computation divides the graph’s federal inflow by the org’s 990 total revenue — and that can come out over 100%, which reads as a data error and undermines trust in the figure. The bug is a units mismatch: the graph sums inflow across all selected fiscal years, but a 990 reports one year’s revenue.
// Taxpayer ratio normalized to the 990's fiscal year (#28). The graph
// sums federal inflow across ALL selected years, but a 990's revenue is
// one year — dividing the two can exceed 100% and read as a data error.
// Compare only the 990 year's federal inflow to that year's revenue, and
// only when the 990 year is within the selected fiscal years (so it
// stays consistent with what's on screen).
if (!isAgency && profile && profile.revenue && profile.year != null && this.years.includes(profile.year)) {
  ratioYear = profile.year;
  const entry = this.yearInflows.get(n.id);
  if (entry && entry.year === ratioYear) {
    yearInflow = entry.inflow;
    share = yearInflow / profile.revenue;
  }
}
The fix restricts the numerator to the same fiscal year the 990 covers, and only
computes the ratio when that year is among the years on screen, so the percentage is
consistent with the graph the user is looking at. The graph ring, on the other
hand, intentionally uses the coarse all-years signal against a 0.05 threshold —
it’s a binary “this org leans on federal money” highlight, not a stated figure:
const TAXPAYER_THRESHOLD = 0.05; // federal grants / total revenue → rust ring + alert
The ring and the inspector number deliberately differ: one is a cheap visual flag, the other is a precise, fiscal-year-normalized percentage you can quote.
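A worked version of the units fix, with placeholder numbers: three fiscal years of federal inflow at $4M, $5M, and $6M against a 990 reporting $10M of revenue for FY2022. taxpayerFigures and the inflowByYear Map are hypothetical (the repo keeps a per-node { year, inflow } entry instead), and firing the ring at exactly 5% (>= rather than >) is an assumption.

```javascript
const TAXPAYER_THRESHOLD = 0.05;

// inflowByYear: fiscal year -> federal inflow for one node (assumed shape).
function taxpayerFigures(inflowByYear, selectedYears, profile) {
  const total = selectedYears.reduce((sum, y) => sum + (inflowByYear.get(y) || 0), 0);
  const hasRevenue = profile.revenue > 0;
  const out = {
    ring: hasRevenue && total / profile.revenue >= TAXPAYER_THRESHOLD, // coarse all-years flag
    naiveShare: hasRevenue ? total / profile.revenue : null,           // the #28 bug
    share: null,                                                       // the quotable figure
  };
  // Same-year numerator, and only when the 990 year is on screen.
  if (hasRevenue && selectedYears.includes(profile.year) && inflowByYear.has(profile.year)) {
    out.share = inflowByYear.get(profile.year) / profile.revenue;
  }
  return out;
}

const inflow = new Map([[2021, 4e6], [2022, 5e6], [2023, 6e6]]);
console.log(taxpayerFigures(inflow, [2021, 2022, 2023], { year: 2022, revenue: 10e6 }));
// { ring: true, naiveShare: 1.5, share: 0.5 }
```

The naive ratio divides $15M by $10M and reports 150%; the normalized one compares FY2022's $5M to FY2022's revenue and reports 50%. Both agree the ring should light.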
Timeouts so a hung request never strands the UI (#13, #22)
enrichAll fires a dozen or two of these concurrently per Visualize. A single
hung proxy connection used to strand that node’s ring on “loading” forever (#22),
and on the USAspending side a hung request would leave the loading overlay spinning
(#13). Both sides cap every request with an AbortController:
async proxied(target, { timeout = 10000 } = {}) {
  const ctl = new AbortController();
  const t = setTimeout(() => ctl.abort(), timeout);
  try {
    const res = await fetch(`${PROXY}?url=${encodeURIComponent(target)}`, { signal: ctl.signal });
    if (!res.ok) throw new Error('proxy ' + res.status);
    return await res.json();
  } catch (e) {
    if (e.name === 'AbortError') throw new Error('proxy timeout');
    throw e;
  } finally {
    clearTimeout(t);
  }
}
A timeout surfaces as an ordinary error, which the best-effort-null discipline above already handles — so a slow request degrades to a missing enrichment, never a stuck spinner.
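The same pattern could be factored into one helper shared by the ProPublica proxy and the USAspending POSTs. This is a hypothetical refactor, not repo code. The task receives the AbortSignal (for example signal => fetch(url, { signal })), and checking signal.aborted rather than the error's name turns the timer's abort into one ordinary error regardless of how the task reports it.

```javascript
// Run task(signal) with a deadline; an abort caused by the timer becomes
// a plain 'timeout' error, and other failures pass through unchanged.
async function withTimeout(task, timeout = 10000) {
  const ctl = new AbortController();
  const timer = setTimeout(() => ctl.abort(), timeout);
  try {
    return await task(ctl.signal);
  } catch (e) {
    if (ctl.signal.aborted) throw new Error('timeout');
    throw e;
  } finally {
    clearTimeout(timer); // never leave the timer running after a fast reply
  }
}
```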
Rendering
flow-graph.js stays small because it’s the generic engine. It derives a visual
role for each node from the shape alone:
roleOf(n) {
  if (n.id === this.focusId) return 'focus';
  return n.kind === 'agency' ? 'govt' : 'grantee';
}
The node you searched for is the focus; agencies are govt; everyone else is a
grantee. In federal data there’s no such thing as a non-government funder, so the
design’s neutral “funder” color is unused and agencies read in rust instead — the
color carries the meaning “this is public money.”
One small but necessary detail: the colors live in CSS as design tokens, but SVG
fill and stroke attributes can’t take var(--…). So the renderer resolves each
token to a concrete color once, via getComputedStyle, and caches the palette:
// resolved to concrete colors here because SVG attributes (fill/stroke)
// can't take var() (#18).
function cssToken(name) {
  return getComputedStyle(document.documentElement)
    .getPropertyValue('--' + name).trim();
}
That keeps a single source of truth for color across the whole suite — the stylesheet — while still feeding SVG the literal hex strings it requires. The renderer also offers two layouts off the same data: a force-directed view and a depth-columnar hierarchy view, since the BFS depth is already on every node.
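A minimal sketch of how a hierarchy view can fall out of that depth field: x comes from the node's column, and y spreads each column's nodes evenly around the center line. columnarLayout and its spacing defaults are hypothetical; the renderer's real hierarchy view may position nodes differently.

```javascript
// Map each node id to { x, y }: one column per BFS depth.
function columnarLayout(nodes, { colWidth = 240, rowHeight = 40 } = {}) {
  const columns = new Map(); // depth -> nodes in that column, in input order
  for (const n of nodes) {
    if (!columns.has(n.depth)) columns.set(n.depth, []);
    columns.get(n.depth).push(n);
  }
  const positions = new Map();
  for (const [depth, col] of columns) {
    col.forEach((n, i) => {
      // Center each column vertically around y = 0.
      positions.set(n.id, { x: depth * colWidth, y: (i - (col.length - 1) / 2) * rowHeight });
    });
  }
  return positions;
}
```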
What’s deferred, and the point
Two honest limitations. The first is in that normalization comment: keying nodes on a case-folded name merges genuinely distinct organizations that happen to share a name, and the real fix — keying nodes on the recipient’s stable Unique Entity Identifier (UEI), as reported by USAspending — is deferred because it needs the root resolved to a UEI first. The second is that the fiscal-year-normalized taxpayer ratio is still an approximation: it lines up the 990’s reporting year with the awards in that year, but award timing and 990 fiscal years don’t always align cleanly, so the percentage is a good estimate rather than an audited figure.
Neither of those changes what the tool is for. Federal money moves through a structure that’s genuinely hard to see: a department contains sub-agencies you’ve never heard of, those sub-agencies fund organizations you have, and those organizations may run largely on the public’s money. Every decision above — grouping by sub-agency, normalizing names, trimming by proximity, gating the 990 match, normalizing the ratio to a fiscal year — exists to make that flow legible without lying about it. That’s the whole job.