Provenance-tracked provider data vs raw public files
Raw federal files (NPPES, PECOS, CMS Care Compare, OIG LEIE) are free and authoritative, but they ship as undated bulk CSVs with no cross-file identity spine and no per-field lineage. Fonteum publishes the same federal records NPI-resolved, snapshot-dated, and attestation-wrapped, as of June 2026.
Published June 17, 2026 · Last reviewed June 2026 · Capability comparison — public facts only
From a bulk download to a queryable, provenance-tracked record
| Capability | Raw public files (DIY) | Fonteum |
|---|---|---|
| Cost & licensing | Free under 17 U.S.C. § 105 — the federal records are public-domain works, no license required. | Free to read — the same public-domain federal records, published structured, with no paywall. |
| Identity resolution | Each bulk file keys on its own identifier; there is no shared NPI spine joining a provider across files. | NPI-resolved across sources on one identity backbone of active providers. |
| Cross-source joins | You join NPPES, PECOS, OIG LEIE, and Care Compare yourself, reconciling mismatched keys. | Pre-joined on NPI and CMS Certification Number so one entity view is reproducible against the files. |
| Field-level provenance | A bulk CSV carries no per-field source lineage — a number, but not where it came from. | Source name, snapshot date, and known limitation attached to every rendered field. |
| Freshness tracking | The file carries one release date; there is no per-field last-checked stamp. | Each field carries the source cadence and the date it was last reconciled. |
| Point-in-time history | Each download overwrites the last; reconstructing an as-of view is left to you. | Snapshot-dated, bitemporal history — query the record as it stood on a past date. |
| Tamper-evidence | A local copy carries no cryptographic proof of what was fetched or when. | SHA-256 digest plus an Ed25519 attestation chain on every snapshot. |
| Delivery & API | Bulk files only — you build and host the query layer yourself. | FHIR R4 API, MCP server, bulk export, and free research CSV/JSON. |
| Effort to first result | Download, parse, dedupe, entity-match, then host before the first query. | A first query in minutes with a free sandbox key — no sales cycle. |
Raw federal files are authoritative and free; the difference is the work between a bulk download and a queryable, provenance-tracked record. Capability descriptions reflect the public bulk-file format, not any single vendor.
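For the DIY column, the tamper-evidence row can be approximated by recording a content digest at fetch time. The following is a minimal sketch using only the Python standard library; the function names, the receipt layout, and the sample bytes are assumptions made for illustration and do not describe Fonteum's attestation format.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_receipt(data: bytes, source: str, fetched_at: datetime) -> dict:
    """Record what was fetched and when, keyed by a SHA-256 content digest."""
    return {
        "source": source,
        "fetched_at": fetched_at.isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }


def verify(data: bytes, receipt: dict) -> bool:
    """Return True if the bytes still match the digest stored in the receipt."""
    return hashlib.sha256(data).hexdigest() == receipt["sha256"]


# Hypothetical sample bytes standing in for a downloaded bulk file.
sample = b"npi,name\n1234567890,Example Clinic\n"
receipt = snapshot_receipt(
    sample, "NPPES", datetime(2026, 6, 17, tzinfo=timezone.utc)
)

print(verify(sample, receipt))         # True
print(verify(sample + b"x", receipt))  # False
```

A bare digest only detects change relative to the stored receipt. Signing that receipt, as the Ed25519 entry in the table describes, would need a signing library and is not shown here.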
What a bulk download leaves to you
The files are free — the integration is the cost
NPPES, PECOS, and OIG LEIE are authoritative public-domain downloads, and that is a real strength. The cost lands afterward: parsing multi-gigabyte bulk files, deduplicating, reconciling identifiers, and standing up a query layer before a single answer comes out.
Identity resolution is the hard part
A provider appears in NPPES, in PECOS enrollments, and in the OIG LEIE exclusions under keys that do not line up. Resolving one entity across files is the work a raw download leaves undone.
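To make the reconciliation work concrete, here is a minimal sketch of grouping rows from several files under one NPI. The column names, the sample rows, and the rule that an all-zero NPI means "missing" are assumptions for illustration; the real bulk files use their own layouts and need more careful matching than this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified extracts; real files have many more columns.
SOURCES = {
    "nppes": "npi,name\n1234567890,Example Clinic\n1098765432,Sample Provider\n",
    "pecos": "npi,enrollment_id\n1234567890,ENR-001\n",
    "leie": "npi,name\n0000000000,NAME ONLY RECORD\n1234567890,Example Clinic\n",
}


def normalize_npi(raw: str):
    """Return a 10-digit NPI string, or None if the value is unusable."""
    value = raw.strip()
    if len(value) == 10 and value.isdigit() and value != "0000000000":
        return value
    return None


def build_entity_view(sources: dict):
    """Group rows from every source under their NPI; collect rows with no usable NPI."""
    entities = defaultdict(dict)
    unmatched = []
    for source_name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            npi = normalize_npi(row.get("npi", ""))
            if npi is None:
                unmatched.append((source_name, row))
            else:
                entities[npi].setdefault(source_name, []).append(row)
    return dict(entities), unmatched


entities, unmatched = build_entity_view(SOURCES)
print(sorted(entities["1234567890"]))  # ['leie', 'nceil'] style listing of sources per NPI
print(len(unmatched))
```

Rows that land in `unmatched` are the ones that need name, address, or date-of-birth matching, which is where most of the manual effort tends to go.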
Provenance is what a download cannot carry
A CSV cell is a value with no memory of its origin. When an auditor asks which federal file backs a field and on what date, a field-level provenance record answers; a re-keyed local copy of a bulk file cannot.
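The contrast between a bare cell and a field that carries its lineage can be sketched as a small record type. The class name, field names, and sample values below are hypothetical and are not Fonteum's schema; the sketch only shows the three attributes the comparison names: source, snapshot date, and known limitation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProvenancedField:
    """A value bundled with the lineage an auditor would ask for."""
    value: str
    source: str
    snapshot_date: date
    limitation: Optional[str] = None

    def citation(self) -> str:
        """Answer 'which file backs this field, and as of what date?'"""
        note = f"; limitation: {self.limitation}" if self.limitation else ""
        return f"{self.source} snapshot {self.snapshot_date.isoformat()}{note}"


# A bare CSV cell: the value alone, with nothing recording its origin.
bare_cell = "Example Clinic"

# The same value with its lineage attached (all details are illustrative).
field = ProvenancedField(
    value="Example Clinic",
    source="NPPES",
    snapshot_date=date(2026, 6, 14),
    limitation="self-reported by the provider",
)

print(field.citation())
```

Marking the dataclass as frozen keeps a recorded lineage from being edited in place after the fact; a corrected value would be a new record with its own snapshot date.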
Compare other data capabilities
Exclusion screening vs single-list checks →
Multi-source, NPI-resolved exclusion screening vs checking one list.
Live provider data vs annual snapshots →
Continuously refreshed federal data vs paywalled annual snapshots.
Healthcare provider data platforms compared →
How sourcing model and provenance separate the category.
Common questions
- Why not just download the raw NPPES and PECOS files myself?
- You can — they are free public-domain works. The work is everything after the download: parsing multi-gigabyte bulk files, deduplicating, reconciling identifiers across files, and hosting a query layer. Fonteum publishes those same federal records already parsed, NPI-resolved, and queryable, so the integration cost moves off your team.
- What does NPI-resolved add over the raw files?
- Raw files key on their own identifiers, so the same provider in NPPES, PECOS, and OIG LEIE is not linked. NPI-resolution joins those records onto one identity backbone of active providers, so a single provider view is reproducible across sources rather than something you reconstruct yourself.
- Is Fonteum's data different from the federal source files?
- No — it is the same federal records, restructured. Fonteum ingests the public files directly from CMS, OIG, and HRSA, parses them, and attaches provenance. Every published figure is reproducible against the original file, so you are reading the federal record, not a proprietary substitute for it.
- What is field-level provenance and why does it matter?
- Field-level provenance attaches a source name, snapshot date, and any known limitation to each individual field. It matters in compliance, credentialing, and diligence, where the basis of a data point can carry legal weight. A bulk CSV stores the value but not its lineage; the provenance record stores both.
- How does Fonteum stay current versus a one-time download?
- A download is a single point in time. Fonteum re-ingests each source on its native cadence (NPPES weekly, OIG LEIE monthly, Care Compare and PBJ quarterly) and stamps every field with a last-checked date, so freshness is observable rather than assumed.
- Is Fonteum free if the federal files are already free?
- The data is free to read: the underlying records are public-domain federal works, and the static CSV and JSON files are published openly with no account required. The paid pilot tier covers only what costs money to provide: scoped exports, FHIR API throughput, and integration support. You pay for scoping and throughput, not for access to federal data.
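The cadence answer above implies a simple check: a field is stale once its last-checked date is older than its source's refresh interval. The sketch below assumes approximate day counts for "weekly", "monthly", and "quarterly" and uses hypothetical names throughout; it is an illustration of the idea, not a description of how Fonteum computes freshness.

```python
from datetime import date, timedelta

# Approximate refresh intervals in days (assumed values for illustration).
CADENCE_DAYS = {
    "NPPES": 7,
    "OIG LEIE": 31,
    "Care Compare": 92,
    "PBJ": 92,
}


def is_stale(source: str, last_checked: date, today: date, grace_days: int = 0) -> bool:
    """Return True if the field is older than its source cadence plus any grace period."""
    max_age = timedelta(days=CADENCE_DAYS[source] + grace_days)
    return today - last_checked > max_age


today = date(2026, 6, 17)
print(is_stale("NPPES", date(2026, 6, 1), today))     # True: 16 days exceeds a 7-day cadence
print(is_stale("OIG LEIE", date(2026, 6, 1), today))  # False: 16 days is within 31
```

With a one-time download, `last_checked` never advances, so every field eventually fails this check.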
Skip the parsing. Query the federal record.
Browse free research at /research, see the field-level pipeline at /data-provenance, or request access.
- /sources → The full source library with tier, refresh cadence, and limitations.
- /data-provenance → How every field ties back to a federal record.
- /data → Dataset catalog, export concepts, and pilot pricing.
- /docs/fhir → FHIR R4 US Core endpoint reference.