The ProPublica Data Store
Get the datasets that power our journalism.
- Download, free of charge, data we received as the result of public records requests
- Find links to public data sets used in our investigations
- Purchase data sets that have been cleaned, processed, and augmented by our staff
Premium Datasets (Purchase)
Cleaned up, categorized and often created from multiple sources, these Premium datasets are unique to ProPublica.
APIs and Raw Data (Free)
ProPublica's APIs and other free-to-download raw datasets that we've compiled from our own research or received via FOIA request.
External Data
ProPublica frequently uses datasets that are free and available online. Instead of offering copies for download, we send you straight to the source.
Featured Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Premium: Dollars for Docs Data (National, 2013-2014)
Data on $3.5 billion in payments made by pharmaceutical companies to doctors, other medical providers and health care institutions between August 2013 and December 2014. The data is a cleaned and combined version of the Open Payments data from the Centers for Medicare and Medicaid Services. ProPublica has cleaned and standardized drug, device, company and teaching hospital names. Data from 2009-2013 is available here.
Size: 14,837,291 rows, Date Released: 1/1/2016
|
Centers for Medicare & Medicaid Services | $200 | $2,000 |
Purchase
|
Premium: Open Payments / NPI Crosswalk
An add-on to the Dollars for Docs data set, which includes data on $3.5 billion in payments made by pharmaceutical companies to doctors, other medical providers and health care institutions between August 2013 and December 2014.
Dollars for Docs is a cleaned and combined version of the Open Payments data from the Centers for Medicare and Medicaid Services. ProPublica has cleaned and standardized drug, device, company and teaching hospital names. The Open Payments-NPI crosswalk enables users to match physicians in the Open Payments database to other provider data associated with their National Provider Identifier. The crosswalk is 99.7% complete. Data from 2009-2013 is available here.
Size: 14,837,291 rows, Date Released: April 2016
|
ProPublica, Centers for Medicare & Medicaid Services | $1,000 | $5,000 |
Purchase
|
Premium: Prescriber Checkup Dataset 2013
ProPublica's Prescriber Checkup data for 2013. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are five total files.
Size: Varies, Date Released: June 2015
|
Centers for Medicare & Medicaid Services | $200 | $2,000 |
Purchase
|
Premium: Surgeon Scorecard Dataset
ProPublica's Surgeon Scorecard data.
Size: 23,370 rows, Date Released: September 2015
|
Centers for Medicare & Medicaid Services | $200 | $2,000 |
Purchase
|
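As a rough illustration of how a crosswalk like the Open Payments / NPI file described above is typically used, the sketch below attaches an NPI to each payment record by looking up a shared physician identifier. The column names (`physician_id`, `npi`, `company`, `amount`) and all sample values are hypothetical and are not taken from the actual files. Because the crosswalk is described as 99.7% complete, the sketch keeps unmatched rows separate instead of dropping them.

```python
import csv
import io

# Hypothetical miniature stand-ins for the payments file and the crosswalk.
# Column names and values are assumptions made for illustration only.
payments_csv = """physician_id,company,amount
1001,Acme Pharma,250.00
1002,Acme Pharma,75.50
1003,Beta Devices,1200.00
"""

crosswalk_csv = """physician_id,npi
1001,1234567890
1002,2345678901
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_npi(payments, crosswalk):
    """Split payment rows into those with and without a crosswalk match."""
    npi_by_physician = {row["physician_id"]: row["npi"] for row in crosswalk}
    matched, unmatched = [], []
    for row in payments:
        npi = npi_by_physician.get(row["physician_id"])
        if npi is None:
            # No crosswalk entry: keep the row so the gap stays visible.
            unmatched.append(row)
        else:
            matched.append({**row, "npi": npi})
    return matched, unmatched


matched, unmatched = attach_npi(load_rows(payments_csv), load_rows(crosswalk_csv))
print(len(matched), "matched;", len(unmatched), "unmatched")
```

Once a record carries an NPI, it can be joined to any other provider dataset keyed on that identifier in the same way.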
Health Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Premium: Dollars For Docs Data (Per State)
This data includes more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013. ProPublica combined, cleaned, and standardized data from multiple sources.
Size: Varies, Date Released: October 2014
|
Pharmaceutical Company Disclosures | $200 | $2,000 |
Purchase
|
Premium: Combined Dollars for Docs Dataset (National)
This data includes more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013. ProPublica combined, cleaned, and standardized data from multiple sources.
Size: 3,362,932 rows, Date Released: October 2014
|
Pharmaceutical Company Disclosures | $1,000 | $10,000 |
Purchase
|
Premium: Prescriber Checkup Dataset 2012
ProPublica's Prescriber Checkup data for 2012. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are six total files.
Size: Varies, Date Released: January 2015
|
Centers for Medicare & Medicaid Services | $200 | $2,000 |
Purchase
|
Premium: Prescriber Checkup Dataset 2011
ProPublica's Prescriber Checkup data for 2011. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are six total files.
Size: Varies, Date Received: April 2013
|
Centers for Medicare & Medicaid Services | $200 | $2,000 |
Purchase
|
Medicare Part D Hepatitis C Prescribing Data 2014
Medicare Part D prescription data for Hepatitis C drugs in 2014. ProPublica used this data in "The Cost of a Cure: Medicare Spent $4.5 Billion on New Hepatitis C Drugs Last Year".
Size: 6,805 rows, Date Received: March 2015
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data 2012
Medicare Part D prescriptions for 2012. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 21,970,751 rows, Date Released: July 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data (Patients 65 or Older) 2012
Medicare Part D prescriptions written only for patients 65 or older in 2012. The data include all drugs prescribed by doctors 11 or more times to these patients in 2012. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 16,966,011 rows, Date Received: July 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data 2011
Medicare Part D prescriptions for 2011. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 21,150,242 rows, Date Received: June 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data (Patients 65 or Older) 2011
Medicare Part D prescriptions written only for patients 65 or older in 2011. The data include all drugs prescribed by doctors 11 or more times to these patients in 2011. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 16,366,282 rows, Date Received: June 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Custom Data Runs 2011
This dataset includes additional Medicare files ProPublica used to create the Prescriber Checkup app. The data include drug costs, drug counts and narcotic and antipsychotic drug use.
Size: Varies, Date Received: June 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data 2010
Medicare Part D prescriptions for 2010. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 20,758,453 rows, Date Received: June 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
ACA Plan Compare 2014-2015 Data
This data compares differences between 2014 and 2015 Affordable Care Act insurance plans. The data comes already joined through a crosswalk file and includes fields that indicate if a plan changed, and by how much. ProPublica used this data to create the "Will My Obamacare Health Care Costs Go Up?" app.
Size: 79,279 rows, Date Released: December 2014
|
Centers for Medicare & Medicaid Services | — | — |
Download
|
Medicare Part D Prescribing Data 2013
Medicare Part D prescriptions for 2013. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. ProPublica used this data in Prescriber Checkup.
|
Centers for Medicare & Medicaid Services | — | — |
Link
|
CMS Open Payments Data
CMS Open Payments data. ProPublica cleaned this data to use in the Open Payments Explorer app.
|
Centers for Medicare & Medicaid Services | — | — |
Link
|
CDC Mortality Data
CDC's mortality and cause-of-death data. ProPublica used this data in our Tylenol overdose story.
|
Centers for Disease Control and Prevention | — | — |
Link
|
Nursing Home Compare Data
The Centers for Medicare and Medicaid Services Nursing Home Inspection data, including general information about nursing homes, health deficiencies, and penalties, updated monthly. ProPublica used this data in our Nursing Home Inspect app. The specific data sets we used are Health Deficiencies, Penalties, and Provider Info.
|
Centers for Medicare & Medicaid Services | — | — |
Link
|
Nursing Home Deficiencies Data
The Centers for Medicare and Medicaid Services also makes publicly available the full-text statements of nursing home deficiencies. Scroll down to "Related Links"; the dataset is called "Full Text of Statements of Deficiencies -- [Current Month & Year]." It contains 10 Excel files, one for each region of the country. ProPublica used this data in our Nursing Home Inspect app.
|
Centers for Medicare & Medicaid Services | — | — |
Link
|
Medicare Part B Provider Utilization and Payment Data
Medicare Part B service data for 2012. The data include all services performed by doctors 11 or more times that year for Part B patients. ProPublica used the data to create the Treatment Tracker app and stories.
|
Centers for Medicare & Medicaid Services | — | — |
Link
|
Education Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
School Desegregation Orders Data
This is a dataset of school desegregation orders. The data files include information about school desegregation orders mandated by federal courts and open school desegregation orders that resulted from voluntary agreements between school districts and the U.S. Department of Education’s Office for Civil Rights. ProPublica used this data in our desegregation app.
Size: Varies, Date Uploaded: December 2014
|
U.S. Department of Justice; Stanford University | — | — |
Download
|
Restraint and Seclusion Data
This data contains all instances of restraints and seclusions that public schools self-reported during the 2011-2012 school year. It is broken down by state, district, and school. This is the first time the federal government has attempted to collect this data from all schools, though beware: many school districts did not report. ProPublica used this data in our story on the use of restraints at school. Read our reporting recipe for tips on how you can report this story.
Size: 95,635 rows, Date Uploaded: June 2014
|
Office for Civil Rights, U.S. Department of Education | — | — |
Download
|
Business Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Premium: Workers’ Compensation Body Parts Data
The maximum permanent partial disability compensation that injured workers can receive for various body parts in fifty states, the District of Columbia and the federal system. ProPublica used this data in our graphic Workers Comp Benefits: How Much is a Limb Worth? For more information, read our methodology here.
Size: 55 rows, Date Released: March 2015
|
ProPublica research of state workers’ compensation laws | $200 | $2,000 |
Purchase
|
Premium: Recovery Tracker Data
This dataset combines records from the recipient-reported data on Recovery.gov and Recovery Act grants and loans reported by agencies on USAspending.gov. ProPublica used this data in our Recovery Tracker.
Size: 472,059 rows, Date Received: 2013
|
Recovery.gov, USAspending.gov | $200 | $2,000 |
Purchase
|
Premium: ProPublica's Bailout Data
This data includes expenditures by the Treasury Department via both the broader $700 billion TARP bill (later reduced to $475 billion) and the separate bailout of Fannie Mae and Freddie Mac. ProPublica used this data in our bailout coverage.
Size: Varies, Date Released: October 2014
|
Treasury Department; SEC filings | $200 | $2,000 |
Purchase
|
New Debt Collection Datasets
This file contains all of the data used to create the graphics in "So Sue Them: What We've Learned About the Debt Collection Lawsuit Machine," by Paul Kiel and Lena Groeger.
Size: Varies, Date Uploaded: May 2016
|
ProPublica analysis, state court data (various jurisdictions) | — | — |
Download
|
Rating Agency Document Review Data
This data contains a summary of comments investment bankers made about credit rating agencies while pitching their underwriting services for tobacco bonds from 1999 onward. The comments come from 140 underwriting pitches ProPublica collected under public records requests in more than a dozen states. ProPublica used this data in "Bankers Brought Rating Agencies ‘To Their Knees’ On Tobacco Bonds".
Size: 265 rows, Date Released: December 2014
|
Bond underwriters' responses | — | — |
Download
|
Workers’ Compensation State Reforms Data
Summaries of the major changes to state workers’ compensation laws since 2003. ProPublica used this data in our graphic Workers’ Compensation Reforms by State.
Size: 50 rows, Date Released: March 2015
|
ProPublica research on state reform laws | — | — |
Download
|
Workers’ Compensation Premium Rate Data
State rankings comparing workers’ compensation insurance rates paid by employers. ProPublica used this data in our graphic on workers’ comp premium rates being at a 25-year low.
|
Oregon Department of Consumer and Business Services | — | — |
Link
|
Transportation Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Federal Air Marshal Misconduct Database
Federal air marshals fly undercover on passenger planes and are trained to intervene in the event of a hijacking. This database contains information on cases of misconduct committed by federal air marshals by date and field office and what discipline was meted out in response. The data covers November 2002 to February 2012.
Size: 5,214 rows, Date Received: February 2016
|
Transportation Security Administration | — | — |
Download
|
Pipeline Safety Data
This data contains all reported oil or gas pipeline incidents documented by the Pipeline & Hazardous Materials Safety Administration. ProPublica used this data in our Pipeline Safety Tracker.
|
Pipeline & Hazardous Materials Safety Administration | — | — |
Link
|
Military Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Commander's Emergency Response Program Data
This data contains payments made by U.S. military commanders to the Afghan people during the Afghanistan war. ProPublica used this data in our app Money as a Weapons System.
Size: 17,958 rows, Date Received: January 2015
|
Special Inspector General for Afghanistan Reconstruction | — | — |
Download
|
New ProPublica's Afghan Waste Data
ProPublica reviewed 235 SIGAR financial audits, special projects, program audits and inspection reports to compile this data, which was used for the "We Blew $17 Billion in Afghanistan. How Would You Have Spent It?" story and news app/game.
Size: 132 rows, Date Created: December 2015
|
Special Inspector General for Afghanistan Reconstruction | — | — |
Download
|
Campaign Finance Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
Free the Files Filing Data
ProPublica's Free the Files data. ProPublica curated this data from TV stations' political ad filings in swing markets in 2012.
Size: 66,225 rows, Date Released: January 2015
|
ProPublica, Federal Communications Commission | — | — |
Download
|
Campaign Finance API
Using the Campaign Finance API, you can retrieve data from United States Federal Election Commission filings and other sources. The API, which originated at The New York Times in 2008, covers summary information for candidates and committees, as well as certain types of itemized data.
Size: Varies, Date Released: January 2016
|
Federal Election Commission | — | — |
API
|
PAC Donor Similarity Scores
This data contains cosine similarity scores for PAC donors to congressional recipients, as described in the ProPublica story "Campaign Donations Reflect the Sharp Split in Congress Among Republicans."
|
Federal Election Commission | — | — |
Link
|
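For readers unfamiliar with the cosine similarity scores mentioned in the PAC Donor Similarity Scores entry above, the sketch below shows the general calculation. It assumes each donor is represented as a vector of amounts given to each recipient, which may not match how ProPublica built its vectors. The donor and recipient names and the amounts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {recipient: amount}."""
    shared = set(a) & set(b)
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A donor with no giving has no direction to compare.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical giving patterns, for illustration only.
pac_a = {"Recipient 1": 5000, "Recipient 2": 2500}
pac_b = {"Recipient 1": 10000, "Recipient 2": 5000}  # same split, double the size
pac_c = {"Recipient 3": 1000}                        # no recipients in common

print(cosine_similarity(pac_a, pac_b))  # close to 1: identical proportions
print(cosine_similarity(pac_a, pac_c))  # 0.0: no overlap
```

Because the score depends only on the direction of each vector, two donors who split their money in the same proportions score near 1 regardless of how much each gave in total.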
Criminal Justice Datasets | Source | JOURN ($) | ACAD ($) | |
|---|---|---|---|---|
New COMPAS Recidivism Risk Score Data and Analysis
The data and analysis code behind our story "Machine Bias." Includes a database containing the criminal history, jail and prison time, demographics and COMPAS risk scores for defendants from Broward County from 2013 and 2014, and other files needed for the analysis. The code is in R and Python and is included in a Jupyter notebook.
|
— | — |
Link
|
The data store is part of our ongoing efforts to share our work with the public and to sustain our journalism. Read more about its history or contact us at [email protected] with any questions.