The ProPublica Data Store

Get the datasets that power our journalism.

  • Download, for free, data we received through public records requests
  • Find links to public data sets used in our investigations
  • Purchase data sets that have been cleaned, processed, and augmented by our staff

Premium Datasets (Purchase)

Cleaned up, categorized and often created from multiple sources, these Premium datasets are unique to ProPublica.

APIs and Raw Data (Free)

ProPublica's APIs and other free-to-download raw datasets that we've compiled from our own research or received through Freedom of Information Act (FOIA) requests.

External Data

ProPublica frequently uses datasets that are free and available online. So instead of downloading copies from us, we send you straight to the source.

Featured Datasets

Source | Journalist price ($) | Academic price ($)

Premium: Dollars for Docs Data (National, 2013-2014)

Data on $3.5 billion in payments made by pharmaceutical companies to doctors, other medical providers and health care institutions between August 2013 and December 2014. The data is a cleaned and combined version of the Open Payments data from the Centers for Medicare & Medicaid Services. ProPublica has cleaned and standardized drug, device, company and teaching hospital names. Data from 2009-2013 is available as a separate dataset.
Size: 14,837,291 rows, Date Released: 1/1/2016
Centers for Medicare & Medicaid Services $200 $2,000 Purchase

Premium: Open Payments / NPI Crosswalk

An add-on to the Dollars for Docs data set, which includes data on $3.5 billion in payments made by pharmaceutical companies to doctors, other medical providers and health care institutions between August 2013 and December 2014.

Dollars for Docs is a cleaned and combined version of the Open Payments data from the Centers for Medicare & Medicaid Services. ProPublica has cleaned and standardized drug, device, company and teaching hospital names.

The Open Payments-NPI crosswalk enables users to match physicians in the Open Payments database to other provider data associated with their National Provider Identifier (NPI). The crosswalk is 99.7% complete, so a small share of physicians will not have a match.
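To illustrate how a crosswalk is typically used, here is a minimal sketch of joining payment rows to NPIs with Python's standard library. The column names (`physician_profile_id`, `npi`), the inline sample rows and the NPI values are all hypothetical stand-ins; the actual ProPublica files may use different headers. One payment deliberately has no crosswalk entry, mirroring the fact that the crosswalk is not 100% complete.

```python
import csv
import io

# Hypothetical sample data: real column names and values in the purchased files may differ.
PAYMENTS_CSV = """physician_profile_id,company,amount
101,Acme Pharma,250.00
102,Beta Devices,75.50
103,Acme Pharma,1200.00
"""

CROSSWALK_CSV = """physician_profile_id,npi
101,1234567890
102,1987654321
"""


def load_crosswalk(text):
    """Build a lookup from Open Payments physician ID to NPI."""
    return {row["physician_profile_id"]: row["npi"] for row in csv.DictReader(io.StringIO(text))}


def attach_npi(payments_text, crosswalk):
    """Return payment rows with an added 'npi' field; unmatched rows get None."""
    rows = []
    for row in csv.DictReader(io.StringIO(payments_text)):
        row["npi"] = crosswalk.get(row["physician_profile_id"])
        rows.append(row)
    return rows


crosswalk = load_crosswalk(CROSSWALK_CSV)
joined = attach_npi(PAYMENTS_CSV, crosswalk)
matched = sum(1 for r in joined if r["npi"] is not None)
print(f"{matched} of {len(joined)} payments matched to an NPI")
```

Keeping unmatched rows (with `None`) rather than dropping them makes it easy to count how many payments fall outside the crosswalk before doing any NPI-based analysis.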

Data from 2009-2013 is available as a separate dataset.

Size: 14,837,291 rows, Date Released: April 2016
ProPublica, Centers for Medicare & Medicaid Services $1,000 $5,000 Purchase

Premium: Prescriber Checkup Dataset 2013

ProPublica's Prescriber Checkup data for 2013. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are five total files.
Size: Varies, Date Released: June 2015
Centers for Medicare & Medicaid Services $200 $2,000 Purchase

Premium: Surgeon Scorecard Dataset

ProPublica's Surgeon Scorecard data.
Size: 23,370 rows, Date Released: September 2015
Centers for Medicare & Medicaid Services $200 $2,000 Purchase

Health Datasets

Source | Journalist price ($) | Academic price ($)

Premium: Dollars For Docs Data (Per State)

This data includes more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013. ProPublica combined, cleaned, and standardized data from multiple sources.
Size: Varies, Date Released: October 2014
Pharmaceutical Company Disclosures $200 $2,000 Purchase

Premium: Combined Dollars for Docs Dataset (National)

This data includes more than $4 billion in payments to doctors, other medical providers and health care institutions that were disclosed by 17 pharmaceutical companies from 2009 to 2013. ProPublica combined, cleaned, and standardized data from multiple sources.
Size: 3,362,932 rows, Date Released: October 2014
Pharmaceutical Company Disclosures $1,000 $10,000 Purchase

Premium: Prescriber Checkup Dataset 2012

ProPublica's Prescriber Checkup data for 2012. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are six total files.
Size: Varies, Date Released: January 2015
Centers for Medicare & Medicaid Services $200 $2,000 Purchase

Premium: Prescriber Checkup Dataset 2011

ProPublica's Prescriber Checkup data for 2011. The data has been cleaned and joined with other tables to include providers' names, addresses, specialties and contact information, as well as additional information on doctors' prescribing habits. There are six total files.
Size: Varies, Date Received: April 2013
Centers for Medicare & Medicaid Services $200 $2,000 Purchase

Medicare Part D Hepatitis C Prescribing Data 2014

Medicare Part D prescription data for Hepatitis C drugs in 2014. ProPublica used this data in "The Cost of a Cure: Medicare Spent $4.5 Billion on New Hepatitis C Drugs Last Year".
Size: 6,805 rows, Date Received: March 2015
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data 2012

Medicare Part D prescriptions for 2012. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match each unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 21,970,751 rows, Date Released: July 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data (Patients 65 or Older) 2012

Medicare Part D prescriptions written only for patients 65 or older in 2012. The data include all drugs prescribed by doctors 11 or more times to these patients in 2012. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 16,966,011 rows, Date Received: July 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data 2011

Medicare Part D prescriptions for 2011. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 21,150,242 rows, Date Received: June 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data (Patients 65 or Older) 2011

Medicare Part D prescriptions written only for patients 65 or older in 2011. The data include all drugs prescribed by doctors 11 or more times to these patients in 2011. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 16,366,282 rows, Date Received: June 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Custom Data Runs 2011

This dataset includes additional Medicare files ProPublica used to create the Prescriber Checkup app. The data include drug costs, drug counts and narcotic and antipsychotic drug use.
Size: Varies, Date Received: June 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data 2010

Medicare Part D prescriptions for 2010. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. A lookup file is provided to match unique prescriber ID to a practitioner's DEA or NPI number or other identifier. ProPublica used this data in Prescriber Checkup.
Size: 20,758,453 rows, Date Received: June 2014
Centers for Medicare & Medicaid Services Download

ACA Plan Compare 2014-2015 Data

This data compares 2014 and 2015 Affordable Care Act insurance plans. The data comes already joined through a crosswalk file and includes fields that indicate whether a plan changed, and by how much. ProPublica used this data to create the "Will My Obamacare Health Care Costs Go Up?" app.
Size: 79,279 rows, Date Released: December 2014
Centers for Medicare & Medicaid Services Download

Medicare Part D Prescribing Data 2013

Medicare Part D prescriptions for 2013. The data include all drugs prescribed by doctors 11 or more times that year to Part D patients, including those 65 and older. ProPublica used this data in Prescriber Checkup.
Centers for Medicare & Medicaid Services Link

CMS Open Payments Data

CMS Open Payments data. ProPublica cleaned this data to use in the Open Payments Explorer app.
Centers for Medicare & Medicaid Services Link

CDC Mortality Data

CDC's mortality and cause-of-death data. ProPublica used this data in our Tylenol overdose story.
Centers for Disease Control and Prevention Link

Nursing Home Compare Data

The Centers for Medicare & Medicaid Services' nursing home inspection data, including general information about nursing homes, health deficiencies, and penalties, updated monthly. ProPublica used this data in Nursing Home Inspect, specifically the Health Deficiencies, Penalties, and Provider Info data sets.
Centers for Medicare & Medicaid Services Link

Nursing Home Deficiencies Data

The Centers for Medicare & Medicaid Services also publishes the full text of nursing home deficiency statements. Scroll down to "Related Links"; the dataset is called "Full Text of Statements of Deficiencies -- [Current Month & Year]." It contains 10 Excel files, one for each region of the country. ProPublica used this data in Nursing Home Inspect.
Centers for Medicare & Medicaid Services Link

Medicare Part B Provider Utilization and Payment Data

Medicare Part B service data for 2012. The data include all services performed by doctors 11 or more times that year for Part B patients. ProPublica used the data to create the Treatment Tracker app and stories.
Centers for Medicare & Medicaid Services Link

Education Datasets

Source | Journalist price ($) | Academic price ($)

School Desegregation Orders Data

This dataset covers school desegregation orders mandated by federal courts, as well as open desegregation orders that resulted from voluntary agreements between school districts and the U.S. Department of Education’s Office for Civil Rights. ProPublica used this data in our desegregation app.
Size: Varies, Date Uploaded: December 2014
U.S. Department of Justice; Stanford University Download

Restraint and Seclusion Data

This data contains all instances of restraints and seclusions that public schools self-reported during the 2011-2012 school year. It is broken down by state, district, and school. This is the first time the federal government has attempted to collect this data from all schools, though beware: many school districts did not report. ProPublica used this data in our story on the use of restraints at school. Read our reporting recipe for tips on how you can report this story.
Size: 95,635 rows, Date Uploaded: June 2014
Office for Civil Rights, U.S. Department of Education Download

Business Datasets

Source | Journalist price ($) | Academic price ($)

Premium: Workers’ Compensation Body Parts Data

The maximum permanent partial disability compensation that injured workers can receive for various body parts in all 50 states, the District of Columbia and the federal system. ProPublica used this data in our graphic "Workers’ Comp Benefits: How Much Is a Limb Worth?" For more information, read our methodology.
Size: 55 rows, Date Released: March 2015
ProPublica research of state workers’ compensation laws $200 $2,000 Purchase

Premium: Recovery Tracker Data

This dataset combines records from the recipient-reported data on Recovery.gov and Recovery Act grants and loans reported by agencies on USAspending.gov. ProPublica used this data in our Recovery Tracker.
Size: 472,059 rows, Date Received: 2013
Recovery.gov, USAspending.gov $200 $2,000 Purchase

Premium: ProPublica's Bailout Data

This data includes expenditures by the Treasury Department via both the broader $700 billion TARP bill (later reduced to $475 billion) and the separate bailout of Fannie Mae and Freddie Mac. ProPublica used this data in our bailout coverage.
Size: Varies, Date Released: October 2014
Treasury Department; SEC filings $200 $2,000 Purchase

Debt Collection Datasets

This file contains all of the data used to create the graphics in "So Sue Them: What We've Learned About the Debt Collection Lawsuit Machine," by Paul Kiel and Lena Groeger.
Size: Varies, Date Uploaded: May 2016
ProPublica analysis, state court data (various jurisdictions) Download

Rating Agency Document Review Data

This data contains a summary of comments investment bankers made about credit rating agencies while pitching their underwriting services for tobacco bonds from 1999 onward. The comments come from 140 underwriting pitches ProPublica collected under public records requests in more than a dozen states. ProPublica used this data in "Bankers Brought Rating Agencies ‘To Their Knees’ On Tobacco Bonds".
Size: 265 rows, Date Released: December 2014
Bond underwriters' responses Download

Workers’ Compensation State Reforms Data

Summaries of the major changes to state workers’ compensation laws since 2003. ProPublica used this data in our graphic Workers’ Compensation Reforms by State.
Size: 50 rows, Date Released: March 2015
ProPublica research on state reform laws Download

Workers’ Compensation Premium Rate Data

State rankings comparing workers’ compensation insurance rates paid by employers. ProPublica used this data in our graphic showing that workers’ comp premium rates are at a 25-year low.
Oregon Department of Consumer and Business Services Link

Transportation Datasets

Source | Journalist price ($) | Academic price ($)

Federal Air Marshal Misconduct Database

Federal air marshals fly undercover on passenger planes and are trained to intervene in the event of a hijacking. This database contains information on cases of misconduct committed by federal air marshals by date and field office and what discipline was meted out in response. The data covers November 2002 to February 2012.
Size: 5,214 rows, Date Received: February 2016
Transportation Security Administration Download

Pipeline Safety Data

This data contains all reported oil or gas pipeline incidents documented by the Pipeline & Hazardous Materials Safety Administration. ProPublica used this data in our Pipeline Safety Tracker.
Pipeline & Hazardous Materials Safety Administration Link

Military Datasets

Source | Journalist price ($) | Academic price ($)

Commander's Emergency Response Program Data

This data contains payments made by U.S. military commanders to the Afghan people during the Afghanistan war. ProPublica used this data in our app Money as a Weapons System.
Size: 17,958 rows, Date Received: January 2015
Special Inspector General for Afghanistan Reconstruction Download

ProPublica's Afghan Waste Data

ProPublica reviewed 235 SIGAR financial audits, special projects, program audits and inspection reports to compile this data, which was used for the "We Blew $17 Billion in Afghanistan. How Would You Have Spent It?" story and news app/game.
Size: 132 rows, Date Created: December 2015
Special Inspector General for Afghanistan Reconstruction Download

Campaign Finance Datasets

Source | Journalist price ($) | Academic price ($)

Free the Files Filing Data

ProPublica's Free the Files data. ProPublica curated this data from TV stations' political ad filings in swing markets in 2012.
Size: 66,225 rows, Date Released: January 2015
ProPublica, Federal Communications Commission Download

Campaign Finance API

Using the Campaign Finance API, you can retrieve data from United States Federal Election Commission filings and other sources. The API, which originated at The New York Times in 2008, covers summary information for candidates and committees, as well as certain types of itemized data.
Size: Varies, Date Released: January 2016
Federal Election Commission API

PAC Donor Similarity Scores

This data contains cosine similarity scores for PAC donors to congressional recipients, as described in the ProPublica story "Campaign Donations Reflect the Sharp Split in Congress Among Republicans."
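For readers unfamiliar with the metric, cosine similarity compares two vectors by the angle between them: the dot product divided by the product of their lengths. A score of 1 means identical donor profiles and 0 means no donors in common. The sketch below assumes each congressional recipient is represented as a vector of hypothetical PAC contribution totals; the weighting ProPublica actually used (dollar amounts versus simple presence of a donor) is not stated here and may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {donor: amount} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[d] * b[d] for d in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A recipient with no PAC money has no direction to compare.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical PAC contributions (in dollars) to two members of Congress.
member_a = {"PAC 1": 5000, "PAC 2": 2500}
member_b = {"PAC 1": 5000, "PAC 3": 1000}

print(round(cosine_similarity(member_a, member_b), 3))
```

Storing vectors as dictionaries keeps the calculation sparse: only donors that both recipients share contribute to the dot product, which matters when there are thousands of PACs but each member receives money from a small subset.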
Federal Election Commission Link

Criminal Justice Datasets

Source | Journalist price ($) | Academic price ($)

COMPAS Recidivism Risk Score Data and Analysis

The data and analysis code behind our story "Machine Bias." Includes a database containing the criminal history, jail and prison time, demographics and COMPAS risk scores for defendants in Broward County, Florida, from 2013 and 2014, plus other files needed for the analysis. The code is written in R and Python and included in a Jupyter notebook.
Link

The data store is part of our ongoing efforts to share our work with the public and to sustain our journalism. Read more about its history or contact us by email with any questions.