How Journalists Used Public Data to Publish the Complication Rates of 16,000 Surgeons

By Olga Pierce & Marshall Allen, ProPublica

We had three goals in mind in creating “Surgeon Scorecard,” the first national online database to publicly name surgeons and measure how often their patients suffer serious complications:

1) Provide patients with actionable information about where to get safe care.
2) Give surgeons and hospital officials useful information so they can improve.
3) Test the power of transparency, to see how public reporting of surgical outcomes can work on a variety of levels to make care safer for patients.

See Pic1 for reference.

We knew the gold standard for tracking and validating performance in health care is clinical data – private information from patients’ medical charts, notes and internal hospital recordkeeping systems. But as health care journalists – not medical providers – we had to rely on publicly available data. We chose Medicare administrative data – the only national data with the features to meet our goals.

The choice came with opportunities and limitations. Where clinical records are deep, claims data are broad. For our analysis, we used the Inpatient Limited Data Set, which includes a record of every hospital visit by a Medicare patient in any given year. Each record includes the admitting hospital, basic patient information like age and gender, ICD-9 procedure and diagnosis codes, the dates of admission and discharge and more. Importantly for our project, attending and operating physicians are identified.

We obtained the data from 2009 through 2013 – altogether a cache of nearly 61 million records more than 200 variables wide. One shortcoming is that the data do not include other payers; the plus is that they represent the complete population of Medicare fee-for-service patients during those years.

We spoke to dozens of doctors, surgeons and researchers who use the Medicare data. They helped guide us as we developed our patient outcome measures. We wanted to be conservative, only using the most reliable data elements. We also wanted to set a reasonably high bar for defining patient harm. Patient issues and complications that were quickly and easily resolved – i.e., not severe enough to result in death or a hospital readmission – are not part of Scorecard’s analysis.

Finally, we wanted to provide patients with a simple, single piece of information and make its reliability transparent. We wanted to avoid measures like “infections per thousand ventilator days” because they are confusing for a lay audience and mask the human toll of medical mishaps.

We settled on two outcomes about which there could be little dispute: Deaths in the hospital and readmissions within 30 days due to complications related to the procedure. In doing this, we were following the lead of many other health care researchers. Looking at what happened to patients after they left the hospital made sense for the common elective procedures we included in our analysis (hip and knee replacements, gall bladder removals, prostate removal and resection, three types of spinal fusions). For these operations, patients are generally in good health and are hospitalized for only a few days. To be fair to surgeons, we screened out complex cases.

In executing our analysis, we employed software that is widely available and avoided proprietary tools, such as SAS. Data manipulation was done with SQL, with statistical modeling in R. Several flavors of SQL are freely available to the public, and R is an open-source language that is used by many thousands of researchers across a multitude of disciplines. In the end, we used more than 1,000 lines each of SQL and R code. (We plan to share as much of this code as possible in keeping with patient privacy.) Especially useful was the R package ‘lme4: Linear mixed-effects models using Eigen and S4’ v.1.1-7 by Douglas Bates et al. Bates is a respected statistician in this area, which gave us confidence in the results.

We used SQL Server to identify cases that had an ICD-9 procedure code indicating the patient underwent one of the surgeries in our analysis. We grouped by ICD-9 diagnosis code to identify the most common principal diagnoses that were associated with the procedure. For total knee replacements, for example, the most common codes were related to osteoarthritis. We further filtered these diagnosis codes with the help of surgeons to make sure we were including only the most bread-and-butter cases. We also excluded patients admitted through the emergency room or transfers from other facilities.
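The case-selection step described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not our production SQL Server code: the `claims` table, its column names, the `admission_source` values and the toy rows are all assumptions made for the example. The ICD-9 codes shown (81.54 for total knee replacement, 715.x6 for osteoarthrosis of the lower leg) are real codes, used here only as sample values.

```python
import sqlite3

# Hypothetical, heavily simplified claims table; the real Limited Data Set
# is more than 200 variables wide.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claims (
        claim_id         INTEGER PRIMARY KEY,
        procedure_code   TEXT,   -- ICD-9 procedure code
        principal_dx     TEXT,   -- ICD-9 principal diagnosis code
        admission_source TEXT    -- assumed values: elective / emergency / transfer
    )
""")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", [
    (1, "81.54", "715.36", "elective"),
    (2, "81.54", "715.36", "elective"),
    (3, "81.54", "715.96", "elective"),
    (4, "81.54", "715.36", "emergency"),  # dropped: emergency admission
    (5, "51.23", "574.10", "elective"),   # dropped: different procedure
])

# Keep one procedure, drop emergency admissions and transfers, then rank
# the principal diagnoses by how often they occur.
dx_counts = conn.execute("""
    SELECT principal_dx, COUNT(*) AS n
    FROM claims
    WHERE procedure_code = ?
      AND admission_source NOT IN ('emergency', 'transfer')
    GROUP BY principal_dx
    ORDER BY n DESC, principal_dx
""", ("81.54",)).fetchall()

print(dx_counts)
```

A ranked list like this is what surgeons could then review to decide which diagnosis codes count as routine cases.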

We did this, in part, because we knew that risk adjustment would be a challenge. Some surgeons do relatively low numbers of surgeries, and complications are a rare event, making modeling more difficult. We also knew that hospital coding for comorbidities like diabetes and obesity varies significantly. As such, we decided to lessen the importance of risk adjustment by standardizing the patient pool as much as possible. This is probably the most important thing we did to account for differences in case-mix.

The Patient Status variable identified those who died during their initial hospital stay. We flagged these cases as complications, and then excluded those patients from our readmission analysis. We then used SQL to identify cases where a patient returned to the hospital within 30 days. However, examination of the data showed that measuring readmission for any reason was not specific enough – to identify a surgical complication, we needed to look at why a patient returned to the hospital.
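A rough sketch of this step, again using sqlite3 from Python: flag in-hospital deaths, then look for a later stay by the same patient within 30 days. The `stays` table, its columns and the sample rows are hypothetical, `died_in_hospital` stands in for the Patient Status variable, and counting the 30 days from the discharge date is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stays (
        stay_id          INTEGER PRIMARY KEY,
        patient_id       INTEGER,
        admit_date       TEXT,     -- ISO dates so julianday() can parse them
        discharge_date   TEXT,
        died_in_hospital INTEGER,  -- stand-in for the Patient Status variable
        is_index_surgery INTEGER   -- 1 if this stay is one of the studied operations
    )
""")
conn.executemany("INSERT INTO stays VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 100, "2012-03-01", "2012-03-04", 0, 1),
    (2, 100, "2012-03-20", "2012-03-23", 0, 0),  # back 16 days after discharge
    (3, 200, "2012-05-10", "2012-05-13", 0, 1),
    (4, 200, "2012-07-01", "2012-07-02", 0, 0),  # back 49 days later: too late
    (5, 300, "2012-06-01", "2012-06-05", 1, 1),  # died during the initial stay
])

# Deaths during the initial stay are flagged as complications directly.
deaths = conn.execute("""
    SELECT stay_id FROM stays
    WHERE is_index_surgery = 1 AND died_in_hospital = 1
""").fetchall()

# Survivors are matched to any later stay that begins within 30 days.
readmissions = conn.execute("""
    SELECT idx.stay_id, re.stay_id
    FROM stays AS idx
    JOIN stays AS re
      ON re.patient_id = idx.patient_id
     AND re.stay_id <> idx.stay_id
     AND julianday(re.admit_date) - julianday(idx.discharge_date) BETWEEN 0 AND 30
    WHERE idx.is_index_surgery = 1
      AND idx.died_in_hospital = 0
    ORDER BY idx.stay_id
""").fetchall()

print(deaths, readmissions)
```

The second query returns readmissions for any reason, which is exactly the set that proved too broad and needed a further filter on the reason for the return.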

We turned to experts for help. A team of more than two dozen doctors, including many surgeons, volunteered to review the entire list of primary diagnosis codes that appeared on any of the 30-day readmissions after one of the procedures included in our analysis. (The primary diagnosis code is the code Medicare primarily relies upon to determine how much a hospital should be paid for a case.) The experts identified which of these primary diagnosis codes could reasonably be considered a complication related to each type of surgery, like an infection, a blood clot or a mechanical problem with an orthopedic device. In the end, the list included more than 1,000 diagnoses. These are the only 30-day readmissions we counted as complications.
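In code, the expert review amounts to a lookup: a readmission counts only if its primary diagnosis code is on the reviewed list for that type of surgery. The sketch below uses a tiny made-up excerpt. The procedure labels, the function name and the choice of codes are illustrative assumptions, not the actual list of more than 1,000 diagnoses.

```python
# Hypothetical excerpt of an expert-reviewed list, keyed by procedure type.
# The ICD-9 codes are real (996.66: infection due to internal joint prosthesis,
# 415.19: other pulmonary embolism, 998.59: other postoperative infection),
# but whether each appears on the actual list is not stated here.
COMPLICATION_DX = {
    "knee_replacement": {"996.66", "415.19"},
    "gallbladder_removal": {"998.59"},
}


def is_counted_complication(procedure, readmit_primary_dx):
    """Return True if this readmission diagnosis is on the list for the procedure."""
    return readmit_primary_dx in COMPLICATION_DX.get(procedure, set())


# A prosthesis infection after a knee replacement counts...
print(is_counted_complication("knee_replacement", "996.66"))
# ...but a readmission for unspecified chest pain (786.50) does not.
print(is_counted_complication("knee_replacement", "786.50"))
```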

Last spring, as we were preparing our data and developing our approach, a critical development took place: CMS agreed to our request to make unencrypted surgeon identifiers public for the first time. The decision grew out of a 2013 legal case in which Dow Jones & Co. (owners of the Wall Street Journal) won a lawsuit overturning an old court injunction keeping the information secret. In April 2014, CMS provided us with a crosswalk that allowed us to connect the surgeon identifiers in our data to names using the NPI registry.

That enabled us to count the number of complications and procedures for each named surgeon. We then set about risk-adjusting the “raw” complication rate. Using R we generated a mixed-effects model that took into account patient information (age, gender, and comorbidities as identified by the Elixhauser Comorbidity Index), information about the complexity of the procedure and hospital and surgeon random effects.
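One way to write down a model of this shape is shown below. It assumes a logistic link for the binary complication outcome, which is a common choice but is not spelled out in the description above, and all symbols are introduced here for illustration only.

```latex
\operatorname{logit} \Pr(y_{ijk} = 1)
  = \beta_0 + \mathbf{x}_{ijk}^{\top} \boldsymbol{\beta} + u_j + v_k,
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
v_k \sim \mathcal{N}(0, \sigma_v^2)
```

Here y_{ijk} is 1 if patient i, operated on by surgeon j at hospital k, had a complication. The vector x_{ijk} holds the fixed effects (age, gender, comorbidities and procedure complexity), u_j is the surgeon random effect and v_k is the hospital random effect.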

Our model generated what we called an Adjusted Complication Rate (ACR) for each surgeon as well as a 95 percent confidence interval. Because surgeons often performed operations at multiple hospitals, we made the decision to produce a single ACR for each surgeon. Producing a separate ACR for a surgeon at each hospital would have compounded the problem of scoring surgeons who operated less frequently.

Displaying our results posed another challenge. We wanted to provide as much useful information as we could for patients and the medical community, but our Data Use Agreement restricted the reporting of any number of patients from 1 to 10, to ensure that no patient could ever be identified. After negotiations, CMS approved a plan to let us report an ACR for such surgeons, as well as the number of operations they performed. For the number of complications, however, we were restricted to reporting “1-10” and in place of the unadjusted complication rate, the word “redacted” appears.
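The display rule can be expressed as a small function. This is a sketch under stated assumptions: the function name, the field names and the rounding of the unadjusted rate are hypothetical, and only the "1-10" and "redacted" substitutions come from the rule described above.

```python
def display_fields(operations, complications, acr):
    """Build the values shown for one surgeon, masking small complication counts.

    When the complication count is between 1 and 10, the count is shown as
    "1-10" and the unadjusted rate as "redacted". The adjusted rate (ACR)
    and the number of operations are always shown.
    """
    if operations <= 0:
        raise ValueError("operations must be positive")
    if 1 <= complications <= 10:
        shown_count = "1-10"
        shown_raw_rate = "redacted"
    else:
        shown_count = complications
        # Rounding to three decimals is an assumption made for this example.
        shown_raw_rate = round(complications / operations, 3)
    return {
        "operations": operations,
        "complications": shown_count,
        "raw_rate": shown_raw_rate,
        "acr": acr,
    }


print(display_fields(operations=50, complications=3, acr=0.031))
print(display_fields(operations=400, complications=12, acr=0.029))
```

A count of zero falls outside the 1-to-10 range, so in this sketch it is shown as is.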

We decided the most transparent way to present results was to include both the ACR point estimate and a confidence interval in our visualization, as seen below. User testing of the visualization indicated that most would be able to correctly interpret the data this way.

See Pic2 for reference. Each surgeon’s Adjusted Complication Rate is shown as a point estimate plus confidence interval. The green, yellow, and red zones represent the spectrum of surgeon complication rates nationwide. When the number of complications is between one and 10, some information is obscured.

We wanted to balance the goals of being fair to surgeons and helping patients choose. To be fair to surgeons, we reported their scores with hospital quality held constant. However, this presented the danger of patients being steered to a high-performing surgeon at a low-performing hospital. We addressed this concern in the view Scorecard users see when they search for a surgeon “near me” by entering a city, address, or ZIP code. Results from a “near me” search are ordered by the best hospital-surgeon combinations, as seen in the screenshot below, not just by a surgeon’s score.

See Pic3 for reference.

To date, Surgeon Scorecard has been visited more than 1.8 million times – and set off an intense debate. Some critics have said the data are too limited to be reliable or have questioned our methods. Others have called Scorecard an advance in transparency that will benefit patients and the medical community.

We’ve received much constructive advice that we plan to incorporate in Surgeon Scorecard 2.0, set for release later this year. We welcome anyone interested in helping to email us at

ProPublica is a nonprofit news organization that does journalism in the public interest. Olga Pierce is ProPublica’s deputy data editor and Marshall Allen reports on health care.