Demand-Driven Open Data: How to get the health data you need
Posted by dportnoy
on September 22, 2015 11:18am
In the article Identifying and Harnessing Demand to Drive Open Data (on the IDEA Lab blog), we examined the origins and purpose of Demand-Driven Open Data (DDOD). To recap: DDOD provides you with a systematic, ongoing, and transparent way to tell HHS and its agencies what data you need.
Now we can turn our attention to how you can make DDOD work for you. This post outlines the DDOD workflow and how to engage with DDOD. DDOD is useful for anyone across the user community – industry, researchers, nonprofits, media or even other government organizations – who has a data need that’s not yet being met. Each request for data in DDOD becomes a “use case” that’s prioritized, worked on and tracked with full visibility to the user community. Currently, there are over 45 use cases in various states of progress, including eight closed use cases, with two of those spurring an agency to release a new API.
Getting to Know Who’s Involved
Each use case is taken through a similar workflow with predefined milestones. There are three categories of participants in a use case: Data User, DDOD Admin, and Data Owner. You’re the Data User. The Data Owner could be anyone in an HHS agency who ultimately has responsibility for the data associated with your use case. The DDOD Admin is an intermediary between the User and Owner, who’s assigned to ensure the process of investigating, analyzing, and securing a resolution to your use case goes smoothly. For more information on the DDOD Admin team, see the DDOD wiki.
Entering Use Cases
First, go to HealthData.gov and run a search to see if the datasets you need are already catalogued. HealthData.gov indexes both existing public HHS datasets and DDOD use cases. If no datasets or use cases match your needs, add a new one. Currently, to add a new use case, go to the DDOD GitHub Issues page and click “New Issue.” In the future, you will be able to make DDOD use case requests directly through HealthData.gov.
In addition to a general description of your needs, a few crucial pieces of information help DDOD Admins research your use case effectively and track down the data you need.
- Check to see if the data you are looking for is already available on HealthData.gov (in case you missed the mention earlier!).
- Describe exactly how you would use the data. That’s the part that makes this request focused enough to be a use case.
- Identify the value of obtaining this data, both in terms of importance to your organization and more broadly in terms of your industry or public health.
- Describe the data you’re currently using and its limitations. Many requests come from organizations that already have a partial or suboptimal solution.
- Finally, provide specifications for the requested data. These are technical requirements, such as which data fields are needed, what other datasets they need to be joined to, and the desired refresh rate, lag time, and access method.
This combination of information is the storytelling currency that illustrates the value and importance of your request to the Data Owner.
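To make the checklist above concrete, here is a minimal sketch of how you might organize a use case before pasting it into a new GitHub issue. The `UseCase` class, its field names, and the section headings it emits are all hypothetical and chosen for illustration; DDOD does not prescribe this structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UseCase:
    """Hypothetical container for the pieces of a DDOD use case request."""

    title: str
    intended_use: str   # exactly how you would use the data
    value: str          # importance to your organization, industry, or public health
    current_data: str   # what you use today and its limitations
    fields_needed: list[str] = field(default_factory=list)
    join_datasets: list[str] = field(default_factory=list)
    refresh_rate: str = ""
    lag_time: str = ""
    access_method: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of core sections that are still empty."""
        required = {
            "intended_use": self.intended_use,
            "value": self.value,
            "current_data": self.current_data,
        }
        missing = [name for name, text in required.items() if not text.strip()]
        if not self.fields_needed:
            missing.append("fields_needed")
        return missing

    def to_issue_body(self) -> str:
        """Render the use case as plain text suitable for an issue description."""

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) if items else "- (not specified)"

        parts = [
            "## Intended use", self.intended_use,
            "## Value", self.value,
            "## Current data and limitations", self.current_data,
            "## Specifications",
            "Fields needed:", bullets(self.fields_needed),
            "Datasets to join:", bullets(self.join_datasets),
            f"Refresh rate: {self.refresh_rate or '(not specified)'}",
            f"Lag time: {self.lag_time or '(not specified)'}",
            f"Access method: {self.access_method or '(not specified)'}",
        ]
        return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder values only; replace them with your own request.
    draft = UseCase(
        title="Example request",
        intended_use="Describe how you would use the data.",
        value="",
        current_data="Describe what you use today and where it falls short.",
    )
    print("Still missing:", draft.missing_sections())
```

Calling `missing_sections()` before you submit is a quick way to confirm that none of the core pieces was left blank.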
Voting Up Existing Use Cases
Occasionally, the use case entry process is a bit different. When you run a search on HealthData.gov, it’s possible you’ll see other use cases that meet your needs in the results. If you see a use case that describes exactly what you need, speak up and support the use case. To do this, follow the entry to the associated GitHub issue and post a note indicating your interest in that same use case. If the use case describes similar data but your particular needs differ slightly, you should indicate this as well. Basically, feel free to elaborate on any of the pieces of information that go into a use case requirements section. The more interested parties there are in a use case, the better we are able to demonstrate the value of releasing the data, and the higher priority the use case becomes.
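Posting your note through the GitHub web page is the simplest route. If you would rather script it, the sketch below builds (but does not send) a request to GitHub's REST endpoint for creating an issue comment. The owner, repository name, issue number, and token are placeholders, not the actual DDOD repository details, and the helper name `build_support_comment` is made up for this example.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_support_comment(owner: str, repo: str, issue_number: int,
                          token: str, note: str) -> urllib.request.Request:
    """Build a POST request that would add `note` as a comment on an issue.

    Nothing is sent here; pass the result to urllib.request.urlopen() to post it.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{issue_number}/comments"
    payload = json.dumps({"body": note}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values: substitute the real repository, issue number, and token.
    request = build_support_comment(
        owner="example-org",
        repo="example-repo",
        issue_number=1,
        token="YOUR_TOKEN",
        note="We need this data too. Our needs differ slightly: ...",
    )
    print(request.get_method(), request.full_url)
```

Whichever route you take, say in the note where your needs match the existing use case and where they differ.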
Communicating Your Use Case to the Data Owner
At this point, the DDOD Admin will transfer your requirements into the DDOD knowledge base, which is located at http://ddod.us. Most likely the Admin will ask you clarifying questions in an effort to make sure that your use case is sufficiently defined in order to be achievable and to locate the right Data Owner. In some cases, the Admin may request that your use case be merged into an existing similar or related use case. This might happen if your use case has the same Data Owners as another one and is sufficiently close functionally that merging would positively impact the execution effort. All parties benefit when cases are merged, since generally the more requesters associated with a given use case, the more attention and resources it’s likely to get from Owners.
From there, the DDOD Admin locates the Data Owner and begins the discussion about your use case. Updates on discussions are posted to the GitHub issue for the use case as a record for everyone to see. From time to time either the Admin or Owner will post clarifying questions for you to the same GitHub issue.
Addressing Your Use Case
Typically, solutions are broken into two components: a short-term workaround and a long-term solution. The workarounds are often insights into how existing data sources can be leveraged to enable a use case. Sometimes solutions may reference hard-to-find datasets that aren’t indexed on HealthData.gov. Step-by-step instructions are given on precisely how existing data can be manipulated and interpreted to solve the main challenges. Solutions may even suggest submitting FOIA requests and provide specific instructions, including the group to route to, the system of record to access, field names to specify, and query logic. The key to these workarounds is that they can be accomplished through documentation alone or with minimal work by the DDOD Admin. They don’t require a significant commitment from the Data Owner. Although they may be suboptimal and require intensive manual steps, they do get to an interim solution faster.
The long-term solution, on the other hand, typically involves a commitment from the Data Owner to do development. It’s up to the Data Owner to determine whether and how a long-term solution is implemented. Owners have to take many factors into account besides the understanding of the demand that DDOD provides. Among these factors are cost of development, risk of personal data re-identification, risk of releasing easily misinterpreted data, opportunity for workload reduction from fewer FOIA requests, appropriateness to the agency’s mission, and fit within their organization’s current priorities. Needless to say, the implementation time for such solutions could be significant, but varies widely. So far the fastest such implementation has been about six months.
Checking on Progress
Want to keep tabs on how your use case is progressing? You could periodically check for movement in the progress matrix section on the DDOD wiki. But the easiest way is to subscribe to automated notifications of changes in the solution write-up and posts to the discussion. First, go to the DDOD wiki page describing your use case and click the star (✩) to watch that page. Then, go to the GitHub issue used for discussing your use case and click “Subscribe.” And if you’re getting impatient or think your use case isn’t getting sufficient attention, post a comment to the GitHub issue.
How can you help? It’s pretty simple. First, if you have use cases that might fall within the scope of HHS and its agencies, don’t hesitate to add them as described here. Second, spread the word. There are already dozens of use cases, many of which have newly documented datasets, short-term workarounds, and long-term solutions.
So let your colleagues know about DDOD and how it has already helped users like you get the data they need. For more info, you can also follow David on Twitter @dportnoy or via http://david.portnoy.us or email him a question at firstname.lastname@example.org