NIH Makes Data Sharing Repositories Publicly Viewable on HealthData.gov
By Elizabeth Kittrie and Shubham Chattopadhyay
In an effort to increase access to biomedical data for discovery and reuse, the National Institutes of Health (NIH) recently made 64 data sharing repositories viewable to the public on HealthData.gov.
These NIH-supported, publicly accessible repositories hold a wide variety of data, ranging from zebrafish model organism data to clinical eye exam data. Although the selected entries do not cover the full range of NIH-supported repositories and data holdings, they represent a subset of data repositories that are generally characterized by the following attributes:
- Open data submission - The repository accepts data from a broad set of investigators. Ideally, any investigator with relevant data can submit data, but narrower sets of investigators are permissible, for example, any investigator funded by a particular NIH Institute or Center.
- Open data access - The repository makes data accessible for cost-free reuse by other investigators. This criterion can be met even if an investigator must request access to the data and have the access request approved by a data access committee, such as with human subject data.
- Open time frame for data deposit - The repository accepts data at any point in time after it was established.
- Sustained support - The NIH, or another sponsoring organization, is more likely to support the data repository if its usage supports continued programmatic goals and funds are available.
An example of a data sharing repository that is now publicly visible through this effort is the 1000 Functional Connectomes Project/International Neuroimaging Data-Sharing Initiative (INDI), supported by the National Institute of Mental Health (NIMH). Any researcher can deposit imaging data of all types, and in some cases the related phenotype data, into this repository. The project is supported by NIMH but is open to all researchers. More than 5,000 resting state functional magnetic resonance imaging (fMRI) data sets are available, as well as a growing number of diffusion magnetic resonance imaging (MRI) data sets.
A number of efforts are underway at the U.S. Department of Health and Human Services (HHS), and across the federal government, to increase access to the results of federally funded research. As a result of a recent memorandum from the White House Office of Science and Technology Policy, Increasing Access to the Results of Federally Funded Scientific Research, it is anticipated that in the coming years all federally funded researchers will be expected to make their scientific data available in publicly accessible data sharing repositories. This is part of a larger shift to promote open data and open science across all U.S. government departments and agencies.
An effort currently funded by the NIH Big Data to Knowledge (BD2K) program is called dataMED. This project, which is part of bioCADDIE (biomedical and healthCAre Data Discovery Index Ecosystem), is focused on developing a prototype data discovery index that will enable finding, accessing, and citing biomedical big data. It complements the utility of HealthData.gov in that it allows for an even greater degree of search capability and identification of individual biomedical datasets. A shared aim between NIH and HHS is to harmonize these systems through interoperable metadata.
It is our hope that by exposing these publicly accessible NIH-supported data sharing repositories on HealthData.gov we will not only bring further attention to their existence, but also increase opportunities for the discoverability and reuse of biomedical data supported through NIH funding. If you know about any significant NIH-supported data sharing repositories that are not yet included on HealthData.gov, we encourage you to contact the NIH Office of the Associate Director for Data Science.