Typically, research studies are islands unto themselves. After design and funding, each study recruits and enrolls a cohort, collects samples, then generates private data to produce publications and intellectual property. The resulting data silos are economically inefficient and fail to enable the integration of data, participants and insights. Our proposed research portal is an alternative paradigm that centers on open data and collaboration from the outset.
Highly integrated, longitudinal health data are extremely valuable for science and the advancement of human health. Because detailed, individual-level data can be impossible to anonymize, traditional privacy assurances are problematic. These assurances create structures that reduce sharing, inhibit data aggregation and collaboration, and compromise discovery. Anonymization practices often isolate participants, preventing them from contributing their full value, learning from the results, and participating in their dissemination.
The Open Humans Network is inspired by and will collaborate with the Harvard Personal Genome Project (PGP). Founded in 2005, the PGP has championed a model that returns data to participants and enables them to share. To promote data aggregation, integration, and discovery, the PGP has turned the privacy problem on its head: rather than jeopardize the trust relationship with weak promises about anonymity, this study specifically recruits people comfortable with public sharing and the potential for re-identifiability -- a practice called "open consent". To date, approximately 3000 Harvard PGP participants have contributed extensive data and tissue specimens with the goal of advancing scientific knowledge and biodiscovery.
The end result of this project will be the Open Humans Research Portal, which connects participants willing to publicly share data about themselves with researchers interested in using and adding to that public data. The portal will showcase this public data, facilitating exploration and providing tools for unrestricted download of publicly available machine-readable data sets. Researchers interested in working with participants to create additional data (e.g., performing "microbiome profiling" for "participants with genotype data") must agree to guidelines to become network members.
Initially, this project will specifically target researchers and research participants in Boston and New York. These two cities are among the most densely populated academic-industrial centers in the world for health and biomedical research. We will incrementally expand the network to other regions in the United States.
Who is working on the project? Who are your partners?
We will seed this effort through collaborations with several leading examples of equitable research:
- Harvard Personal Genome Project (George Church, Harvard Medical School)
- American Gut (Rob Knight, University of Colorado, Boulder / HHMI)
- Flu Near You Research Participants (Rumi Chunara, Boston Children’s Hospital / HMS)
- Chronic Fatigue Syndrome Study (Eric Schadt, Mount Sinai School of Medicine)
Other contributors and advisors include:
- Computational biologist: Madeleine Ball (Harvard)
- User interface design: Involution Studios (Boston)
- Secure data protocols: Jeremie Miller (Jabber, Singly, Telehash)
- Open devices: Charles Fracchia (MIT Center for Bits & Atoms / HMS)
- Citizen-scientist advocate: Abigail Wark (Harvard)
- Outreach: Celia Fulton Walden (GET Conference & GET Labs)
How do you know there is demand for this project?
Participant Demand: Patient access to medical records has improved considerably in health care, but the same cannot be said of the research enterprise, despite repeated evidence that people strongly favor getting access to their data [ref 1, ref 2]. Based on our direct experience growing the Harvard PGP and American Gut communities (~10,000 members combined), many people are already engaged in the open health ecosystem. Because that ecosystem is still early in its development, there are many unmet needs and plenty of opportunity to make public engagement in health research fabulous.
Researcher Demand: Researchers benefit greatly from public data resources, but frankly, they have trouble creating them because of complex issues around potential re-identifiability, appropriate consent, and governance. Returning data to participants and allowing them to manage its public sharing cuts this Gordian knot and creates a vital community resource of health data that can be leveraged in many ways. It also creates considerable value for researchers, who gain access to a cohort of well-characterized people who may serve as controls for many different studies.
How is your project different from what already exists?
Participatory research studies that share computable data with their enrolled volunteers are a new and important phenomenon: they are an enabling force toward more meaningful collaboration in research, and they create new opportunities for participant-managed data sharing. Most health research studies today give research participants little, if any, agency in these matters, due in part to simple inertia (“we’ve always withheld data from volunteers”). Public data sharing is stymied further by governance models that allow only one-size-fits-all decision-making around privacy. As a result, access to valuable community health data resources is mediated by a small number of individuals who get to choose what to investigate and publish, and sharing, when it does happen, occurs through restrictive and cumbersome controlled-access databases. Our project will make it possible for people to easily find equitable research studies, aggregate their data over time, and increase the impact of their contribution by facilitating public access to the data.
How will the data or information you use or create be made open?
The whole endeavor will be open. We will seed the portal with four amazing research studies that already engage in the return of computable data to participants. We will enable individuals enrolled in one or more of these studies to choose to publicly share their data through the OpenHumans portal using a CC0 waiver or an equivalent public domain dedication. The website will have an interface where that data can be explored and downloaded in standard formats.
What will you make or do in this project?
We will create an online portal that has the following three components:
- Participant profiles: Participant-facing system for managing public data profiles, based on our experiences with Harvard PGP (see example public profile).
- Public data explorer: A public interface for accessing, exploring, and downloading computable data, initiated with data from the four seed projects.
- Design Guidelines: We want to raise the visibility and success of IRB-approved research studies that share data with participants, so we will feature these four seed projects and use them as case studies for developing design guidelines and resources that enable more researchers to follow suit.
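To make the relationship between the first two components concrete, here is a minimal sketch of participant-managed sharing. It is an illustration only, not the portal's design: the class and function names (`Dataset`, `ParticipantProfile`, `share_publicly`, `public_datasets`) and the participant ID are hypothetical, and the sketch assumes that data stays private until the participant explicitly opts in.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One data set returned to a participant by a study (hypothetical model)."""
    study: str
    description: str
    public: bool = False  # assumption: private until the participant opts in


@dataclass
class ParticipantProfile:
    """Participant-facing profile that controls what is shared publicly."""
    participant_id: str
    datasets: list[Dataset] = field(default_factory=list)

    def share_publicly(self, study: str) -> None:
        # The participant, not the study, flips the sharing flag.
        for dataset in self.datasets:
            if dataset.study == study:
                dataset.public = True


def public_datasets(profiles: list[ParticipantProfile]) -> list[tuple[str, Dataset]]:
    """What a public data explorer would list: only opted-in data sets."""
    return [
        (profile.participant_id, dataset)
        for profile in profiles
        for dataset in profile.datasets
        if dataset.public
    ]


profile = ParticipantProfile(
    participant_id="participant-001",  # hypothetical identifier
    datasets=[
        Dataset("Harvard Personal Genome Project", "genotype data"),
        Dataset("American Gut", "microbiome profiling"),
    ],
)
profile.share_publicly("American Gut")
visible = public_datasets([profile])
```

In this sketch the explorer never sees the genotype data set, because the participant has only opted in to sharing the American Gut data.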
How can others learn from/build on what you do?
Researchers will be able to learn from our design guidelines how to facilitate sharing of data in their own studies. Study managers and review boards will have concrete examples to support their decision making when reviewing studies that return data to participants. The individual-level, open health data that we help organize is likely to accelerate the development and evaluation of tools that manipulate human data, because current restrictions drastically reduce the number of software engineers, data scientists, and others who are able to access high-quality human data. We expect the OpenHumans community to become a vital global resource that can be utilized in many ways: open data can be deployed in efforts to improve biological literacy, a cohort of well-characterized people can serve as controls in numerous studies, and health discovery will be more effective when research data remains connected to the individuals it is about so it may be aggregated and re-used in many contexts.
How much do you think it will cost?
For this project to succeed, we believe we need a minimum of approximately $650k in funding for the first 18 months. This will allow our consortium to pursue lightweight integration across these four seed projects, establish a critical mass of OpenHumans participants who bring valuable public data, and feature these resources in a functional portal with solid web design. We also need to devote funds toward establishing relationships with researchers who want to join the network in future phases, writing guidelines for how that will work, and maintaining sufficient staffing to support high-quality interactions with community members.
How would you use News Challenge funds?
We'll spend the majority (70%) on funding people: a software engineer, an informatics expert, as well as contracts and part-time work for user interface design, participant community management, and some support for team members in our consortium. Another 20% will be spent on computational infrastructure needed for a data-intensive site (web server, colocation and bandwidth, and systems administration), and 10% will go towards travel and coordinating a conference in Boston that brings together OpenHumans researchers and participants.
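As a rough illustration of what these percentages mean in dollars, the sketch below applies them to the $650k minimum stated in the cost section. That pairing is an assumption made here for illustration; the actual award could differ, and the function name `allocate` is hypothetical.

```python
# Assumption: the stated shares are applied to the $650k minimum budget.
TOTAL_BUDGET_USD = 650_000

# Shares as given in the text, kept as whole percentages so the
# arithmetic stays exact in integers.
SHARES_PERCENT = {
    "people": 70,
    "computational infrastructure": 20,
    "travel and conference": 10,
}


def allocate(total_usd: int, shares_percent: dict[str, int]) -> dict[str, int]:
    """Split a total budget according to whole-percent shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: total_usd * pct // 100 for name, pct in shares_percent.items()}


allocation = allocate(TOTAL_BUDGET_USD, SHARES_PERCENT)
for name, amount in allocation.items():
    print(f"{name}: ${amount:,}")
```

Under that assumption, the split comes to roughly $455k for people, $130k for computational infrastructure, and $65k for travel and the conference.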