Tuesday, April 4, 2017

Open-Access Data-Sharing Model

Sounding Board
Advantages of a Truly Open-Access Data-Sharing Model


Monica M. Bertagnolli, M.D., Oliver Sartor, M.D., Bruce A. Chabner, M.D., Mace L. Rothenberg, M.D., Sean Khozin, M.D., M.P.H., Charles Hugh-Jones, M.D., David M. Reese, M.D., and Martin J. Murphy, D.Med.Sc., Ph.D.

N Engl J Med 2017; 376:1178-1181. March 23, 2017. DOI: 10.1056/NEJMsb1702054


Multi-institutional randomized clinical trials have been a feature of oncology research in the United States since the 1950s. Since that time, cancer-treatment trials have been continuously funded by the National Cancer Institute (NCI) through a program that has evolved to become the National Clinical Trials Network (NCTN). Currently, approximately 19,000 patients with cancer participate in NCTN clinical trials each year. Approximately 70,000 additional patients with cancer are enrolled each year in treatment trials sponsored by the pharmaceutical industry.1,2

It is important to honor and reward the altruism of patients who participate in clinical trials. One way to do so is to share the data gathered in clinical trials with other researchers in a responsible and meaningful way. The cancer research community, encouraged by recommendations from the Beau Biden Cancer Moonshot, is finally moving data sharing forward from its traditional, largely unfunded, place at the end of the long list of clinical research responsibilities to center stage.

There are a number of reasons why it has taken more than 60 years for this issue to receive the attention that it deserves. Although the incentives for doing so may differ, competitive forces lead both academic researchers and pharmaceutical companies to protect data and to use data exclusively for their purposes. This approach protects their intellectual property and also shields the primary study team and the sponsor if the release of data from a trial for analysis by others leads to conclusions or interpretations that the primary researchers deem to be misleading or erroneous. When the academic and monetary stakes are high, the chance of this situation occurring is real. Another reason for the delay is that the protection of research participants dictates that confidentiality is the highest priority, and this risk may be greater with wide sharing of the new data-dense individual data sets that are required in order to develop personalized medicine approaches. Finally, and probably most important of all, data sharing has been hampered by a lack of resources, including access to enabling data systems technology, bioinformatics expertise, and legal agreements that facilitate sharing.

The idea of data sharing is moving beyond these hurdles with a variety of models. One such model, the so-called gatekeeper model,3 uses a distinct entity to house information in a central repository, with access to specific data sets that are provided to qualified research teams on the basis of a research proposal review by an independent expert committee. Examples of this approach include ClinicalStudyDataRequest.com, a website sponsored by pharmaceutical partners, and the Vivli platform (http://vivli.org), a nonprofit corporation created to support global sharing of clinical research data. Gatekeeper models provide substantial customization and oversight for individual data requests so that contributing investigators can maintain a level of control over how their data are used. This model may appropriately address barriers to sharing for studies in which the identification of participants is a risk, such as those that involve sensitive topics, genomic data, or limited numbers of participants. This model can also offer some protection to research teams that require limitations on the use of proprietary data. A limitation of gatekeeper models is that many barriers to data use remain.

A substantial body of readily available data from clinical trials can be shared with minimal risk to patients or researchers. Examples include data sets of already published trials, particularly if the treatments that were tested are not undergoing review for regulatory approval. In addition, industry-sponsored clinical trials generally involve a comparator group for which valuable patient-level data can be shared without risk to proprietary interests. As long as the data are appropriately anonymized to protect confidentiality and there are no restrictions related to the institutional review board, the consent form, or the sponsor with regard to the patient-level data, it should be possible for the data to be freely available to the public to download, analyze, and reuse for research purposes. This approach may enable the identification of baseline characteristics of the patients or outcomes that could be identified only by means of an analysis of larger numbers of patients than would be included in an individual trial. What has been lacking, until recently, has been the infrastructure required to achieve this goal.

An example of an active open-source data-sharing model, with which some of us are affiliated, is Project Data Sphere (PDS). PDS is a free digital library-data laboratory that was launched in 2014 as an independent, nonprofit initiative of the CEO Roundtable on Cancer (www.ceoroundtableoncancer.org), which was founded in 2001 to develop and implement initiatives that reduce the risk of cancer, enable early diagnosis, facilitate access to the best available treatments, and hasten the discovery of new and more effective anticancer therapies. A Web-based platform for accessing open-source data was developed for PDS by SAS Institute. Using this website, researchers can download, share, integrate, and analyze patient-level data. Data contributors are provided access to deidentification and data-standardization protocols, as well as to templates of legal agreements, including standardized data-sharing and online-services user agreements.4-6 Users of the site have access to analytic tools developed by SAS Institute. Anyone interested in cancer research can use the website, provided that they register and agree to a responsible-use attestation. PDS is funded by the engagement of a wide range of stakeholders that together ameliorate the burden of securing adequate funding from a single organization or institution.

At present, PDS contains data from more than 41,000 research participants from 72 oncology trials, covering multiple tumor types. The data have been donated by academic, government, and industry sponsors. These numbers are increasing quickly as use of the PDS accelerates. More than 1400 unique researchers have accessed the PDS database more than 6500 times. As one interesting example, a challenge was issued in 2014 that asked respondents to use PDS to create a better prognostic model for advanced prostate cancer. A total of 549 registrants from 58 teams and 21 countries responded. Accessible data included control groups from prospective, randomized, industry-sponsored trials. Solvers had backgrounds in statistics, data modeling, data science, machine learning, bioinformatics, engineering, and other specialties. Unexpectedly, the winning entrant, a team of researchers from Finland, had never worked on prostate cancer in the past, and this team considerably outperformed the best existing model for predicting overall survival among men with advanced prostate cancer.7 Thus, the PDS Prostate Cancer DREAM Challenge confirmed that an open-access model empowers global communities of scientists from diverse backgrounds and promotes crowd-sourced solutions to important clinical problems. This level of engagement is not possible with gatekeeper models.

PDS is provided to users free of charge, but the usefulness of the PDS website is limited to the trials that it contains and the data analytics provided by the platform. Expansion of this concept to the broader research community outside the field of oncology will be time consuming and costly, and it is open to debate whether publicly funded or private concerns are the most appropriate environment to assume responsibility for data storage and sharing. The DataNet program of the National Science Foundation is one example of a public–private partnership that has been designed to achieve these goals.8

The data-sharing community is undergoing rapid development, with several potential models and approaches (Table 1, Oncology Clinical and Translational Research Data Archives). We encourage multiple models to coexist, either as a single platform with tiered access or as discrete platforms with the potential for cross-communication that includes truly open platforms. We think that as the community sees the benefits of sharing trial data, more will be shared.

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.
Source Information

From the Dana–Farber Cancer Institute, Brigham and Women’s Hospital (M.M.B.), and Massachusetts General Hospital Cancer Center (B.A.C.), Boston; Tulane Medical School, New Orleans (O.S.); Pfizer (M.L.R.) and Carmine Research (C.H.-J.), New York; Food and Drug Administration, Silver Spring, MD (S.K.); Amgen, Thousand Oaks, CA (D.M.R.); and Project Data Sphere, Cary, NC (M.J.M.).
