Nightingale Open Science is a platform that connects researchers with world-class medical data. We work closely with health systems around the world to create and curate datasets of medical images linked to ground-truth labels. We carefully deidentify the data and make it available for non-profit research on our cloud infrastructure.
We focus on datasets that will help researchers make breakthroughs for unsolved medical problems.
Consider sudden cardiac death, which kills 300,000 Americans every year. Many papers have been written on factors that put people at higher risk, but even after looking back at the vast majority of deaths, we still cannot find an identifiable cause. Or cancer: improved screening since the 1990s has helped us identify more small tumors, but we still haven’t been able to translate this into lower rates of late-stage diagnoses or death.
We believe the key to solving these mysteries lies in the massive volumes of complex imaging data health systems produce every day: electrocardiogram waveforms, x-rays and CT scans, tissue biopsy images, and more. Today, these data are interpreted by humans, but our research is providing clues that machine learning can open up new ways of ‘seeing’ signals and patterns in the data that humans cannot.
Unfortunately, existing medical data with the potential to shed light on these patterns have historically been siloed. By making these data accessible to broad groups of interdisciplinary researchers, we can surface previously unknown patterns of disease and begin to unlock discoveries that save lives.
This is the vision underlying Nightingale Open Science: an open platform housing cutting-edge, deidentified medical datasets that are available to a diverse, global community of researchers.
Our goal is to foster researcher collaborations across disciplines, bringing together computer science researchers, clinicians, and economists around critical questions that will push the boundaries of medical research and spur the field of computational medicine.
Computational medicine is a new field at the intersection of medicine, statistics, and computation. But this field is being stymied by a lack of data.
Fields like computer vision and natural language processing have benefitted from shared datasets, such as ImageNet for object detection and MNIST for digit recognition, on which researchers can compete and collaborate on high-value questions and problems.
But computational researchers have no comparable datasets to answer critical questions in health and medicine. Making such datasets available is a key part of building this new field. Once researchers have the raw material they need to develop and apply new computational techniques to medicine, we expect to see leaps in understanding and capability similar to those that occurred in other fields.
We focus on data that is high-dimensional, such as imaging and waveforms, which are ideally suited to machine learning.
We also emphasize linking these imaging data to ground-truth outcomes: what happened to the patient’s health, not just what a doctor thought about an image. This allows researchers to develop algorithms that learn from nature—not from humans.
We work with a variety of health systems in the United States and internationally to define specific and compelling research questions. We collaboratively build a dataset around those research questions and help the institution conduct analysis and gather findings. We then feature some of the deidentified data variables on our platform for researcher collaboration and competition.
Interested in partnering with us? Contact us here.
We are open to focusing on a broad range of clinical areas with our partners. Here are examples of work from previous partnerships:
All data featured on Nightingale is completely deidentified and contains no protected health information (PHI). Our platform hosts researchers inside a secure, monitored computing environment tailored for cutting-edge AI research.
All content on the Nightingale Open Science platform is strictly limited to non-commercial academic research.