Data in an AI-driven World
Every day, humanity produces 2.5 million terabytes of new data. Companies like Facebook, Google, and Amazon were built on processing vast volumes of data and turning it into meaningful information. Or at least, the information is meaningful enough to allow artificial intelligence (AI) algorithms to suggest a friend to add to your network, or to recommend the sprinkler to buy the kids for summertime fun.
But it might be a while before we have the same level of predictive power in radiation oncology. While we produce enormous volumes of data in the clinic, several roadblocks prevent its use in sophisticated models for patient benefit. How effectively and accurately these models generalize the principles learned from training data depends largely on the volume and quality of that data.
There are a couple of problems here. First, accurately labeled datasets are needed. Quality labeled data is hard to come by, but it allows the model builders to establish a ground truth and measure the accuracy of the model's predictions against the human expertise encapsulated in the labels. Second, sufficiently large datasets with diverse samples are required. In a simplistic example, if a dog breed recognition model is trained on a set that only includes three breeds of dogs, it will only be able to detect those three breeds accurately.
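The dog breed example can be made concrete with a toy classifier. The sketch below is only an illustration under stated assumptions: the nearest-centroid model, the two features (height and weight), and all the numbers are made up for this example and are not taken from any real dataset or product. It shows that labels give a ground truth for measuring accuracy, and that a model trained on three breeds can only ever answer with one of those three.

```python
from collections import Counter


def train_nearest_centroid(samples):
    """Average the feature vectors of each label. samples: list of (features, label)."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, labeled):
    """Fraction of predictions that match the human-provided labels (ground truth)."""
    correct = sum(predict(centroids, f) == y for f, y in labeled)
    return correct / len(labeled)


# Toy features: (height_cm, weight_kg). All values are invented for illustration.
train = [
    ((25.0, 3.0), "chihuahua"), ((23.0, 2.5), "chihuahua"),
    ((57.0, 30.0), "labrador"), ((60.0, 32.0), "labrador"),
    ((80.0, 65.0), "great_dane"), ((78.0, 60.0), "great_dane"),
]
model = train_nearest_centroid(train)

# Breeds the model has seen during training.
seen_test = [
    ((24.0, 2.8), "chihuahua"),
    ((58.0, 31.0), "labrador"),
    ((79.0, 62.0), "great_dane"),
]
# A breed that never appeared in the training set.
unseen_test = [((38.0, 10.0), "beagle"), ((36.0, 11.0), "beagle")]

print("accuracy on seen breeds:  ", accuracy(model, seen_test))
print("accuracy on unseen breed: ", accuracy(model, unseen_test))
print("a beagle is predicted as: ", predict(model, (38.0, 10.0)))
```

However the unseen breed is measured, the model is forced to pick one of the three labels it was trained on, which is why diversity in the training data matters as much as volume.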
Having sufficient high-quality data is the linchpin for success in solving problems with AI. Yet, while there are virtually limitless volumes of data, how do we (1) access the data and (2) sort out the incomplete, erroneous, or meaningless data?
The Problem – Data Silo Dragons
- How can Protected Health Information (PHI) remain protected through de-identification while still making enough data available to be useful?
- How can the data be extracted efficiently and (if necessary) transformed into a format that can be used for training AI solutions? Clearly, if it takes weeks of manual work from clinical staff to extract the data, it's not going to happen.
- How can the data be efficiently cleaned and standardized? For example, if labeling is inconsistent, the data must be modified to match a standard convention (like TG-263).
- How can institutional barriers to data sharing be addressed?
- Are legal review and a data-use agreement necessary? Who in the institution can “sign off” on sharing the institution’s data?
- What does the institution receive in exchange for sharing its valuable data?
- What concerns do members of the clinical team have when it comes to data sharing? Are those concerns valid and/or is there a technical solution that would address them?
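The first three questions above (de-identification, automated extraction, and standardization) lend themselves to a small illustration. The following is a minimal sketch, not a description of any vendor's export tool: the record layout, the field names in `DIRECT_IDENTIFIERS`, the alias table, and the helper names are all hypothetical. The target structure names are meant to follow TG-263-style naming and should be checked against the published nomenclature. Dropping a handful of fields is also far from complete de-identification, which must cover many more identifiers than shown here.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers in this toy record format.
DIRECT_IDENTIFIERS = {"patient_name", "birth_date", "mrn", "address"}

# Hypothetical alias table mapping local naming habits to TG-263-style names.
STRUCTURE_ALIASES = {
    "lt lung": "Lung_L",
    "left lung": "Lung_L",
    "lung l": "Lung_L",
    "cord": "SpinalCord",
    "spinal cord": "SpinalCord",
    "spinalcord": "SpinalCord",
    "heart": "Heart",
}


def pseudonymize_id(mrn: str, secret_key: bytes) -> str:
    """Replace the record number with a keyed hash so records can be linked without exposing it."""
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def standardize_name(raw: str):
    """Return the standard structure name, or None if the name needs manual review."""
    key = " ".join(raw.lower().replace("_", " ").split())
    return STRUCTURE_ALIASES.get(key)


def export_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers, add a pseudonymous ID, and standardize structure names."""
    clean = {
        k: v for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != "structures"
    }
    clean["pseudo_id"] = pseudonymize_id(record["mrn"], secret_key)
    structures, unmapped = {}, []
    for raw_name, contour in record.get("structures", {}).items():
        standard = standardize_name(raw_name)
        if standard is None:
            unmapped.append(raw_name)  # flag for a human instead of guessing
        else:
            structures[standard] = contour
    clean["structures"] = structures
    clean["unmapped_structures"] = unmapped
    return clean


# Invented example record; the contour values are placeholders.
record = {
    "patient_name": "Jane Doe",
    "birth_date": "1960-01-01",
    "mrn": "123456",
    "site": "thorax",
    "structures": {"Lt Lung": "contour-a", "CORD": "contour-b", "GTV_boost2": "contour-c"},
}
exported = export_record(record, b"demo-key-not-for-production")
print(exported)
```

Names that are not in the alias table are reported rather than silently renamed, so the manual work left for clinical staff shrinks to reviewing the exceptions.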
Who “wins” when data is shared?
Everybody. Here’s why:
2. The clinic wins because the solutions that are developed with its data are better able to account for the nuances of its treatment approaches (i.e., the AI generalizes better). The clinic will be able to treat patients much sooner after their initial consultation with a radiation oncologist. The tools will ultimately enable it to implement advanced treatment approaches such as adaptive radiotherapy.
3. Vendors (like Radformation) win because they are able to develop novel products that enable better and more efficient patient care.
Please consider this: our purpose as members of the medical community is to provide the best possible care to our patients. If a resource that is absolutely necessary to progress in that endeavor sits idle, and we have control over it, do we not have an obligation to make the best possible use of that resource to improve patient care?
How to Slay the Dragon – Radformation’s Approach
So what about the bureaucratic obstacles? If you are a clinician with access to a data silo, you are in a unique position to help slay the dragon. You understand the clinic’s challenges, and you understand the importance of data in developing accurate tools to improve patient care. What can you do?
I have a few recommendations:
In the US, state laws vary and only New Hampshire specifies that medical records are property of the patient. Other states either have no specific law or specify that medical records are property of the provider or hospital.
2. Find out if there are any specific policies within your institution for sharing data and push for a streamlined data-sharing pipeline in your institution such that when an opportunity presents itself (e.g., a clinical trial or a collaboration with a vendor), it does not take inordinate amounts of time and resources to get final approval or a data-use agreement set up.
3. Find the best outlets to share data and connect with them. This might be vendors (like Radformation) or perhaps clinical trials. In the recently published HyTEC introduction, there is an entire section dedicated to “Opportunities for Better Data.” The “idealized future state” envisioned in that article will only happen if we work together with “a concerted multi-pronged approach (e.g., involving vendors, administrators, providers, etc).” We must understand that the insights and evidence currently hoarded by the data silo dragons have the power to literally save lives – slaying these dragons won’t be easy, but “we should not shirk from this challenge.”
From Radformation’s perspective, the hardest part about slaying the dragon is not the technical obstacles. The biggest challenge is changing the culture around data sharing. Sequestering data in private vaults prevents large-scale analysis of clinical practices, decreases confidence in findings, prevents the development of better tools, and limits the quality of care that patients can receive.
At Radformation, we are pioneering ways to enable clinics and their patients to benefit from sharing data, and our new product AutoContour represents a great opportunity for clinics to collaborate with us in that endeavor. If you want a new style, structure, support for special cases (HDR, contrast scans, synthetic CT, etc.), or a structure to perform better on your data, adding that functionality to AutoContour starts with the contribution of data using our automated anonymized data export tools. The development of AutoContour simply could not have happened without the aid of our outstanding clinical partners who faced down their data silo dragons with us. We hope you will join this crusade to use your data for the development of tools that push radiation oncology into the future.
If you are interested in becoming a clinical partner, reach out to us at firstname.lastname@example.org.