
AI companies investing in better data pipelines are winning faster regulatory approvals — here’s why

As the development of artificial intelligence medical devices accelerates, we are witnessing a major shift in healthcare. From improved testing and risk stratification to drug development and reduced clinical workloads, these devices have huge potential to improve health equity.

To enable these innovations, high-quality real-world medical data is critical for training AI models. Getting these solutions into the hands of clinicians and ultimately helping patients requires strict regulatory oversight. This process ensures the technology is safe, effective and ready for clinical use.

Why medical AI relies on data

Small, unrepresentative training data sets that do not come from the target population hinder AI performance, leading to a range of downstream problems, including exacerbating existing biases, creating products that lack generalizability, and producing inaccurate outputs.

Sometimes small data sets are unavoidable, especially for rare diseases. Outside those cases, however, relying on them leads to underrepresentation and reduces the ability of AI models to generalize across the population.

Algorithms trained on narrow population samples have limited ability to predict, detect, and classify conditions in broader patient populations, which exacerbates health disparities and leads to poorer patient outcomes. If the data sets used to train and test a model are not representative of the intended population, the model may not produce accurate results, and testing will not properly validate them: the model cannot generalize beyond the population it was trained on. Models may also produce inaccurate output if supporting information, such as labels and clinical reports, is missing or contains errors.
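The representativeness check described above can be automated before training begins. The sketch below is a minimal illustration, assuming hypothetical age bands and made-up target-population shares (in practice these would come from census or disease-registry data); it flags any group whose share of the training cohort falls well below its share of the intended population.

```python
from collections import Counter

# Hypothetical target-population demographics (fractions of the intended
# patient population) -- illustrative numbers only, not real registry data.
TARGET = {"age_under_40": 0.30, "age_40_to_64": 0.45, "age_65_plus": 0.25}

def representation_gaps(cohort_ages, threshold=0.5):
    """Flag age bands whose share of the training cohort falls below
    `threshold` times their share of the target population."""
    def band(age):
        if age < 40:
            return "age_under_40"
        if age < 65:
            return "age_40_to_64"
        return "age_65_plus"

    counts = Counter(band(a) for a in cohort_ages)
    total = len(cohort_ages)
    gaps = {}
    for group, target_share in TARGET.items():
        observed = counts.get(group, 0) / total
        if observed < threshold * target_share:
            gaps[group] = (observed, target_share)
    return gaps

# A cohort skewed toward younger patients under-represents the 65+ band.
cohort = [28] * 50 + [50] * 40 + [70] * 10
print(representation_gaps(cohort))  # {'age_65_plus': (0.1, 0.25)}
```

The same pattern extends to any attribute regulators care about, such as sex, ethnicity, scanner vendor, or care setting.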

Minimizing bias is an important aspect of training and testing data for AI medical devices. Identifying and mitigating potential bias is also a key component of regulators’ focus. AI bias can cause many problems, but large and sufficiently diverse training data sets can help mitigate it.

While training data is the primary consideration, test data must also be representative of the target population. It should be high quality, diverse, and large enough to ensure the accuracy and practical usefulness of the model. Training and test data must also be appropriately independent to ensure that the tests truly assess the accuracy and effectiveness of the algorithm, providing evidence of real-world performance.
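One practical way to enforce the independence described above is to split at the patient level rather than the record level, so that no patient's scans or records leak from training into testing. The following is a minimal sketch under an assumed `(patient_id, data)` record schema; the deterministic hash-bucket approach is one common technique, not a prescribed regulatory method.

```python
import hashlib

def patient_level_split(records, test_fraction=0.2):
    """Assign every record for a given patient to the same side of the
    split, so the test set is truly independent of the training data.
    `records` is a list of (patient_id, data) tuples -- illustrative schema.
    """
    train, test = [], []
    cutoff = int(test_fraction * 256)
    for patient_id, data in records:
        # Deterministic bucket in [0, 255] derived from the patient ID,
        # so the assignment is stable across reruns.
        bucket = hashlib.sha256(patient_id.encode()).digest()[0]
        (test if bucket < cutoff else train).append((patient_id, data))
    return train, test

records = [("p1", "scan_a"), ("p1", "scan_b"), ("p2", "scan_c")]
train, test = patient_level_split(records)
# Every patient's records land entirely in one partition.
train_ids = {pid for pid, _ in train}
test_ids = {pid for pid, _ in test}
assert not (train_ids & test_ids)
```

Because the split depends only on the patient ID, records added later for an existing patient automatically land on the same side, which keeps the evaluation honest as the data set grows.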

These challenges are not just technical; they influence how regulators assess the safety and effectiveness of AI devices.

Why is this important for regulatory submissions?

Preparing for regulatory submissions is a key driver for clients contacting us. One common thread we see is the need to train and test devices on data representative of the markets and populations they plan to serve. Regulators increasingly require detailed information on the representativeness of the data behind new medical devices.

Agencies such as the U.S. Food and Drug Administration have issued guidance on how to address data management issues, with a particular focus on training and testing data used to ensure the effectiveness, accuracy, and usefulness of medical devices. Ensuring transparency and responsible AI development are key to building devices that are effective and comply with evolving regulatory guidelines.

Typically, regulators require complete documentation and records describing how the data was obtained, how it was partitioned for training and testing, how the data was processed, stored and annotated, and a host of other information points. From the outset, good data practices make it easier for developers to gather the information needed for regulatory submissions. Knowing that training and testing data has been appropriately sourced and managed, and is large and diverse, can help regulators reduce the need for further validation because they are confident that the device will work accurately in its target population.

Delays in obtaining the required breadth and diversity of data ultimately slow regulatory submissions, and underrepresentative training and test data can be grounds for rejecting them.

Looking to the future

In the race to deploy artificial intelligence in healthcare, speed is of the essence. But speed without structure can lead to frustration. Healthcare AI developers who prioritize data early will cross the regulatory finish line faster and more reliably.

In the current environment, better data means more than just better algorithms. This is the key to getting to market faster, delivering better clinical performance and improving patient lives.

Photo: Sue Patman, Getty Images


Joshua Miller is CEO and co-founder of Gradient Health and holds a BS/BSE in Computer Science and Electrical Engineering from Duke University. He has spent his career building companies, first founding FarmShots, a Y Combinator-backed startup that grew into an international business and was acquired by Syngenta in 2018. Subsequently, he continued to serve on the boards of several companies and made angel investments in more than 10 companies in the environmental technology, pharmaceutical and financial technology fields.

This article appeared through the MedCity Influencers program. Anyone can share their thoughts on healthcare business and innovation on MedCity News through MedCity Influencers.
