Mayo Clinic Platform COO believes de-identifying data is not enough when sharing it with pharmaceutical companies and other third parties

Maneesh Goyal, chief operating officer of Mayo Clinic Platform, takes patient privacy very seriously, but he disagrees with the way it is typically handled in health care: de-identifying data under the HIPAA Safe Harbor approach.

“Many organizations will take patient data and de-identify it, and once it's de-identified, it's no longer considered HIPAA data,” Goyal said in a recent interview. “We think it's interesting, but not enough to protect patient data because, especially as you have more and more computing power, you can actually figure it out.”

Goyal explained the approach Mayo Clinic Platform is taking to protect privacy within the broader context of its Orchestrate offering. Orchestrate is a data platform that allows biopharmaceutical and medical technology companies to draw on Mayo Clinic Platform's rich data and combine it with high-quality research and core laboratory expertise to accelerate their own drug discovery and clinical development programs. On February 11, the Rochester, Minn.-based health system announced that Orchestrate will now give researchers access to standardized, real-world cancer data from Mayo Clinic and participating Mayo Clinic Platform Connect partners.

So how does Mayo Clinic ensure patient privacy, especially given that this data is made available to external users such as pharmaceutical and medical technology companies? And why does it matter?

“The way we approach de-identification is not just to remove everything that could be an identifier, but actually to change it,” he said, giving the example of his own medical records. “So our tool goes in there, replaces it with a fictional person, but leaves the clinical notes there. And then we do date conversion, random date conversion on the entire clinical record. So God forbid I get in a car accident on a certain date, that's public information, and you now move it away from that date. So I'm no longer identifiable.”
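The two steps Goyal describes, swapping identifiers for a fictional person and shifting dates across the whole record, can be illustrated with a minimal Python sketch. This is not Mayo Clinic's tool: the record layout, the names, the `deidentify` function and the `max_shift_days` parameter are all hypothetical, and applying a single random offset to every date in a record is an assumption about what "random date conversion on the entire clinical record" means.

```python
import random
import re
from datetime import date, timedelta

# Hypothetical record layout; the article does not describe the real schema.
record = {
    "name": "Pat Example",
    "events": [
        {"date": date(2024, 3, 1), "note": "Pat Example seen after car accident."},
        {"date": date(2024, 3, 15), "note": "Follow-up visit; recovering well."},
    ],
}

def deidentify(record, surrogate_name, rng, max_shift_days=365):
    """Return a copy with the name replaced and every date shifted by one offset."""
    # Assumption: one nonzero offset per record, so the gaps between events
    # survive while the real calendar dates do not.
    offset = rng.choice([-1, 1]) * rng.randint(1, max_shift_days)
    shift = timedelta(days=offset)
    name_pattern = re.compile(re.escape(record["name"]))
    events = [
        {
            "date": event["date"] + shift,
            # The clinical note is kept; only the identifier inside it changes.
            "note": name_pattern.sub(surrogate_name, event["note"]),
        }
        for event in record["events"]
    ]
    return {"name": surrogate_name, "events": events}

clean = deidentify(record, "Jordan Avery", random.Random(0))
```

In this sketch the accident no longer falls on its publicly known date, but the two-week gap between the accident and the follow-up visit is preserved, which is the kind of clinical detail researchers still need.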

Goyal said Mayo Clinic has about 100 petabytes of structured and unstructured EHR data, about 28 petabytes of which has been de-identified. Unstructured data in clinical records is important because it explains the provider's rationale for making a diagnosis or other decision. All de-identified data is stored in “cloud containers.”

“Then we created a container where the data never leaves the container and has now withstood the U.S. regulatory system,” Goyal explained, adding that it also complies with foreign regulatory environments. “So when we provide access, we provide access in a sandbox in a controlled environment. There are no individual patient records that are visible. We check everything outside of the system. So no data ever escapes our control.”

This is what is called a clean room environment, Goyal said. Another popular term for a privacy-preserving data access process is “federated learning,” which at Mayo Clinic applies to health system partners that join Mayo Clinic Platform, such as Israelita Albert Einstein Hospital in Brazil.

“Federated learning is basically when you send a question to all these different data sets and then you get an aggregated answer. But each environment has to support this closed container, and no one has access to the central area where all the information is,” Goyal said.
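The pattern Goyal describes, one question sent to several closed environments with only an aggregated answer coming back, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mayo Clinic Platform's implementation: the `Site` class, the `federated_count` function and the sample rows are hypothetical, and the sketch shows a federated count query, not the training of a model.

```python
from dataclasses import dataclass

@dataclass
class Site:
    """A partner site whose patient rows stay inside its own environment."""
    name: str
    _rows: list  # hypothetical local records; never handed back to the caller

    def count_matching(self, predicate):
        # Only an aggregate count leaves the site, never the rows themselves.
        return sum(1 for row in self._rows if predicate(row))

def federated_count(sites, predicate):
    """Send the same question to every site and combine the per-site counts."""
    per_site = {site.name: site.count_matching(predicate) for site in sites}
    return sum(per_site.values()), per_site

# Made-up example data for two hypothetical partner sites.
sites = [
    Site("site_a", [{"dx": "IBD", "age": 34}, {"dx": "asthma", "age": 51}]),
    Site("site_b", [{"dx": "IBD", "age": 62}, {"dx": "IBD", "age": 45}]),
]
total, per_site = federated_count(sites, lambda row: row["dx"] == "IBD")
```

The caller ends up with a combined count and a per-site breakdown, while the underlying rows are only ever read inside each site's own method.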

This enables pharmaceutical companies to run computational jobs or train artificial intelligence models, or simply conduct queries to better understand target diseases. For example, a pharmaceutical company could ask a question like, “Find the course of disease

Other uses are possible, including one that targets a major source of wasted money in clinical development. Clinical trials need to be reproducible, and in the past a company would actually have to conduct a trial to know whether it was. In many cases, trials either fail or cannot be replicated for various reasons, such as the wrong sample size or a flawed experimental design. Pharmaceutical companies typically realize this only after spending time, effort and money.

Now, with Mayo Clinic Orchestrate, pharmaceutical companies can create synthetic versions of clinical trials to see if results can be replicated in larger groups of patients.

“So one way our pharmaceutical partners are using it is to validate their trial hypotheses,” he said. “Our approach is to use real data from real populations and put as much of it into one repository so you can run comprehensive trials on real data. You can actually say, 'Does this work? Do we have enough patients in a large non-patient population to run the trial the way I imagine?'”

But it’s not just about querying data, training AI models, or validating hypotheses. Goyal explained that Orchestrate is all about integrating fragmented R&D processes into a comprehensive platform. For example, if a pharmaceutical company wanted to conduct a trial for inflammatory bowel disease and came to the Mayo Clinic to recruit patients, Orchestrate's process would look like this.

“So they identified a group of patients. We can do that in de-identified data. We hired IBD experts at the Mayo Clinic, we developed a group of patients, and then we did an IRB and quickly recruited them for additional tissue sample collection,” Goyal said. “Now the power of it is taking tissue samples within our own infrastructure, doing all the analysis, such as genetic proteomics, epigenetic pathology, doing it based on longitudinal patient data, putting it back into the clinical record in an identified way, and then handing it over to our pharmaceutical partners and saying, now is your playground to invent and identify targets that are important for your condition.”

Access to Orchestrate is subscription-based, he said.

Photo: Claudio Ventrella, Getty Images
