Unlocking small data, the ‘next frontier’ in drug discovery

April 15, 2019

Life sciences

We’ve all heard of Big Data (capital B, capital D) and the myriad ways the life sciences industry can exploit it. Algorithms can be used to make sense of vast datasets, helping pathologists analyze tissue samples to diagnose disease, or crunching genomic data to predict a person’s risk of aneurysm. But what about “small data”? Can we apply the tools we have used for big data in situations where little or no data are available?

First things first: Given the utility of big data, why are biotech companies even thinking about “small data”?

When big data was “all the rage,” many players started out by “exploiting the wealth of large databases, public repositories, patent data, [scientific] literature data”—essentially big datasets that were already out there. This work was important in getting the field started, Andrew Hopkins, CEO of Exscientia, a company using an artificial intelligence-based platform in drug design, told FierceBiotech.

However, working with existing datasets—however large—is not a direct path to innovation: “We quickly understood that if you are using machine-learning models dependent on having a large amount of data, the downside is most projects tend to have been projects that have already been well worked on. The cutting edge in drug discovery is in first-in-class, novel targets, where there is actually very little data,” he said.

Small data, he said, are “the next frontier of problems we really need to solve.”

“Ultimately, every drug discovery project starts off as a small-data project,” he said. “And of course, that is where the commercial imperative is as well, to develop innovative medicines.”

He’s talking about coming up with new compounds and then optimizing them—figuring out which compound to make next and which experiment to conduct next to eventually arrive at a drug candidate. Because after all, an active compound does not a drug make. It has to tick a number of other boxes—selectivity to its target, solubility so it actually gets absorbed into the body, and so on—before it can be moved forward as a drug candidate.

“At the start of a drug discovery project, you might have five compounds that are active, but you don’t know their selectivity or their solubility—you don’t know a lot of things,” said Willem van Hoorn, chief decision scientist at Exscientia.

One way to learn all those things is to conduct “a gazillion of experiments,” van Hoorn said. “You will get there, even if you do experiments at random. But it’s not a very efficient way.”

Or, “people often try to force the use of a sophisticated model like deep learning… that works in a big data environment, in a lower data environment,” said Therence Bois, co-founder and director of operations at InVivoAI, a Montreal-based startup focusing on deep learning algorithms for low-data situations.

“The key thing is understanding what the problem is and designing the appropriate technology to solve it,” said Hopkins and van Hoorn.

“The real power of algorithms comes from exploiting very large datasets—the larger it is, the more powerful it is,” Hopkins said. Those types of datasets aren’t available in drug discovery, so a different approach is needed in low-data settings.

Both Exscientia and InVivoAI work with active-learning models, which—as the name suggests—don’t just make predictions based on existing data.

“With deep learning, you generally get models that perform very, very well for the data they were trained on. But give them a new set of compounds or a new set of samples, they perform quite poorly,” said Daniel Cohen, co-founder and CEO of InVivoAI.

Active learning models can take in new data, learn from them and become better models: “We can get better predictive models, and ultimately, better compounds, by actively interfacing with medicinal chemistry teams,” Cohen said. They can generate new compounds, new structures that perform better than the ones used to train the model.

“Using active learning is kind of putting a human into the loop. After generating molecules in silico, we can use active learning to test the molecules in a real wet lab and get more data to put back into the initial model,” Bois added. This can help avoid hurdles, such as having a model generate a compound that, in theory, would work very well, but is too complicated or too costly to synthesize in real life.

Exscientia is using active learning to identify and prioritize the compounds its models expect to yield the most information, so that a project can move through optimization more quickly.

“Which compound should we make next? Which one will give me the greatest learning, the greatest information to optimize my project faster?” Hopkins said. “We are asking the question: how can you learn as fast as possible?”

The next compound the model predicts may not yet be a drug candidate, but it might be one that yields data to make the model better—a step in the right direction, van Hoorn said.

The British company counts Celgene, GlaxoSmithKline, Sanofi, Roche and Evotec among its partners.

InVivoAI, too, is working with partners to address different challenges in a small-data environment. And when it says small data, it doesn’t just mean small datasets. It also means noisy data, heterogeneous data, or a dataset in which not many compounds were sampled. Basically, a dataset characteristic of early-stage drug discovery, Cohen said.

One of its projects involved working with cells taken directly from 20 patients, which meant the team could only screen so many compounds. Using a virtual library of only 1,200 compounds, InVivoAI developed a model that would predict how each of the compounds would work in each of the 20 cell lines.

“It was a challenge, but it worked quite well,” Cohen said. “The next step is to generate entirely new compounds optimized for activity on a patient-by-patient basis.”

Eventually, InVivoAI plans to test the compounds it generates in vitro and then in vivo.

“A lot of people propose interesting computational models, but they don’t actually test that out. If you create a model, can you prove you can go synthesize them and prove they’re behaving the way you think they’re behaving?” Cohen said. Because that’s what the biopharma industry wants to see: computational approaches leading to real outcomes, a.k.a. drug candidates.

“At the end of the day, to create something new, we need to go beyond current models with a handful of millions of data points,” he said. “We need to go beyond the known universe to get new stuff.”

By Amirah Al Idrus

Source: Fierce Biotech
