Unlocking small data, the ‘next frontier’ in drug discovery

April 15, 2019
Life sciences

We’ve all heard of Big Data—capital B, capital D—and the myriad ways the life sciences industry can exploit it. Algorithms can be used to make sense of vast datasets, helping pathologists analyze tissue samples to diagnose disease, or crunching genomic data to predict a person’s risk for aneurysm. But what about “small data?” Can we apply the tools we have used for big data in situations where little or no data are available?

First things first: With the utility of big data, why are biotech companies even thinking about “small data?”

When big data was “all the rage,” many players started out by “exploiting the wealth of large databases, public repositories, patent data, [scientific] literature data”—essentially big datasets that were already out there. This work was important in getting the field started, Andrew Hopkins, CEO of Exscientia, a company using an artificial intelligence-based platform in drug design, told FierceBiotech.

However, working with existing datasets—however large—is not a direct path to innovation: “We quickly understood that if you are using machine-learning models dependent on having a large amount of data, the downside is most projects tend to have been projects that have already been well worked on. The cutting edge in drug discovery is in first-in-class, novel targets, where there is actually very little data,” he said.

Small data, he said, are “the next frontier of problems we really need to solve.”

“Ultimately, every drug discovery project starts off as a small-data project,” he said. “And of course, that is where the commercial imperative is as well, to develop innovative medicines.”

He’s talking about coming up with new compounds and then optimizing them—figuring out which compound to make next and which experiment to conduct next to eventually arrive at a drug candidate. Because after all, an active compound does not a drug make. It has to tick a number of other boxes—selectivity to its target, solubility so it actually gets absorbed into the body, and so on—before it can be moved forward as a drug candidate.

“At the start of a drug discovery project, you might have five compounds that are active, but you don’t know their selectivity or their solubility—you don’t know a lot of things,” said Willem van Hoorn, chief decision scientist at Exscientia.

One way to learn all those things is to conduct “a gazillion of experiments,” van Hoorn said. “You will get there, even if you do experiments at random. But it’s not a very efficient way.”

Or, “people often try to force the use of a sophisticated model like deep learning… that works in a big data environment, in a lower data environment,” said Therence Bois, co-founder and director of operations at InVivoAI, a Montreal-based startup focusing on deep learning algorithms for low-data situations.

“The key thing is understanding what the problem is and designing the appropriate technology to solve it,” said Hopkins and van Hoorn.

“The real power of algorithms comes from exploiting very large datasets—the larger it is, the more powerful it is,” Hopkins said. Those types of datasets aren’t available in drug discovery, so a different approach is needed in low-data settings.

Both Exscientia and InVivoAI work with active-learning models, which—as the name suggests—don’t just make predictions based on existing data.

“With deep learning, you generally get models that perform very, very well for the data they were trained on. But give them a new set of compounds or a new set of samples, they perform quite poorly,” said Daniel Cohen, co-founder and CEO of InVivoAI.

Active-learning models can take in new data, learn from them and become better models: “We can get better predictive models, and ultimately, better compounds, by actively interfacing with medicinal chemistry teams,” Cohen said. The models can generate new compounds, new structures that perform better than the ones used to train them.

“Using active learning is kind of putting a human into the loop. After generating molecules in silico, we can use active learning to test the molecules in a real wet lab and get more data to put back into the initial model,” Bois added. This can help avoid hurdles, such as having a model generate a compound that, in theory, would work very well, but is too complicated or too costly to synthesize in real life.

Exscientia is using active learning to identify and prioritize the compounds its models believe will provide the most information, so that it can get through optimization quickly.

“Which compound should we make next? Which one will give me the greatest learning, the greatest information to optimize my project faster?” Hopkins said. “We are asking the question: how can you learn as fast as possible?”

The next compound the model predicts may not yet be a drug candidate, but it might be one that yields data to make the model better—a step in the right direction, van Hoorn said.
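To make the idea of "which compound should we make next?" concrete, here is a minimal sketch of one common active-learning strategy, query by committee. It is not Exscientia's or InVivoAI's method, which the article does not describe in detail. Everything in it is an assumption for illustration: each compound is reduced to a single made-up numeric descriptor, `run_assay` is a hypothetical stand-in for a wet-lab measurement, and the models are simple straight-line fits. The loop picks the untested compound on which a small committee of models disagrees most, "measures" it, and adds the result to the training data.

```python
from statistics import mean, pvariance


def fit_line(points):
    """Least-squares line through (x, y) pairs; returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def committee(labeled):
    """Build one model per left-out data point (a leave-one-out committee)."""
    return [fit_line(labeled[:i] + labeled[i + 1:]) for i in range(len(labeled))]


def disagreement(models, x):
    """Variance of the committee's predictions for descriptor value x."""
    return pvariance([slope * x + intercept for slope, intercept in models])


def pick_next(labeled, candidates):
    """Choose the candidate the committee is least sure about."""
    models = committee(labeled)
    return max(candidates, key=lambda x: disagreement(models, x))


def run_assay(x):
    """Hypothetical experiment: an invented activity curve, not real data."""
    return 2.0 * x - 0.3 * x * x


def active_learning(labeled, candidates, rounds):
    """Repeatedly pick, 'measure', and learn from the most informative compound."""
    labeled = list(labeled)
    pool = list(candidates)
    for _ in range(rounds):
        x = pick_next(labeled, pool)
        pool.remove(x)
        labeled.append((x, run_assay(x)))  # new data goes back into the model
    return labeled, pool


# Three compounds already measured, five untested candidates (all invented).
start = [(x, run_assay(x)) for x in (1.0, 2.0, 3.0)]
pool = [0.0, 1.5, 2.5, 4.0, 6.0]
labeled, remaining = active_learning(start, pool, rounds=2)
print("tested in order:", [x for x, _ in labeled[3:]])
print("still untested:", remaining)
```

The compound chosen in each round is not necessarily the one predicted to be most active. It is the one whose measurement should reduce the models' uncertainty the most, which matches van Hoorn's point that the next compound may simply be "a step in the right direction."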

The British company counts Celgene, GlaxoSmithKline, Sanofi, Roche and Evotec among its partners.

InVivoAI, too, is working with partners to address different challenges in a small-data environment. And when it says small data, it doesn’t just mean small datasets. It also means noisy data, heterogeneous data, or a dataset in which not many compounds were sampled. Basically, a dataset characteristic of early-stage drug discovery, Cohen said.

One of its projects involved working with cells taken directly from 20 patients, which meant the team could only screen so many compounds. Using a virtual library of only 1,200 compounds, InVivoAI developed a model that would predict how each of the compounds would work in each of the 20 cell lines.

“It was a challenge, but it worked quite well,” Cohen said. “The next step is to generate entirely new compounds optimized for activity on a patient-by-patient basis.”

Eventually, InVivoAI plans to test the compounds it generates in vitro and then in vivo.

“A lot of people propose interesting computational models, but they don’t actually test that out. If you create a model, can you prove you can go synthesize them and prove they’re behaving the way you think they’re behaving?” Cohen said. Because that’s what the biopharma industry wants to see: computational approaches leading to real outcomes, a.k.a. drug candidates.

“At the end of the day, to create something new, we need to go beyond current models with a handful of millions of data points,” he said. “We need to go beyond the known universe to get new stuff.”

By Amirah Al Idrus

Source: Fierce Biotech
