The journey from identifying a potential therapeutic compound to U.S. Food and Drug Administration (FDA) approval of a new drug can take well over a decade and cost upwards of a billion dollars. A team of CUNY Graduate Center researchers has developed a novel artificial intelligence model that could significantly improve the accuracy and reduce the time and cost of drug development.

The new model, called CODE-AE, can screen novel drug compounds and accurately predict their efficacy in humans, according to a paper to be published today (October 17) in Nature Machine Intelligence. In tests, it also identified personalized drugs for more than 9,000 patients that could help treat their conditions. The researchers anticipate that the technique will significantly speed up drug discovery and precision medicine.

Accurate and robust prediction of patient-specific responses to a new chemical compound is critical both for developing safe and effective therapeutics and for selecting an existing drug for a specific patient. However, testing an unproven compound's efficacy directly in humans is unethical and infeasible. To evaluate the therapeutic effect of a drug molecule, researchers therefore often use cell or tissue models as a surrogate for the human body. Unfortunately, a drug's effect in a disease model does not always match its efficacy and toxicity in human patients. This translational gap is a major contributor to the high cost and low productivity of drug discovery.

“Our new machine learning model can address the translational challenge from disease models to humans,” said Lei Xie, senior author of the paper and a professor of computer science, biology, and biochemistry at the CUNY Graduate Center and Hunter College. “CODE-AE employs biology-inspired design and makes use of recent advances in machine learning. One of its components, for example, uses techniques similar to those in Deepfake image generation.”

According to You Wu, a CUNY Graduate Center Ph.D. student and co-author of the paper, the new model addresses the problem of not having enough patient data to train a generalizable machine learning model. “Although many methods for using cell-line screens to predict clinical responses have been developed,” Wu said, “their performance is unreliable due to data incongruity and discrepancies. CODE-AE effectively alleviates the data-discrepancy problem by extracting intrinsic biological signals masked by noise and confounding factors.”

As a result, CODE-AE significantly outperforms state-of-the-art methods in predicting patient-specific drug responses based solely on cell-line compound screens.

The research team's next challenge in advancing the technology for drug discovery is to develop a way for CODE-AE to predict how a new drug's concentration and metabolism in the human body affect its efficacy. The researchers also noted that the model could be adapted to predict drug side effects in humans.
