The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies
Artificial intelligence (AI) has the potential to revolutionize the drug discovery process, offering improved efficiency, accuracy, and speed. However, the successful application of AI is dependent on the availability of high-quality data, the addressing of ethical concerns, and the recognition of the limitations of AI-based approaches. In this article, the benefits, challenges, and drawbacks of AI in this field are reviewed, and possible strategies and approaches for overcoming the present obstacles are proposed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods, as well as the potential advantages of AI in pharmaceutical research, are also discussed. Overall, this review highlights the potential of AI in drug discovery and provides insights into the challenges and opportunities for realizing its potential in this field.
Note from the human authors: This article was created to test the ability of ChatGPT, a chatbot based on the GPT-3.5 language model, in terms of assisting human authors in writing review articles. The text generated by the AI following our instructions (see Supporting Information) was used as a starting point, and its ability to automatically generate content was evaluated. After conducting a thorough review, the human authors practically rewrote the manuscript, striving to maintain a balance between the original proposal and the scientific criteria. The advantages and limitations of using AI for this purpose are discussed in the last section.
Keywords: artificial intelligence; drug discovery; AI-assisted content generation; AI-limitations
1. Methods for Writing this Paper
This blog was created with the assistance of ChatGPT, an AI language model developed by OpenAI, which generates human-like text based on prompts provided by the user. For each article, the initial drafts, summaries, and titles are generated by ChatGPT after I provide the topic, purpose, and specific instructions. All AI-generated content is then reviewed, edited, and expanded with additional research to ensure clarity, accuracy, and proper referencing. Images and infographics are created using DALL-E or sourced from free high-quality libraries, and all references are checked for validity. This process allows me to combine the efficiency of AI with human critical thinking and expertise, delivering insightful content about artificial intelligence and its applications in drug discovery and biomedicine.
2. Introduction to AI and Its Potential for Use in Drug Discovery
The use of artificial intelligence (AI) in medicinal chemistry has gained significant attention in recent years as a potential means of revolutionizing the pharmaceutical industry. Drug discovery, the process of identifying and developing new medications, is a complex and time-consuming endeavor that traditionally relies on labor-intensive techniques, such as trial-and-error experimentation and high-throughput screening. However, AI techniques such as machine learning (ML) and natural language processing offer the potential to accelerate and improve this process by enabling more efficient and accurate analysis of large amounts of data. The successful use of deep learning (DL) to predict the efficacy of drug compounds with high accuracy has been described recently by the authors of [5]. AI-based methods have also been able to predict the toxicity of drug candidates. These and other research efforts have highlighted the capacity of AI to improve the efficiency and effectiveness of drug discovery processes. However, the use of AI in developing new bioactive compounds is not without challenges and limitations. Ethical considerations must be taken into account, and further research is needed to fully understand the advantages and limitations of AI in this area. Despite these challenges, AI is expected to significantly contribute to the development of new medications and therapies in the next few years.
3. Limitations of the Current Methods in Drug Discovery
Currently, medicinal chemistry methods rely heavily on a hit-and-miss approach and large-scale testing techniques. These techniques involve examining large numbers of potential drug compounds, in order to identify those with the desired properties. However, these methods can be slow, costly, and often yield results with low accuracy. In addition, they can be limited by the availability of suitable test compounds and the difficulty of accurately predicting their behavior in the body.
4. The Role of ML in Predicting Drug Efficacy and Toxicity
One of the key applications of AI in medicinal chemistry is the prediction of the efficacy and toxicity of potential drug compounds. Classical protocols of drug discovery often rely on labor-intensive and time-consuming experimentation to assess the potential effects of a compound on the human body. This can be a slow and costly process, and the results are often uncertain and subject to a high degree of variability. AI techniques such as ML are able to overcome these limitations. Based on the analysis of a large amount of information, ML algorithms can identify patterns and trends that may not be apparent to human researchers.
5. The Impact of AI on the Drug Discovery Process and Potential Cost Savings
Another key application of AI in drug discovery is the design of novel compounds with specific properties and activities. Traditional methods often rely on the identification and modification of existing compounds, which can be a slow and labor-intensive process. AI-based approaches, on the other hand, can enable the rapid and efficient design of novel compounds with desirable properties and activities.
6. Case Studies of Successful AI-Aided Drug Discovery Efforts
The potential of AI in the context of drug discovery has been demonstrated in several case studies. For example, the successful use of AI to identify novel compounds for the treatment of cancer has recently been reported by Gupta et al. These authors trained a DL algorithm on a large dataset of known cancer-related compounds and their corresponding biological activities. The trained model then proposed novel compounds with high potential for future cancer treatment.
7. The Role of Collaboration between AI Researchers and Pharmaceutical Scientists
The role of collaboration between AI researchers and pharmaceutical scientists is crucial in the development of innovative and effective treatments for various diseases. By combining their expertise and knowledge, they can create powerful algorithms and machine-learning models intended to predict the efficacy of potential drug candidates and speed up the drug discovery process.
8. Challenges and Limitations of Using AI in Drug Discovery
Despite the potential benefits of AI in drug discovery, there are several challenges and limitations that must be considered. One of the key challenges is the availability of suitable data. AI-based approaches typically require a large volume of information for training purposes. In many cases, the amount of data that is accessible may be limited, or the data may be of low quality or inconsistent, which can affect the accuracy and reliability of the results.
AI in Drug Repurposing for New Medical Indications
1. Introduction
Drug repurposing (also called drug repositioning, reprofiling, redirecting, and drug rediscovery) is a strategy for identifying new therapeutic purposes for approved drugs in medical indications beyond the scope of their original therapeutic use. Drug repurposing offers various advantages over the de novo development of entirely new drugs, including the possibility to speed up the discovery process and to reduce failure rates in the clinical development and testing phases. In particular, if safety testing has been completed for the original indication and the dosing is compatible with the new indication, drug repurposing makes it possible to avoid repeating the safety evaluation in preclinical models and humans, potentially lowering overall development costs. Traditionally, drug repurposing success stories have mainly resulted from largely opportunistic and serendipitous findings; for example, sildenafil citrate was originally developed as an antihypertensive drug, but later repurposed by Pfizer and marketed as Viagra for the treatment of erectile dysfunction based on retrospective clinical experience, leading to massive worldwide sales.
Over recent years, a number of computational approaches have been developed for a more systematic drug repurposing process. Popular information sources for in-silico drug repurposing include, for instance, electronic health records, genome-wide association analyses or gene expression response profiles, pathway mappings, compound structures, target-binding assays, and other phenotypic profiling data. Several systematic review articles on the use of computational approaches are available, which also cover machine learning (ML) and artificial intelligence (AI) algorithms, such as those based on network propagation and on matrix factorization and completion, as well as recently developed deep learning models. Databases and other resources supporting in-silico drug repurposing, such as Drug Repurposing Hub and RepurposeDB, have also been recently surveyed. There are also excellent reviews and perspectives on the use of ML and AI approaches in the overall drug discovery and development process, as well as in lead optimization or the design of completely new molecules.
Our focus here is on supervised ML and AI methods that make use of publicly available databases and information sources. A particular emphasis is placed on the use of comprehensive target activity profiles of drugs as a resource for a systematic repurposing process, in which an existing drug is found to have an off-target effect or a newly recognized on-target effect for a new indication, hence providing sufficient evidence to take it forward for further development and commercial exploitation. Such target-based drug repurposing makes use of the fact that most drugs are not specific for any single target, but rather display a wide spectrum of target activity. In cancer applications, some of the unintended off-targets correspond to known anticancer targets, while others may reveal new cancer vulnerabilities. However, we note that drug repurposing is by no means limited to anticancer applications, but covers various medical indications. For instance, a recent review surveyed existing drugs that may have activity against SARS-CoV-2 and could therefore be readily applied to treat COVID-19 patients. Similarly, target repositioning can be used in the field of infectious diseases, where a drug is used to inhibit the orthologous target proteins in other species.
The repurposing process is often initiated after phenotypic observations of adventitious polypharmacological drug activities. For instance, we observed a surprising activity for axitinib, a vascular endothelial growth factor receptor (VEGFR) inhibitor approved for advanced renal cell carcinoma, in primary chronic myeloid leukemia (CML) and acute lymphoblastic leukemia (ALL) cells. Since these cancers are driven by the oncogenic BCR-ABL1 fusion protein, we hypothesized that axitinib might bind to BCR-ABL1. This was confirmed by structural and functional analysis, and interestingly, axitinib bound to T315I-mutated BCR-ABL1 with roughly 40 times higher affinity than to the wild-type BCR-ABL1. Currently, axitinib is being investigated in an alternating regimen with bosutinib for CML patients (NCT02782403). Subsequent reports, however, have indicated that axitinib may lose potency when additional compound mutations emerge in BCR-ABL1, and the drug does not seem to be effective against ponatinib-resistant T315I-mutated cells. These observations raise the question of whether AI algorithms could predict at least some of the potential drawbacks before the repurposing process enters the clinical stage.
2. Data resources for in-silico drug repurposing
We start by going through selected data and information resources that we find useful for in-silico drug repurposing. Rather than providing a systematic review of all developed resources, we mainly focus on information sources motivated by the axitinib repurposing study from the previous section, including resources for drug–target activity data, cell-based pharmacogenomic data, and chemical structure information. For more comprehensive surveys of various data resources, the reader is referred to recent reviews.
2.1. Drug–target interaction resources
Comprehensive knowledge about both the intended (on-target) and unintended (off-target) interactions of a drug is crucial for understanding its mechanism of action (MoA) and modeling its efficacy or toxicity across different tissues and cancer types. For example, as demonstrated in studies such as those on axitinib, detailed drug–target activity profiles can play a significant role in identifying repurposing opportunities for existing compounds.
To simplify the categorization, compound–target activity data can be grouped into three types based on the nature of their recorded activity data: quantitative bioactivity data (e.g. Kd, Ki, or IC50 values from dose–response assays), binary interactions (datasets including both active and inactive drug–target pairs), and unary interactions (datasets containing only active pairs). These distinctions inform whether regression or classification algorithms are suitable for the prediction task and whether the dataset provides both positive and negative examples necessary for training robust supervised models.
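The three data types described above can be illustrated with a minimal Python sketch. The records, the drug and target names, and the 1000 nM activity cutoff are all hypothetical choices made for illustration; they are not taken from any of the databases discussed here. The sketch shows how quantitative data can be reduced to binary labels, and binary labels to unary ones, with information lost at each step.

```python
import math

def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    return -math.log10(ic50_nm * 1e-9)

def to_binary(records, threshold_nm=1000.0):
    """Quantitative -> binary: a pair is labeled active if IC50 <= threshold.

    The 1000 nM default is an assumed cutoff, not a standard from the text.
    """
    return [(drug, target, ic50 <= threshold_nm) for drug, target, ic50 in records]

def to_unary(binary_records):
    """Binary -> unary: keep only active pairs, so negative examples are lost."""
    return [(drug, target) for drug, target, active in binary_records if active]

# Hypothetical measurements: (drug, target, IC50 in nM)
quantitative = [
    ("drugA", "kinase1", 10.0),
    ("drugA", "kinase2", 5000.0),
    ("drugB", "kinase1", 250.0),
]

binary = to_binary(quantitative)   # suitable for classification
unary = to_unary(binary)           # positives only; negatives must be sampled
```

Quantitative values such as pIC50 support regression, binary labels support classification directly, and unary data require some additional strategy for obtaining negative examples before a supervised classifier can be trained.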
So far, ChEMBL is the most popular target activity resource for regression modeling (i.e. prediction of quantitative drug–target binding affinities). Classification algorithms, in contrast, try to predict whether a drug has sufficient potency against the given target. In addition to the problem formulation (regression vs. classification), we have argued that at least the following factors should be taken into consideration in in-silico target prediction studies to avoid reporting overoptimistic drug–target activity prediction results: (i) multiple evaluation datasets specific to particular drug and target families, to evaluate the application domain of the prediction model; (ii) the evaluation procedure, where nested cross-validation is preferred over standard cross-validation; and (iii) the prediction problem setting (i.e. whether the training and test sets of compound–target pairs share common drugs and targets, only drugs or only targets, or neither, where the last is often the most challenging case). Obviously, the more comprehensive the information present in the databases, e.g. in terms of drug classes and target families, the better coverage the prediction algorithm will have. The predicted target activities should also be experimentally validated before being suggested for drug repurposing. Accordingly, we recently organized an IDG-DREAM Challenge, where the teams used bioactivity data from ChEMBL, DTC, and BindingDB to make quantitative target activity predictions, which were later validated using subsequent experimental assays.
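The prediction problem settings in point (iii) can be made concrete with a short Python sketch. The function names, the setting labels, and the toy drug and target identifiers are hypothetical and introduced only for illustration. The first function reports which setting a test pair falls into relative to a training set, and the second builds a split for the hardest setting by holding out whole drugs and whole targets.

```python
def problem_setting(train_pairs, test_pair):
    """Classify a test (drug, target) pair by what it shares with the training set."""
    train_drugs = {drug for drug, _ in train_pairs}
    train_targets = {target for _, target in train_pairs}
    drug, target = test_pair
    drug_seen = drug in train_drugs
    target_seen = target in train_targets
    if drug_seen and target_seen:
        return "shared drug and target"
    if drug_seen:
        return "shared drug, new target"
    if target_seen:
        return "new drug, shared target"
    return "new drug and new target"

def split_new_drug_new_target(pairs, test_drugs, test_targets):
    """Hold out whole drugs and targets so the test set shares neither with training.

    Pairs that mix a held-out entity with a training entity are discarded.
    """
    train = [(d, t) for d, t in pairs if d not in test_drugs and t not in test_targets]
    test = [(d, t) for d, t in pairs if d in test_drugs and t in test_targets]
    return train, test

# Hypothetical compound-target pairs
pairs = [("d1", "t1"), ("d1", "t2"), ("d2", "t1"), ("d2", "t2"), ("d3", "t3")]
train, test = split_new_drug_new_target(pairs, test_drugs={"d3"}, test_targets={"t3"})
settings = [problem_setting(train, pair) for pair in test]
```

A random split over pairs would mostly produce the "shared drug and target" setting, which is one way evaluation results can come out overoptimistic when the intended application involves new drugs or new targets.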
2.2. Cell line and patient-derived omics resources
Drug–target bioactivity information makes it possible to predict whether, and to what extent, the explored compounds can modulate a given target, but this information is typically cell context independent. However, since the drug MoA is often highly cell context-specific, it is important to actually measure (or predict) the activity of the compound against the cell model or target using cell-based assays. Cell line omics resources contain drug response data along with multi-omics profiles for established cancer cell lines (in vitro models), whereas patient-derived resources include pharmacogenomic information on the patient primary cells tested against various drugs (ex-vivo models).
2.3. Biological pathway information resources
Biological pathways help us understand the internal processes of cells and how they respond to drugs, supporting drug repurposing research. For example, mapping drug protein targets to the same or related pathways can reveal their mechanisms of action, which is useful when studying both multi-target drugs and combination therapies. However, because different databases may represent the same pathways in various ways, this can lead to differences in results when performing pathway enrichment analyses or building predictive models in precision medicine.
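To illustrate why different pathway representations can change enrichment results, the following Python sketch computes a one-sided hypergeometric enrichment p-value, which is one common way to score pathway over-representation. The function name and all the counts are hypothetical; in particular, the two pathway sizes stand in for the same pathway as it might be defined in two different databases.

```python
from math import comb

def enrichment_p_value(n_universe, n_pathway, n_targets, n_overlap):
    """One-sided hypergeometric test: P(overlap >= n_overlap).

    n_universe: number of proteins in the background set
    n_pathway:  number of background proteins annotated to the pathway
    n_targets:  number of drug targets being tested
    n_overlap:  number of drug targets that fall in the pathway
    """
    total = comb(n_universe, n_targets)
    upper = min(n_pathway, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical case: 3 of 10 drug targets fall in a pathway that one database
# defines with 50 proteins and another defines with 200 proteins.
p_small_definition = enrichment_p_value(20000, 50, 10, 3)
p_large_definition = enrichment_p_value(20000, 200, 10, 3)
```

With the same drug targets and the same overlap, the broader pathway definition yields a larger (less significant) p-value, so conclusions can depend on which database supplied the pathway.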
There are several pathway databases that provide information about compound target pathways, each offering different coverage in terms of proteins, compounds, pathways, and interactions. Among them, PathwayCommons and KEGG Pathways are particularly comprehensive, including a large number of biological reactions and interactions. Many pathway databases also offer programmatic access through APIs, which facilitates their integration into AI models for systematic drug discovery and repurposing applications.
2.4. Chemical structure and protein property data resources
Chemical structural descriptors and target protein properties provide essential information for AI and machine learning models used in drug repurposing. There are various online tools and software libraries available to calculate chemical descriptors for drugs and to extract target properties for proteins. For example, ChemCPP calculates kernel functions between compounds, while EDragon computes a wide range of topological and geometrical descriptors. The Open Babel toolkit offers functionalities such as substructure searching and fingerprint calculations for chemical compounds. RDKit provides tools for 2D depiction, molecular serialization, fingerprint generation, and similarity analysis. Additionally, PyDPI is a Python package that can compute molecular descriptors for drugs as well as structural and physicochemical properties for proteins.
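The similarity analysis mentioned above is often based on the Tanimoto coefficient between fingerprints. The sketch below is a plain-Python illustration that represents each fingerprint as a set of "on" bit indices; the bit sets and compound names are hypothetical, and in practice the fingerprints would be generated by a toolkit such as RDKit or Open Babel rather than written by hand.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def rank_by_similarity(query_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints given as sets of on-bit indices
query = {1, 2, 3}
library = {
    "compoundX": {2, 3, 4},    # shares 2 of 4 bits in the union
    "compoundY": {1, 2, 3, 5}, # shares 3 of 4 bits in the union
    "compoundZ": {7, 8},       # shares nothing
}
ranking = rank_by_similarity(query, library)
```

Ranking approved drugs by structural similarity to a compound with known activity is one simple way such descriptors can feed into a repurposing workflow, although similar structures do not guarantee similar target activity.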
2.5. Case Study: Sildenafil Citrate
Sildenafil citrate was originally developed by Pfizer under the name "UK-92480" for the treatment of angina and hypertension. However, during clinical trials, it was observed to have a significant effect on erectile function. As a result, Pfizer shifted its focus and repurposed sildenafil for the treatment of erectile dysfunction (ED), branding it as Viagra, which became one of the most well-known examples of successful drug repurposing.
Sildenafil works by inhibiting the enzyme phosphodiesterase type 5 (PDE5), which breaks down cyclic GMP in smooth muscle cells. By blocking this enzyme, it increases cyclic GMP levels, leading to smooth muscle relaxation and improved blood flow, thus helping men with erectile dysfunction.
Beyond ED, sildenafil has also been repurposed for treating pulmonary hypertension, marketed under the name Revatio, and its potential use in other vascular disorders continues to be explored. This makes it a prime example of how an existing drug can be used for entirely different therapeutic indications.
Moreover, sildenafil serves as a reference case in the drug repurposing field, illustrating the kind of new therapeutic use that computational tools and AI aim to identify systematically rather than through serendipity. By leveraging AI to predict off-target effects and alternative indications, the drug repurposing process can be accelerated, reducing the time and cost needed to develop effective treatments.