Guest Column | April 26, 2022

Can AI/ML Help Solve Underrepresentation In Clinical Trials?

By Tim Sandle, Ph.D.

Artificial intelligence (AI) is becoming more commonplace in clinical trials, helping to advance drug development. Forms of AI applied to clinical data include natural language processing, graph vectorization, and supervised/unsupervised learning techniques, all used to improve and accelerate critical steps in the clinical trials process. For example, AI is used to scan through millions of data points, drawn from published research papers, to predict the absorption, distribution, metabolism, excretion, and toxicity of new drug candidates and hence to find new compounds that can be used in new formulations of medicines.1 AI additionally carries the potential to speed up drug discovery, from what has traditionally taken several years to several days.

Machine learning (ML) enables powerful inferences to be made from clinical trial data sets, and this power increases over time as the model learns. The more data that is input and the more often errors are corrected, the more reliable the algorithm's future performance. Greater predictive accuracy arises from training on heterogeneous data (including data that suitably represents biological sex). Hence, unrepresentative data or biases built into AI programming limit the seemingly expanding capacity of AI to deliver improved clinical trial outcomes.

To assess medicines for safety and efficacy, it is important that clinical trials are representative of biological sex. Too often, women are underrepresented in clinical trial subject populations. This leads to erroneous data, since women differ from men in presentation, clinical manifestations, and outcomes.2 For example, many medications are metabolized more slowly in women, and other medications are more active in men. Clinical manifestations may differ because the pathophysiology of many disorders is unknown, and drug target identification can differ between the sexes for the same disease. Consequently, since details about clinical trial participants often remain underreported in medical journals, the physician deciding on the optimal treatment for a female patient may not be choosing the most appropriate intervention. In addition, the side effects of a medication can differ between men and women.3
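
One simple way to put a number on the underrepresentation described above is to compare the share of women enrolled in a trial with their share of the wider patient population. The sketch below is illustrative only: the function name, the example counts, and the 0.8 cut-off are assumptions made for this example and are not taken from the studies cited here.

```python
def participation_to_prevalence_ratio(women_enrolled, total_enrolled,
                                      women_share_of_patients):
    """Share of women in the trial divided by their share of all patients."""
    if total_enrolled <= 0 or not 0 < women_share_of_patients <= 1:
        raise ValueError("need a positive enrollment and a share in (0, 1]")
    return (women_enrolled / total_enrolled) / women_share_of_patients


# Hypothetical numbers: 120 women among 400 participants, for a condition
# in which women make up half of all patients.
ppr = participation_to_prevalence_ratio(120, 400, 0.5)

# Assumed cut-off for flagging a trial; a ratio of 1.0 would mean women are
# enrolled in proportion to their share of the patient population.
underrepresented = ppr < 0.8
```

A ratio well below 1.0, as in this made-up case, would suggest that the trial population does not reflect the patients who will eventually receive the medicine.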

Where these biases occur, they undermine the potential that AI presents to translate, scale, and accelerate drug development insights. In addition, if the base algorithm or the training data for an ML model contains some form of bias, it is extremely likely that the resulting models will perpetuate that bias when recommending subject data sets and when interpreting trial data sets.
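
How a skewed training set carries through to a model's output can be shown with a deliberately simplified example. The sketch below assumes a made-up drug clearance value for each sex and a naive "model" that predicts one pooled average for everyone; all names and numbers are hypothetical and chosen only to make the effect visible.

```python
from statistics import mean

# Hypothetical, synthetic values (arbitrary units); not real trial data.
MALE_CLEARANCE = 10.0
FEMALE_CLEARANCE = 7.0


def make_cohort(n_men, n_women):
    """Build a synthetic cohort of (sex, clearance) records."""
    return [("M", MALE_CLEARANCE)] * n_men + [("F", FEMALE_CLEARANCE)] * n_women


def fit_pooled(cohort):
    """A deliberately naive model: one pooled mean that ignores sex."""
    return mean(value for _, value in cohort)


def error_by_sex(prediction):
    """Absolute error of the pooled prediction for each sex."""
    return {
        "M": abs(prediction - MALE_CLEARANCE),
        "F": abs(prediction - FEMALE_CLEARANCE),
    }


# Train once on a cohort that is 90% male and once on a balanced cohort.
skewed_prediction = fit_pooled(make_cohort(n_men=90, n_women=10))
balanced_prediction = fit_pooled(make_cohort(n_men=50, n_women=50))

skewed_error = error_by_sex(skewed_prediction)
balanced_error = error_by_sex(balanced_prediction)
```

With the male-dominated cohort the pooled prediction sits close to the male value, so almost all of the error falls on women. Balancing the cohort spreads the error evenly, although only a model that accounts for sex would remove it.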

Using AI To Produce Representative Trial Outcomes

While women remain underrepresented in too many clinical trials or trials are incorrectly assessed when accounting for female patients, AI presents solutions to these design challenges. For example:

  • AI can aid clinical trial directors by selecting representative subjects for inclusion in trials. The right algorithms can improve subject enrollment in clinical trials and lead to the creation of more efficacious medicines.4
  • Where trials have been appropriately balanced, AI can aid in the assessment of broad data sets and extract the variables that might indicate different responses to the medication by men and women. This includes drawing correlations, and performing multivariate analysis, between molecular representations of the medicine and the biological and toxicological responses of study participants.
  • During the course of clinical trials, AI can help with monitoring data and ensure female subjects are not adversely affected as the trial proceeds. This includes providing insights into any increases to the length of hospitalization, rehospitalization when it is not expected, and any correlation between unexpected diseases that might be affecting the female cohort.
  • Many clinical trials follow on from earlier studies, for reasons such as improving medications or seeking to repurpose medications to treat different conditions. ML can overcome biases that exist in historical clinical studies by deciphering and identifying differences in adverse drug effects between men and women, even where historic studies were disproportionately structured toward male participants. Algorithms have been developed that can automatically correct biases in trials that contained an overrepresentation of male subjects. This matters because it may lead to different prescribing recommendations for men (such as a stronger dose) and women (such as a lower dose).5
  • There are times when bias is necessary in clinical trial design. Where specific medicines are required for women (such as with medicines to treat breast cancer), AI assists with the optimization of the care trajectory of chronic disease patients and with evolving precision therapies for complex illnesses.
  • For drugs released into the market, AI can be fed with historical data where sex differences were reported in trials and with market surveillance data from good pharmacovigilance practice such as reports of adverse drug effects from consumers, healthcare providers, and pharmaceutical manufacturers.6 This can help alert manufacturers and regulators to an evolving concern as well as provide a data stream for subsequent clinical trials for the next generation of the medicine.
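
As a rough illustration of how adverse event reports might be screened for a difference between men and women, the sketch below computes a reporting odds ratio separately for each sex and compares the two. This is a conventional disproportionality measure, not the ML methods of the studies cited above, and the class name, function name, and report counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportCounts:
    """2x2 table of adverse event reports for one sex (hypothetical layout)."""
    drug_event: int    # reports naming the drug and the event of interest
    drug_other: int    # reports naming the drug and any other event
    other_event: int   # reports naming other drugs and the event of interest
    other_other: int   # reports naming other drugs and any other event


def reporting_odds_ratio(counts):
    """Odds of the event among reports for the drug vs. all other drugs."""
    cells = [counts.drug_event, counts.drug_other,
             counts.other_event, counts.other_other]
    if 0 in cells:
        # Add 0.5 to every cell to avoid division by zero in sparse tables.
        cells = [value + 0.5 for value in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Made-up report counts for a single drug and a single adverse event.
women = ReportCounts(drug_event=40, drug_other=160,
                     other_event=100, other_other=1900)
men = ReportCounts(drug_event=20, drug_other=180,
                   other_event=100, other_other=1900)

ror_women = reporting_odds_ratio(women)
ror_men = reporting_odds_ratio(men)

# A ratio above 1 means the event is reported disproportionately for women.
sex_ratio = ror_women / ror_men
```

A ratio well above 1, as in this invented case, is only a prompt for closer review. Spontaneous reports are subject to their own reporting biases, so such a signal would need confirmation before it informed a later trial.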

AI-based algorithms are additionally being developed to efficiently probe the pathways of synthesis for novel drug candidates (where between-sex differences in genomic information, biochemical attributes, and target tractability exist). To aid researchers, AI can be programmed to respond to verbal questions and requests in the laboratory setting.7 Such processes can help researchers to understand bias and the risks that stem from it, and simultaneously teach the AI to disentangle male-female differences in drug interactions.

AI has the power to ensure clinical trials are unbiased from the outset and to draw better inferences from them about the different responses of men and women. This presupposes that the AI is not itself biased as a result of its programming or learning, and developers need to be mindful of this possibility.


  1. Duka, J., Jansen, H., Schneider, P. et al. (2020) Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery. ISSN 1474-1784
  2. Vitale, C., Fini, M., Spoletini, I. et al. (2017) Under-representation of elderly and women in clinical trials. International Journal of Cardiology, 232: 216-221
  3. Chandak, P. and Tatonetti, N.P. (2020) Using Machine Learning to Identify Adverse Drug Effects Posing Increased Risk to Women. Patterns, 100108. DOI: 10.1016/j.patter.2020.100108
  4. Espinoza, J. L. (2018) Machine learning for tackling microbiota data and infection complications in immunocompromised patients with cancer. Journal of Internal Medicine, DOI: 10.1111/joim.12746
  5. Agmon, S., Gillis, P., Horvitz, E. et al. (2022) Gender-sensitive word embeddings for healthcare, Journal of the American Medical Informatics Association, 29 (3): 415–423
  6. Criado, P.C. (2019) Invisible women. New York: Harry N. Abrams
  7. Koohy H. (2017) Dataset 1 in: The rise and fall of machine learning methods in biomedical research. F1000Research:

About The Author:

Tim Sandle, Ph.D., is a pharmaceutical professional with wide experience in microbiology and quality assurance. He is the author of more than 30 books relating to pharmaceuticals, healthcare, and life sciences, as well as over 170 peer-reviewed papers and some 500 technical articles. Sandle has presented at over 200 events. He currently works at Bio Products Laboratory Ltd. (BPL) and is a visiting professor at the University of Manchester and University College London, as well as a consultant to the pharmaceutical industry. Visit his microbiology website at