How AI Is Transforming Patient Stratification
By Daniel Gehrlach, Ph.D., associate director of biomarkers, HMNC Brain Health

When most people think about clinical trials, they imagine a large group of patients being tested to see if a new drug works better than a placebo. What often gets overlooked is how different patients within that group really are. In psychiatry, especially, two individuals with the same diagnosis of major depressive disorder (MDD) can look similar on paper but have very different biological reasons driving their symptoms. That diversity makes it hard to get clear answers in trials, and it's one of the main reasons so many psychiatric drugs fail to show consistent benefit.
In my work as the clinical project lead for a Phase 2 program targeting MDD, I’ve seen firsthand how AI can help us tackle this problem by reshaping how we design and run trials. Rather than assuming every patient with the same diagnosis is identical, we can rely on AI tools that help us stratify or group patients into subtypes that respond differently to treatment. This isn’t a distant promise anymore. It’s beginning to change how we design trials today.
Why Stratification Matters
Clinical trials have long struggled with patient heterogeneity. In oncology, this challenge has been addressed by using biomarkers to match treatments with specific tumor mutations. That’s why precision oncology has advanced so quickly. Psychiatry, however, has lagged because biological markers have been harder to identify.
The result? A trial for an antidepressant might enroll 300 patients, but only a subset of those patients actually responds. When you average everyone together, the benefit looks smaller than it really is. Patients who improve dramatically might get lost in the noise. Stratification tools aim to change that by identifying, up front, who is most likely to benefit from a therapy.
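The dilution effect is easy to see with a toy calculation. The numbers below are purely hypothetical, chosen only to illustrate the arithmetic, not drawn from any actual trial:

```python
# Toy illustration: averaging responders and non-responders dilutes the signal.
# All numbers are hypothetical.

n_patients = 300
n_responders = 90                 # suppose 30% carry the biology the drug targets
responder_improvement = 10.0      # symptom-scale points gained by responders
non_responder_improvement = 0.5   # near-placebo change for everyone else

total = (n_responders * responder_improvement
         + (n_patients - n_responders) * non_responder_improvement)
average_all = total / n_patients

print(f"Average over all patients:    {average_all:.2f} points")
print(f"Average over responders only: {responder_improvement:.2f} points")
```

A 10-point benefit in the responder subgroup shows up as barely more than 3 points in the pooled average, which is exactly how a real effect gets lost in the noise.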
A Closer Look At Stratification In Practice
A good example of this approach is HMNC Brain Health and Spruce Biosciences’ ongoing TAMARIND study, a Phase 2 trial testing tildacerfont for MDD. What makes TAMARIND different is that it doesn’t enroll just anyone with depression. Instead, a genetic test, CRHR1CDx, is used at the beginning of the trial to identify people whose depression may be driven by problems in the body’s stress hormone system, also known as the hypothalamic-pituitary-adrenal (HPA) axis.
Only people who test positive for this specific biological profile are enrolled. That means the study is designed from the start to match the treatment to the patients who are most likely to benefit.
This is called prospective stratification, and it’s a big step forward for psychiatry. In the past, many trials failed because they enrolled broad groups of patients without knowing who might actually respond. TAMARIND turns that around, selecting patients based on their biology before treatment even begins. That makes the trial more efficient, more targeted, and potentially more successful.
The Human Side Of Trial Design
These aren’t just statistical details. They affect patients directly. Imagine joining a clinical trial, investing your time, hopes, and energy, only to be randomized into a group where the treatment doesn’t match your biology. You leave the study discouraged and without benefit. Companion diagnostics and AI-based stratification reduce that risk by helping us run smarter trials, ones where patients are more likely to be matched with a drug that could help them or the wider population. Taking part in a trial means consciously accepting the uncertainty of receiving either a placebo or an investigational drug for the greater good of society, being, in a sense, a willing research subject whose participation enables progress for others.
In TAMARIND, every participant has already been identified, genetically, as someone whose biology suggests a higher likelihood of response. That’s not just good science; it’s a more humane way to conduct research.
Lessons Learned Along The Way
Turning an AI algorithm into something that works in the messy reality of global trials is not simple. I’ve learned three important lessons from hands-on experience:
- Sample handling must be flawless. Genetic and biomarker data are only as good as the samples collected at the clinic. If one site mishandles blood draws, the entire model’s reliability is compromised. Examples of mishandling include not following the precise sequence of steps in the blood collection protocol (for example, for PAXgene RNA tubes), storing samples outside the required temperature range (such as freezing instead of cooling, or exposure to 40°C instead of the recommended 4°C), or using incorrect centrifugation settings when plasma separation is needed. Such deviations can lead to artificial signal changes unrelated to patient biology or treatment effects, introducing technical noise—commonly referred to as batch effects—that can confound both biomarker interpretation and AI model training.
- Models must generalize. A classifier trained on one population may not work the same way in another country or demographic. In the context of clinical trials, a classifier might be trained to distinguish between patients who are likely to respond to a drug and those who are not, using biological or digital features such as gene expression profiles or wearable-derived activity patterns. The challenge is that a classifier can only learn from the data it’s been shown. If it’s trained on one population—say, European patients recruited from large academic centers—but then used to predict outcomes in a more diverse population with different genetic backgrounds, diets, or comorbidities, its performance can drop sharply. It has, in a sense, “memorized” the training data instead of truly understanding the underlying biology. That’s why model generalization—ensuring the algorithm performs robustly across populations, sites, and technical conditions—is so critical.
- Regulatory input is critical. When you use an AI tool to inform patient stratification, regulators want to know it’s valid, transparent, and reproducible. Early and ongoing dialogue with regulators can make or break adoption of AI technology.
These may sound elementary to experienced clinicians, but they are the backbone of bringing AI from the lab into the clinic.
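The generalization pitfall can be sketched in a few lines of code. This is a deliberately minimal toy: synthetic data, a single simulated "expression level" feature, and a threshold rule standing in for a real classifier. It is not our model, only an illustration of how a distribution shift between populations degrades performance:

```python
import random

random.seed(0)

def make_population(mean_responder, mean_nonresponder, n=200):
    """Simulate a 1-D 'expression level' feature for responders (label 1)
    and non-responders (label 0), with Gaussian noise around each mean."""
    data = []
    for _ in range(n):
        data.append((random.gauss(mean_responder, 1.0), 1))
        data.append((random.gauss(mean_nonresponder, 1.0), 0))
    return data

def train_threshold(data):
    # Midpoint between the two class means: a minimal stand-in
    # for fitting a real classifier.
    r = [x for x, y in data if y == 1]
    n = [x for x, y in data if y == 0]
    return (sum(r) / len(r) + sum(n) / len(n)) / 2

def accuracy(data, threshold):
    correct = sum(1 for x, y in data if (x > threshold) == (y == 1))
    return correct / len(data)

# Train on population A; population B has a shifted baseline
# (different genetics, diet, or assay conditions).
pop_a = make_population(mean_responder=2.0, mean_nonresponder=0.0)
pop_b = make_population(mean_responder=4.0, mean_nonresponder=2.0)

t = train_threshold(pop_a)
print(f"Accuracy on training population A: {accuracy(pop_a, t):.2f}")
print(f"Accuracy on shifted population B:  {accuracy(pop_b, t):.2f}")
```

The threshold that separates the classes well in population A sits in the wrong place for population B, so accuracy collapses toward chance. Real remedies include multi-site training data, cross-population validation, and recalibration per site.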
Beyond Psychiatry: Broader Applications
While my focus is psychiatry, stratification tools are making waves elsewhere, too. In oncology, AI is helping refine which patients benefit from immunotherapies. In immunology, stratification models are being tested to predict responses to biologics for autoimmune conditions. In neurology, combining imaging with digital biomarkers may help identify early responders in diseases like Parkinson’s or Alzheimer’s.
Across these fields, the same truth applies: the more precisely we can group patients, the clearer and more ethical our trials become.
From Genomics To Digital Signals
The toolkit for AI-enabled stratification is expanding rapidly. Genomics is the foundation today, but epigenetics, transcriptomics, proteomics, and metabolomics are adding more nuance. Meanwhile, digital phenotyping, from devices ranging from smartphones to wearables, provides real-world, continuous measures of mood, sleep, or cognition. By themselves, these signals can’t predict treatment response (yet). But when layered on top of biological data, they give a richer, more dynamic picture of how patients are doing.
Challenges And Opportunities
As compelling as this progress is, many models are still in early stages of validation. False positives and false negatives happen, and underrepresented populations remain a major blind spot. Regulators are only beginning to outline standards for how AI-derived biomarkers should be used in trials. But the opportunity is real and urgent. For conditions like MDD, where response rates are low and trial failures are common, precision psychiatry offers a way forward. TAMARIND is part of that shift.
Another challenge is “explainability.” For doctors, patients, and regulators to trust AI, they need to understand why the algorithm is making a particular call. Black-box models may work in tech, but in medicine, transparency is nonnegotiable.
The starting point for our CDx is always the basic-research foundation on the neurobiology of the stress axis and psychiatric disorders. We therefore restricted the genes considered for the CDx to those known to be implicated in HPA function or depression.
However, we don’t fully understand, and cannot fully explain, why the algorithm treats one gene as more important than another when deciding whether a patient will respond. It learned this prioritization from the training dataset.
Now, when seeing a new patient, it tries to infer which group this new patient is more similar to — the ones that have previously responded or the ones that have not responded.
Further, there are methods like SHapley Additive exPlanations (SHAP) that try to extract explainability from neural networks, which we are exploring. Of course, we would love to fully understand the algorithm, but ultimately it matters that these tools make the right decision.
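SHAP builds on the game-theoretic Shapley value, which asks how much each feature contributes to a particular prediction. For intuition, here is a self-contained sketch that computes exact Shapley values for a hypothetical two-feature risk score with an interaction term. This is not our CDx model, just the underlying idea in miniature:

```python
from itertools import permutations

def model(x1, x2):
    # Hypothetical risk score with an interaction between two gene features.
    return 2.0 * x1 + 1.0 * x2 + 3.0 * x1 * x2

def value(present, x, background):
    # Evaluate the model with absent features replaced by a background value.
    a = x[0] if 0 in present else background[0]
    b = x[1] if 1 in present else background[1]
    return model(a, b)

def shapley(x, background):
    # Average each feature's marginal contribution over all orderings.
    n = 2
    phi = [0.0] * n
    for order in permutations(range(n)):
        present = set()
        for i in order:
            before = value(present, x, background)
            present.add(i)
            after = value(present, x, background)
            phi[i] += (after - before) / 2  # 2 = number of orderings (2!)
    return phi

x = (1.0, 1.0)   # the new patient's (hypothetical) gene features
bg = (0.0, 0.0)  # background / reference patient
phi = shapley(x, bg)
print(f"Shapley values per feature: {phi}")
# Efficiency property: the attributions sum to f(x) - f(background).
print(model(*x) - model(*bg))
```

The key property, and the reason regulators and clinicians find this framing attractive, is that the per-feature attributions always sum exactly to the difference between the patient's prediction and the reference prediction. In practice the SHAP library approximates these values efficiently for large models, since exhaustive enumeration explodes combinatorially.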
Despite the challenges, what excites me most is watching trials evolve from asking a single blunt question: “Does this drug work?” to asking a smarter one: “Who does this drug work for, and why?” That’s the power of stratification. It acknowledges the diversity within patient populations and uses data to turn that complexity into clarity.
For the patients I meet and the colleagues I work alongside, this isn’t abstract. It’s the difference between wasted time and meaningful progress. It’s the difference between average results and personalized hope.
Looking Ahead
AI won’t replace the fundamentals of good trial design, but it does give us sharper tools to design trials that are smaller, faster, and more humane. Instead of forcing patients into one-size-fits-all experiments, we can start to tailor enrollment and analysis around real biological differences. That means fewer failed trials, clearer answers, and, most importantly, better options for patients waiting for new treatments. For me, that’s the most important part: seeing AI help us design trials that are not only scientifically stronger but also more aligned with the human lives behind the data.
About The Author:
Daniel Gehrlach, Ph.D., is the associate director of biomarkers at HMNC Brain Health. After obtaining degrees in pharmaceutical biotechnology in Biberach and medical neurosciences at the Charité in Berlin, he completed his Ph.D. at the Max Planck Institute of Neurobiology in Martinsried, Germany. He gained valuable experience in preclinical and translational in vivo models of affective disorders, in particular in the fields of anxiety, depression, and addiction. Daniel was awarded the Max Planck Institute's Young Scientist Award 2019 for his doctoral project, which was also published in Nature Neuroscience.
As associate director at HMNC Brain Health, he is the project lead for a Phase 2 clinical trial. In addition, he is operationally responsible for the co-development of HMNC Brain Health’s companion diagnostics. Other responsibilities include regulatory affairs (IVDR), CRO management, and CMC oversight. Furthermore, he is involved in HMNC’s business development and intellectual property activities.