Where Are We With Clinical Trial Tokenization?
By Dan Schell, Chief Editor, Clinical Leader

Back in January 2024, I interviewed Tom Dougherty about a topic many clinical research professionals had heard mentioned but few could confidently explain: tokenization. More than two years later, that article remains one of the most-read pieces ever published on Clinical Leader.
Why?
Well, I’d be guessing, but I think it’s because tokenization is one of those topics that seems to get more relevant every year. The concept touches everything from RWE and AI to patient privacy and long-term follow-up. It offers the possibility of learning more from clinical trial participants without asking more of them, but … and this is a big but … it also raises important questions about how data is used and protected. That's a combination that tends to make people perk up, especially during an ICF explanation.

- When we last spoke in 2024, tokenization still felt a bit like a “future state” capability. Where are we today in terms of actual adoption? Are sponsors truly operationalizing tokenization at scale, or are we still mostly in pilot mode?
Tom Dougherty: Today, I’d say adoption is real, but it’s still selective rather than universal. In the US, many sponsors, academic groups, and CROs have moved beyond pure pilots, and tokenization is showing up in repeatable workflows, especially where it directly supports external control arms, long-term follow-up, and other studies that depend on linking across data environments. That said, most organizations still aren’t treating tokenization as a capability that automatically applies to every trial, because true scale requires more than technology. It requires standardized governance, cross-functional operating models, integration, and validation. So I’m encouraged by the progress, and I’m also hoping to see more publications that go beyond proving it’s possible and really demonstrate outcomes and practical utility.
- Has tokenization delivered on its biggest promise — linking trial data to real-world data?
Not only has tokenization unlocked the opportunity to link clinical trials to RWD, but the technology for data assessments has also evolved. Dataset selection traditionally relied on simple token overlap, i.e., whether a trial participant is represented at all in a given dataset. That can mask whether the specific variables needed for analysis are available with sufficient completeness. Through a partnership between AWS and Datavant, you can now use a cloud-first approach that enables secure, early, high-fidelity (i.e., more accurate) feasibility assessment to guide fit-for-purpose RWD selection, without ever moving underlying data. It is great to see the evolution in technology and also the growing number of data sources that you can connect to.
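To make the overlap-versus-completeness distinction concrete, here is a minimal Python sketch. It is illustrative only and does not represent the AWS and Datavant workflow: the function name `assess_feasibility`, the token strings, and the variables `hba1c` and `egfr` are all hypothetical, and a real assessment would run inside a governed environment rather than on in-memory dictionaries.

```python
def assess_feasibility(trial_tokens, dataset, required_vars):
    """Return (token overlap, per-variable completeness among matched tokens).

    trial_tokens  -- list of de-identified tokens for trial participants
    dataset       -- dict mapping token -> dict of variable values (None = missing)
    required_vars -- variables the planned analysis actually needs
    """
    # Traditional check: is the participant represented in the dataset at all?
    matched = [t for t in trial_tokens if t in dataset]
    overlap = len(matched) / len(trial_tokens) if trial_tokens else 0.0

    # Finer check: among matched participants, how often is each
    # required variable actually populated?
    completeness = {}
    for var in required_vars:
        filled = sum(1 for t in matched if dataset[t].get(var) is not None)
        completeness[var] = filled / len(matched) if matched else 0.0
    return overlap, completeness


# Hypothetical example data (not real tokens or real patient values).
trial_tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
dataset = {
    "tok_a": {"hba1c": 7.1, "egfr": None},
    "tok_b": {"hba1c": None, "egfr": None},
    "tok_c": {"hba1c": 6.4, "egfr": None},
    "tok_x": {"hba1c": 8.0, "egfr": 55},  # not a trial participant
}

overlap, completeness = assess_feasibility(trial_tokens, dataset, ["hba1c", "egfr"])
print(f"Token overlap: {overlap:.0%}")
for var, share in completeness.items():
    print(f"{var} completeness among matched: {share:.0%}")
```

In this toy data, three of four participants match, so the dataset looks promising on overlap alone, yet none of the matched records has an `egfr` value. That is the kind of gap an overlap-only assessment would hide.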
- Beyond privacy, consent, and interoperability, what are the biggest operational or regulatory bottlenecks right now? Is it governance, tech integration, or just lack of alignment across stakeholders?
Beyond privacy, consent, and interoperability, the biggest bottlenecks right now are usually operational and stakeholder alignment. Tokenization can’t be run in a silo because it touches legal and privacy interpretation, data governance, data management requirements, RWE needs, epidemiology input, and biostatistics expectations around what variables and completeness you need for the analysis. If those stakeholders aren’t aligned early, you feel it later in the workflow. That’s why I strongly recommend that every organization build a playbook, something like a living guide with roles, decision gates, and validation steps, and then iterate on that playbook as it learns. When you do that, tokenization becomes more plug-and-play, instead of turning into a barrier every time you start a new study.
- Is tokenization finally becoming essential for DCTs and hybrid trials, or is it still “nice to have”?
For DCTs and hybrid trials, I’d say tokenization is increasingly essential rather than just “nice to have.” In the US, healthcare is fragmented as patients move between settings and often change insurance. If your trial design depends on understanding the patient journey longitudinally, across real-world care environments, then you need a way to connect the dots across those settings. Tokenization supports that end-to-end traceability in a more holistic way.
- Are there specific use cases (long-term follow-up, external control arms, rare disease studies) where tokenization is clearly proving its value? And on the flip side, where has the industry been a little too optimistic?
Yes, there are use cases where tokenization is clearly proving its value. Long-term follow-up and external control arms come to mind, as do some rare disease contexts where tracking patients longitudinally is challenging. Last year I presented a poster at AMCP where we did a literature review of clinical trials using tokenization. Across the 20 publications that we identified, the work spanned interventional trials in phases II, III, and IV, and the primary objectives often included safety and efficacy, as well as feasibility, cost, and survival. Oncology and cardiovascular were prominent, but the broader takeaway was that tokenization is providing value across multiple trial types and different objectives.
- Who actually owns tokenization strategy today? Is this sitting with data science, clinical operations, RWE teams, or external vendors?
That’s a great question. In practice, it depends on how the organization structures decision rights, but from my perspective it shouldn’t belong to just one function. Tokenization is not purely a data science capability, and it’s not only a clinical operations process either. It’s an operating model that affects study design, feasibility planning, governance, dataset onboarding, privacy/legal requirements, and ultimately analytics and evidence generation. Because of that, many organizations function best when they establish a cross-functional steering group or “center of excellence.” In that model, clinical/R&D and RWE together define the use cases, biostats and data management translate those use cases into requirements and standards, legal/privacy ensures governance and compliance are met, and data science/technology owns the technical approach and validation. External vendors and CROs can certainly own parts of execution like implementation, workflow configuration, and technical integration, but the sponsor should retain ownership of the strategy, the scientific intent, and the decision to approve and reuse the capability across studies. In short, vendors help deliver the capability, but the sponsor must own it end to end, with shared responsibilities across functions.
Note: The presenter, although an employee of Novo Nordisk Inc., is presenting on their own behalf. The information provided and any views expressed during the presentation are those of the presenter and may not be attributed to Novo Nordisk Inc.