
For decades, the placebo-controlled randomized trial was the unchallenged gold standard of clinical evidence. Its logic was elegant: compare a drug against nothing, blind everyone to assignment, and let the data speak. That framework produced the modern pharmaceutical era. In certain contexts, it has also introduced practical and ethical constraints that are increasingly difficult to ignore. Today, AI-generated digital twins are forcing the field to ask a question that would have seemed radical a decade ago: in how many pivotal programs can the conventional placebo arm be responsibly reconsidered?

The answer, in selected contexts, is beginning to shift. Not universally, but in specific situations: when the disease is devastating, the population is small, the historical data are rich, and the regulatory pathway has been formally established. Digital twin approaches are beginning to influence how study designs are evaluated and optimized, and the organizations that understand how to deploy them responsibly may find themselves better prepared for the regulatory and methodological questions that lie ahead.

What Are Digital Twins in Clinical Trials?

The phrase "digital twin" carries different meanings depending on the industry. In clinical development, the concept is specific and technically grounded. A digital twin is a computational model built from structured data collected in prior clinical studies. For each participant enrolled in a new trial, the model constructs a personalized forecast of how that patient's disease would likely progress under standard of care alone, without the investigational agent. That forecast serves as the comparator observation, reducing or potentially replacing the need for a concurrently randomized placebo group.

What distinguishes this from older approaches, such as literature-based historical controls or propensity-matched registry cohorts, is the granularity. Contemporary digital twin platforms build patient-specific covariate adjustment scores that absorb much of the unexplained variability in the primary endpoint. By tightening the noise around each individual's expected trajectory, these models allow a trial to detect treatment effects with fewer enrolled participants while preserving the false-positive rate that regulators require. What these definitions typically omit is the operational complexity: ensuring that baseline data collection is standardized enough to feed the model, that site-level variability does not introduce noise the twin cannot absorb, and that the statistical analysis plan integrates twin-derived covariates in a way that every member of the review division team can evaluate.

The practical consequence for trial design is meaningful. Published validations suggest that well-calibrated digital twin approaches can reduce required enrollment by roughly 10 to 30 percent in certain therapeutic contexts, with the exact benefit depending on endpoint characteristics, model quality, and regulatory division acceptance. Those reductions translate into shorter timelines and lower per-patient costs.
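The variance-reduction mechanism is worth making concrete. PROCOVA itself is proprietary, but since it is a special case of ANCOVA, the core idea can be sketched with a toy simulation: a prognostic score that explains part of the outcome variance shrinks the residual noise, which shrinks the sample size implied by the standard two-sample power formula. All numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each outcome = prognostic component + treatment effect + noise.
# The "prognostic" column stands in for a twin-style forecast of the
# untreated trajectory (all values hypothetical).
n = 2000
prognostic = rng.normal(0.0, 0.6, n)
treatment = rng.integers(0, 2, n).astype(float)   # 1:1 randomization
outcome = prognostic + 1.0 * treatment + rng.normal(0.0, 1.0, n)

def residual_sd(y, covariates):
    """SD of residuals after least-squares adjustment for the given covariates."""
    X = np.column_stack([np.ones(len(y))] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float((y - X @ beta).std(ddof=X.shape[1]))

sd_plain = residual_sd(outcome, [treatment])                 # unadjusted analysis
sd_adjusted = residual_sd(outcome, [treatment, prognostic])  # ANCOVA-style

def n_per_arm(sd, delta=1.0, z_alpha=1.96, z_beta=0.84):
    """Two-sample formula: n per arm for ~80% power, two-sided alpha = 0.05."""
    return int(np.ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / delta**2))

print(f"residual SD: {sd_plain:.2f} -> {sd_adjusted:.2f}")
print(f"n per arm:   {n_per_arm(sd_plain)} -> {n_per_arm(sd_adjusted)}")
```

In this toy setup the enrollment reduction works out to roughly a quarter, consistent in spirit with the 10 to 30 percent range cited above. Real gains depend entirely on how much outcome variance the prognostic model actually explains in the target population.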

Regulatory Momentum Is Emerging

The most important development in this space is not technological. It is regulatory. And it has been building for several years with increasing clarity.

In September 2022, the European Medicines Agency issued a formal qualification opinion for PROCOVA, Unlearn's AI-powered covariate adjustment procedure. This was the first time a regulatory body formally supported a machine learning method for reducing sample size in pivotal trials. The EMA stated that PROCOVA "qualifies" and that the procedures "could enable increases in power and/or decreases in sample size in phase 2 and 3 clinical trials with continuous outcomes." The EMA is not endorsing digital twins as a curiosity for exploratory endpoints. It is qualifying them for pivotal-stage inference.

The FDA's position is also more permissive than many assume. The agency has confirmed that PROCOVA is a special case of ANCOVA, a method well within established guidance. And in January 2025, FDA issued draft guidance establishing a seven-step credibility assessment framework for AI-generated evidence, giving sponsors a clearer picture of how such approaches will be evaluated.

Then, on January 14, 2026, FDA and EMA jointly published "Guiding Principles of Good AI Practice in Drug Development," articulating ten principles for AI use across the drug product lifecycle. The joint publication suggests increasing alignment in regulatory thinking across agencies that clinical development teams should study carefully before their next IND meeting.


The use of external and synthetic control arms is not theoretical. FDA has approved multiple products relying substantially on real-world comparator data, including Nulibry (2021) for molybdenum cofactor deficiency, Voxzogo (2021) for achondroplasia, SKYSONA (2022) for cerebral adrenoleukodystrophy, and LENMELDY (2024) for metachromatic leukodystrophy. These are approved products with full labeling, several in therapeutic areas where placebo exposure would have been ethically indefensible. It is worth noting that these approvals relied on structured natural history datasets and registry-based external controls rather than AI-generated digital twin models. The regulatory precedent they establish is directional, not identical, to the digital twin use case, and that distinction will matter in how review divisions evaluate future submissions.

From Theory to Practice: Who Is Already Doing This?

The organizations leading this shift are not small biotechs running speculative trials. They are large pharmaceutical companies making deliberate choices to incorporate digital twin methodology into selected pipeline decisions.

Sanofi has used digital patient twins to move directly from Phase 1b to Phase 2b, skipping Phase 2a entirely in two key indications: asthma and Pompe disease. The decision to bypass a full Phase 2a reflects substantial organizational confidence in both the methodology's maturity and the underlying data infrastructure.

Unlearn, whose PROCOVA method received EMA qualification, has demonstrated that enrollment requirements can be reduced by roughly 10 to 30 percent in neurological and cardiovascular programs, while maintaining the statistical safeguards that regulators expect for false-positive rates. The company has a multi-year collaboration with Merck KGaA deploying TwinRCTs across its immunology pipeline.

Roche used a synthetic control arm of 67 patients to provide comparative effectiveness evidence for Alecensa in ALK-positive non-small cell lung cancer, advancing European coverage by 18 months. Amgen and Janssen have similarly used synthetic control data for Blincyto and Balversa, respectively.

These examples illustrate early directional adoption rather than broad standardization. Each involved indications with unusually well-curated historical datasets, regulatory divisions with prior experience evaluating external comparators, and development teams that engaged the agency on methodology before protocol finalization. Most programs in most therapeutic areas do not yet meet all three conditions. But where they do converge, synthetic and digital control methodologies are viable pathways to approval.

The Ethical Imperative

If synthetic control arms can provide evidence rigorous enough to support the same regulatory decisions, the burden of justification begins to shift. The question is no longer whether digital twins are good enough. It is whether requiring placebo assignment can still be affirmatively justified.

Fear of placebo assignment is one of the most frequently cited reasons patients decline trial participation. This is not a minor friction point in enrollment logistics. It is a structural barrier that systematically underrepresents patients who most need new therapies, particularly those with progressive, life-limiting conditions where months on placebo carry real clinical consequences.

Advocacy organizations in rare disease, oncology, and neurodegeneration have increasingly called for study designs that minimize unnecessary exposure to inactive comparators. When a disease causes irreversible neurological damage over months, and the study drug has shown compelling Phase 2 signals, assigning a patient to placebo is not a neutral act. It is a decision that demands affirmative ethical justification, and that justification has eroded significantly as the evidentiary quality of AI-generated comparators has improved.

This argument is especially compelling in rare and pediatric diseases. Small patient populations, geographic dispersion, and the absence of viable alternatives make traditional 1:1 randomization both scientifically inefficient and ethically fraught. I see this directly in our work supporting the development of a mesenchymal stem cell therapy for neonatal intraventricular hemorrhage and hypoxic-ischemic encephalopathy. Conventional randomization in this context requires ethical justification that is increasingly difficult to sustain when structured neonatal outcome registries exist and digital twin methodologies are maturing.

The same logic applies to neurodegenerative conditions like Alzheimer's disease, where our work on early-stage trial design for novel fusion protein therapeutics brings these considerations into sharp focus, and where the data infrastructure from longitudinal registries is arguably the richest in medicine. Rare disease, oncology, and neurodegeneration are the three therapeutic areas where the case for digital twin control arms is currently strongest.

What Must Be True for This to Work

None of this means placebo arms are obsolete. There are conditions under which they remain not only appropriate but necessary. Being clear-eyed about those conditions is what separates responsible adoption from regulatory overreach.

Data quality is the foundational constraint. Digital twins are only as accurate as the historical data on which they are trained. Retrospective datasets with missing values, inconsistent endpoint definitions, or unrepresentative demographics produce twin models that will not survive regulatory scrutiny. The FDA's seven-step credibility assessment is explicit on this: the model development process must be documented with the same rigor as a validated biomarker assay, including pre-specified validation against held-out prospective data.

Bias risk deserves direct acknowledgment. Historical datasets in most therapeutic areas underrepresent women, minority populations, and patients from non-Western healthcare systems. A digital twin trained on these datasets will propagate that underrepresentation into the synthetic control arm. Development teams must demonstrate that their twin models perform comparably across demographic subgroups, or regulators will raise legitimate questions about the generalizability of the comparator.
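One basic form that subgroup demonstration can take is a calibration check on held-out validation data: compare twin-predicted outcomes against observed outcomes within each demographic subgroup and flag any group where the mean prediction error exceeds a pre-specified limit. The sketch below uses entirely hypothetical data, group labels, and thresholds; real acceptance criteria would be pre-specified in the analysis plan.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical held-out validation set: twin predictions vs. observed outcomes,
# with a demographic subgroup label per patient. Group B is underrepresented
# and, in this simulation, deliberately miscalibrated.
groups = np.array(["A"] * 300 + ["B"] * 60)
observed = rng.normal(10.0, 2.0, 360)
predicted = observed + rng.normal(0.0, 0.5, 360)
predicted[groups == "B"] += 1.0                    # simulated subgroup bias

def calibration_report(pred, obs, labels, bias_limit=0.5):
    """Mean prediction error per subgroup; flag groups beyond the bias limit."""
    report = {}
    for g in np.unique(labels):
        mask = labels == g
        bias = float((pred[mask] - obs[mask]).mean())
        report[g] = {"n": int(mask.sum()),
                     "bias": round(bias, 2),
                     "flag": abs(bias) > bias_limit}
    return report

for g, row in calibration_report(predicted, observed, groups).items():
    print(g, row)
```

A flagged subgroup does not automatically disqualify the model, but it is exactly the kind of finding a sponsor should surface and explain before a review division does.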

Temporal drift in training data is a concern that warrants more attention than it currently receives. Digital twins built on historical trial data from 2015 to 2020 may not account for meaningful shifts in standard of care that have occurred since those trials were conducted. New drug approvals, updated treatment guidelines, and evolving diagnostic criteria can systematically alter the expected trajectory of an untreated control arm. Consider Alzheimer's disease: a synthetic control modeled on pre-2020 trial data may not reflect the impact of anti-amyloid therapies now available as background treatment, creating a risk of underestimating control arm performance and inflating the apparent treatment effect. This is the kind of concern that regulatory statisticians will raise, and development teams should address it proactively by demonstrating that their training data reflect the current treatment landscape, not merely the historical record. Having designed trials across shifting treatment paradigms in oncology, infectious disease, and CNS, I can attest that this temporal dimension is frequently underappreciated until it surfaces in a regulatory interaction.
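One proactive way to surface drift before it surfaces in a regulatory interaction is a balance diagnostic: compare baseline characteristics of the historical training cohort against the currently enrolling population. A standardized mean difference is a common choice, with |SMD| above roughly 0.1 often treated as a prompt for further review. The covariate and all values below are hypothetical; the relevant baseline variables are program-specific.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical baseline covariate (e.g., a cognitive score) in the historical
# training cohort vs. the currently enrolling population.
historical = rng.normal(24.0, 3.0, 500)   # pre-2020 trial participants
current = rng.normal(25.5, 3.0, 120)      # enrollees under today's standard of care

def standardized_mean_difference(a, b):
    """SMD between two samples; |SMD| > 0.1 is a common review threshold."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return float((b.mean() - a.mean()) / pooled_sd)

smd = standardized_mean_difference(historical, current)
print(f"SMD = {smd:.2f} -> {'investigate drift' if abs(smd) > 0.1 else 'ok'}")
```

A drifted baseline covariate is not proof the twin is invalid, but it is evidence the training data may describe a population that no longer exists, which is precisely the question a regulatory statistician will ask.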

Therapeutic area fit matters. The most amenable conditions share several characteristics: continuous, quantifiable primary endpoints; well-characterized natural disease progression; existing structured longitudinal datasets; and regulatory precedent for external comparators. Conditions without these features, including many immune-mediated diseases with poorly predicted flare patterns, remain poor candidates for placebo arm replacement.


Infrastructure is the limiting factor for most organizations. Deploying digital twin methodology in a pivotal trial is not a matter of contracting with a platform vendor and adjusting the statistical analysis plan. It requires interoperable trial intelligence infrastructure, clinical data standards that enable meaningful model training, and a regulatory affairs function prepared to engage review divisions on the specific methodological questions that arise in a pre-IND or Type B meeting.

Looking Forward: What This Means for Clinical Development

Clinical development teams that cannot integrate digital twin methodologies into study design and regulatory strategy may face growing competitive pressure. The approach is moving beyond exploratory use and becoming a strategic capability that distinguishes organizations investing in next-generation trial infrastructure from those that have not yet evaluated whether these methods apply to their portfolios.

In our own programs at Amarex, we work across oncology, infectious disease, CNS, rare disease, and cardiovascular indications. Across these areas, we are seeing a consistent pattern: development teams are evaluating whether digital twin approaches can strengthen their regulatory packages, not as a cost-reduction exercise, but as a methodological decision with downstream implications for filing strategy and payer evidence. These are questions that would have been premature even two years ago. For most programs, the answer today is still that conventional randomization remains the appropriate design. The value of engaging with digital twin methodology now lies in identifying the specific programs where the evidence, the infrastructure, and the regulatory pathway converge.

Our involvement in the NSF/Microsoft Azure AI Acceleration initiative has given us direct visibility into the infrastructure requirements behind these questions. The data architecture, model governance, and audit trail demands for a regulatory-grade digital twin submission are significant, requiring investment in systems that most organizations have not yet built.

That is changing. The scientific literature on digital twins in clinical settings is maturing rapidly, with peer-reviewed validation studies providing the evidentiary foundation that regulatory agencies need before accepting novel methodologies at the pivotal stage. CDER alone received more than 500 submissions with AI components between 2016 and 2023, with volume increasing sharply since. The agency is building its own internal capacity to evaluate these submissions, which means the quality bar for AI evidence is rising in parallel.

The organizations that will lead in this environment are those investing now in three capabilities: structured data infrastructure that makes digital twin model training feasible; regulatory literacy that extends to direct division-level engagement on AI methodologies; and the organizational confidence to present novel study designs with the depth of technical preparation that these interactions demand.

The Question Has Already Shifted

The central question of this piece was whether we are ready to replace placebo arms with digital twins in pivotal trials. In certain contexts, the shift is already underway: the regulatory framework exists, the precedent cases are documented, and the methodology has advanced far enough to support meaningful roles in primary analysis under the right conditions. That said, no pivotal approval to date has relied on a digital twin as the sole comparator, and review divisions will require substantial additional validation before that threshold is crossed.

The more important question is whether clinical development organizations have the data infrastructure, regulatory literacy, and organizational will to deploy these methods responsibly. The placebo arm is not going away tomorrow. Randomized controlled trials will remain the evidentiary standard for most indications and most regulatory environments for the foreseeable future.

The role of the placebo arm in evidence generation is beginning to evolve. The organizations that recognize that shift now, and that build the methodological and regulatory capabilities to execute on it, may improve trial efficiency, enrollment dynamics, and overall development timelines. That is not a research aspiration. It is emerging as an operational consideration in clinical development.


Kush Dhody, MD, MS
Physician-Scientist | Clinical Development Executive | Regulatory Strategy Advisor

Disclaimer: The perspectives presented reflect emerging developments in clinical trial methodology and regulatory science. Adoption of digital twin approaches should be evaluated in the context of specific therapeutic areas, data availability, and regulatory engagement.
