How regulatory clarity, AI-assisted curation, patient-connected data, and the EU MDR transition are changing clinical evidence generation.
Real-world evidence has moved from a regulatory side program to a core evidence-generation discipline. FDA finalized its December 2025 device RWE guidance, EMA’s DARWIN EU network reached 40 data partners by early 2026, and ICH M14 entered implementation at EMA in March 2026. But the shift is not that regulators are accepting more data sources. It is that they are accepting data only when its provenance, transformations, missingness, and review workflow are inspectable. BioPharma and MedTech teams face structurally different evidence challenges and need different architectures. The 2027 question is not which dataset to buy. It is which evidence engine can be defended under inspection.
Real-world evidence moved from a regulatory side program to a core evidence-generation discipline because several regulatory and market signals arrived at once.
For medical devices, FDA finalized its December 2025 guidance on using real-world evidence to support regulatory decision-making. The important nuance is not that participant-level data no longer matters. The guidance says that a sponsor’s inability to obtain identifiable, participant-level data from some real-world data sources does not, by itself, preclude FDA evaluation. In exchange, sponsors must be able to document how the data were generated, controlled, transformed, and traced back to the original source or source system.[1]
For drugs and biologics, the center of gravity remains different. FDA’s drug and biologic RWE program emphasizes fit-for-purpose real-world data, traceability, prespecified study design, and defensible methods. The Prograf lung-transplant indication remains a useful example because FDA relied on a non-interventional study using data from the Scientific Registry of Transplant Recipients, supported by confirmatory evidence from prior clinical trials.[2] Externally controlled trials are also real, but FDA draft guidance frames them as design-dependent and context-specific, not as a general replacement for randomized controls.[3]
In Europe, EMA’s DARWIN EU (Data Analysis and Real World Interrogation Network), which runs regulator-requested real-world evidence studies, has matured into practical regulatory infrastructure. EMA describes DARWIN EU as a network that supports regulatory decision-making with validated real-world data, non-interventional studies, local analyses, and a common data model. It reached 40 data partners by early 2026, up from 32 at the end of 2025, and standardizes data into the OMOP CDM (Observational Medical Outcomes Partnership Common Data Model, an open community standard for observational health data analysis) for faster analysis.[4] At the same time, ICH M14, the International Council for Harmonisation guideline on pharmacoepidemiological studies using real-world data for medicines safety assessment, reached Step 4 of the ICH process in September 2025 and entered implementation (Step 5) at EMA on 18 March 2026. It is the first harmonized international guideline for planning, designing, analyzing, and reporting pharmacoepidemiological studies that use fit-for-use real-world data for medicines safety assessment.[5]
AI also moved from pilots to governed use. FDA and EMA released joint principles for AI in drug and biological development in January 2026. The principles do not bless unsupervised automation; they emphasize human-centric design, risk-based development, data governance, documentation, performance assessment, and lifecycle management.[6] In parallel, ISPE’s GAMP 5 Second Edition and its AI guidance give life-sciences teams a more concrete language for validating AI-enabled computerized systems in GxP settings.[7]
For MedTech, EU MDR remains the deadline forcing function. Regulation (EU) 2023/607 extended transition timelines for many legacy devices, but the extension is conditional. Class III and implantable Class IIb legacy devices generally face the 31 December 2027 transition deadline, while other eligible Class IIb, Class IIa, and certain Class I devices generally face 31 December 2028. Manufacturers must preserve the conditions of the extension, including no significant changes in device design or intended purpose.[8]
Regulators are accepting broader data sources only when provenance, transformations, missingness, and review workflow are inspectable.
The differentiator is no longer access to one retrospective dataset. It is the ability to run a governed, patient-connected evidence engine across modalities.
FDA’s 2025 report lists 73 examples of medical-device marketing authorizations that used RWE from FY2020 through FY2025, building on the 90 examples documented in FDA’s earlier report. For drugs, FDA’s July 2021 announcement, “FDA approves new use of transplant drug based on real-world evidence,” remains a reference point.[2] DARWIN EU shows the European version of the same trend: federated, standardized, regulator-requested RWE studies.[4]
Flatiron’s VALID framework showed that large language models can extract real-world progression endpoints across 14 cancer types with F1 performance similar to expert human abstractors in that use case, while producing nearly identical real-world progression-free survival estimates.[10] This is not a license for unsupervised LLM extraction. It is evidence that model output can become credible when the context of use is narrow and the validation framework is strong.
Rare-disease foundations have become more than advocacy organizations. The Cystic Fibrosis Foundation Patient Registry collects longitudinal health-status data from consenting patients at accredited care centers and is used for care guidelines, quality improvement, research, treatment/outcomes assessment, and clinical-trial design.[11] Parent Project Muscular Dystrophy’s Duchenne Outcomes Research Interchange combines registry, EHR, industry-partner, patient-reported, and clinician-reported data into a central warehouse for RWE generation.[12] Patient-connected architectures from decentralized clinical trials are applying this same model beyond rare disease, enabling continuous real-world data streams outside of traditional study infrastructure.
EU MDR and MDCG guidance turn post-market surveillance into a continuous system rather than an annual documentation exercise. MDCG 2025-10 was published in December 2025 to clarify post-market surveillance expectations for medical devices and IVDs.[13]
Drug and device requirements are two sides of the same regulatory coin: both demand defensible real-world evidence, but each operates under different evidentiary standards and distinct operational constraints. Understanding where they differ is what makes execution on each track effective.
Drug and biologic teams typically need participant-level traceability, defensible outcome definitions, prespecified analysis, control for confounding, and a strong rationale for why a real-world comparator is fit for purpose. The question is not simply, “Do we have enough records?” It is, “Can we reconstruct why this patient qualified, what this endpoint was, what was observed, and how the comparator population differed from the treated population?”
External control arms are strongest when the disease is rare or severe, the outcome is objective and reliably measured, the natural history is well characterized, randomization is infeasible or ethically difficult, and outcome ascertainment does not depend on assessments that are absent from routine care. FDA’s external-control draft guidance explicitly describes both historical and concurrent external control arms as possible designs, but leaves the evidentiary burden on the sponsor.[3]
This is why patient-governed data trusts matter. In rare diseases, the best natural-history comparators may sit with foundations, registries, and long-running patient communities. The economic model looks less like buying a commodity dataset and more like venture philanthropy or data stewardship: foundations shape endpoints, trial readiness, access governance, and sometimes the commercial terms under which industry uses the data. The Cystic Fibrosis Foundation’s investment history with Vertex is the archetypal example of venture philanthropy changing the development economics of a disease area.[14]
Do three audits before committing to an RWE strategy: (1) endpoint reconstructability, (2) comparator exchangeability, and (3) data-rights governance. A dataset that cannot reconstruct the endpoint or defend the comparator is not an evidence asset; it is a hypothesis-generation asset.
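Recording the three audits per dataset helps apply the classification rule above consistently. The sketch below is a minimal Python illustration, not a validated tool. The `DatasetAudit` fields, the `classify` function, and the dataset names are hypothetical. The handling of a dataset that passes the first two audits but fails data-rights review is an assumption; the text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetAudit:
    """Results of the three pre-commitment audits for one candidate dataset."""
    name: str
    endpoint_reconstructable: bool  # audit 1: can the endpoint be rebuilt from source records?
    comparator_exchangeable: bool   # audit 2: can the comparator be defended as exchangeable?
    data_rights_cleared: bool       # audit 3: do governance and data rights permit the intended use?


def classify(audit: DatasetAudit) -> str:
    """Failing audit 1 or 2 makes the dataset useful for hypothesis generation only."""
    if not (audit.endpoint_reconstructable and audit.comparator_exchangeable):
        return "hypothesis-generation asset"
    # Assumption: passing audits 1 and 2 without cleared rights blocks use rather than downgrading it.
    if not audit.data_rights_cleared:
        return "evidence candidate, blocked on data rights"
    return "evidence asset"


# Hypothetical candidates for illustration.
candidates = [
    DatasetAudit("registry_extract", True, True, False),
    DatasetAudit("claims_feed", False, True, True),
]
for candidate in candidates:
    print(candidate.name, "->", classify(candidate))
```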
For devices, the operational question is different. The 2025 FDA device guidance makes de-identified or aggregate outputs potentially usable, but only if sponsors can explain data provenance, data transformations, linkage quality, access limitations, and the implications of not having participant-level records.[1] The rule of thumb is traceability over identity: sponsor access to identity may not be required in every case, but the evidence chain must remain inspectable.
In Europe, PMCF (Post-Market Clinical Follow-up, a continuous process under EU MDR to update clinical evaluation with data from device use) is no longer a static appendix. Under MDR, post-market surveillance and PMCF must be planned, maintained, and fed by appropriate clinical data across the product lifecycle. The 2023/607 transition extension gives manufacturers time, but not a free pass. A significant change to design or intended purpose can undermine eligibility for the transition extension.[8]
AI-enabled device evidence adds another layer. MDCG 2025-6 explains that the AI Act and MDR/IVDR can apply simultaneously and complementarily to medical device AI. A device can be considered high-risk under Article 6(1) of the AI Act when the AI system is itself a medical device or a safety component. The guidance encourages manufacturers to integrate AI Act testing, reporting, and documentation into existing MDR/IVDR procedures where appropriate.[15]
Segment the portfolio by evidence risk. For each legacy device, identify the certificate deadline, significant-change exposure, current PMCF evidence maturity, Notified Body feedback, and whether any AI-enabled component creates an additional AI Act documentation path.
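Portfolio segmentation can start from the general deadline rule in Regulation (EU) 2023/607. The Python sketch below uses a simplified model. It encodes only the general 2027 and 2028 dates described above and treats a planned significant change as a flag that puts the extension at risk. It ignores device-specific exceptions, certificate status, and the other conditions of the extension. `LegacyDevice`, `transition_deadline`, `segment`, and the device names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LegacyDevice:
    name: str
    device_class: str                 # "I", "IIa", "IIb", or "III"
    implantable: bool
    significant_change_planned: bool  # design or intended-purpose change under consideration


def transition_deadline(device: LegacyDevice) -> date:
    """Simplified general rule only; device-specific exceptions are not modeled."""
    if device.device_class == "III" or (device.device_class == "IIb" and device.implantable):
        return date(2027, 12, 31)
    if device.device_class in {"IIb", "IIa", "I"}:
        # For Class I this applies only to certain legacy devices eligible for the extension.
        return date(2028, 12, 31)
    raise ValueError(f"Unknown device class: {device.device_class!r}")


def segment(portfolio: list[LegacyDevice]) -> list[tuple[str, date, bool]]:
    """Return (name, deadline, extension_at_risk) rows, earliest deadline first."""
    rows = [(d.name, transition_deadline(d), d.significant_change_planned) for d in portfolio]
    return sorted(rows, key=lambda row: (row[1], row[0]))


portfolio = [
    LegacyDevice("infusion_pump", "IIa", implantable=False, significant_change_planned=False),
    LegacyDevice("hip_stem", "III", implantable=True, significant_change_planned=True),
    LegacyDevice("bone_plate", "IIb", implantable=True, significant_change_planned=False),
]
for name, deadline, at_risk in segment(portfolio):
    print(name, deadline.isoformat(), "extension at risk" if at_risk else "")
```

Sorting by deadline puts the 2027 devices first. The `extension_at_risk` column then shows which of them also carry planned design or intended-purpose changes.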
The next generation of RWE capability is not a database. It is an operating model with three design decisions: how identity is resolved, how data are standardized, and how automated work is validated.
Tokenization is a form of privacy-preserving record linkage. Personally identifying attributes are transformed into tokens so records can be linked across claims, EHR, lab, mortality, and clinical study datasets without directly disclosing identity. But linkage quality is not universal. It depends on the identifiers available, their completeness, the token strategy, the time overlap between linked records, and whether patients move, change names, switch insurers, or appear differently across systems.
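The core mechanics of keyed tokenization fit in a few lines of Python. This sketch is not any vendor's token scheme. It normalizes identifiers, then applies HMAC-SHA256 with a secret key held by the tokenization service. The same person therefore yields the same token in two datasets, and neither dataset shares raw identifiers. The choice of fields and the `KEY` value are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Fold accents and case and drop punctuation and spaces before hashing."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_accents.casefold() if ch.isalnum())


def make_token(secret_key: bytes, *fields: str) -> str:
    """HMAC-SHA256 over the normalized fields joined with a separator."""
    payload = "|".join(normalize(field) for field in fields)
    return hmac.new(secret_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it is a managed secret never shared with data recipients.
KEY = b"example-tokenization-key"

# The same person as recorded at a study site and in a claims file.
site_token = make_token(KEY, "O'Brien", "Seán", "1980-02-01")
claims_token = make_token(KEY, "OBRIEN", "Sean", "19800201")
print(site_token == claims_token)  # True: the records match after normalization
```

Normalization absorbs formatting differences such as case, accents, and punctuation. A nickname, a changed surname, or a typo in the date of birth still produces a different token. This is why recall depends on which attributes each token combines.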
A 2025 systematic review of patient-tokenization performance found that precision is often high, but recall varies materially depending on the token set; examples ranged from very low recall for some single-token strategies (where matching relies on a single identifier) to much stronger recall with richer token sets (matching on combinations of identifiers such as name, date of birth, geographic code, and insurance identifier), analogous to the difference between single-attribute deterministic matching and multi-attribute probabilistic approaches.[16] The practical lesson: do not quote a generic match rate. Require a linkage feasibility report that shows precision, recall or estimated linkage rate, performance by key subgroup, time-window overlap, duplicate handling, and bias between linked and unlinked patients.
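Two of the feasibility-report metrics can be computed directly from a validation sample with known true matches. The Python sketch below assumes a gold-standard set of true record pairs is available, which in practice usually means a manually reviewed sample. `linkage_metrics` and `linkage_rate_by_subgroup` are hypothetical helper names.

```python
from collections import defaultdict

Pair = tuple[str, str]  # (record id in dataset A, record id in dataset B)


def linkage_metrics(predicted: set[Pair], truth: set[Pair]) -> dict[str, float]:
    """Precision, recall, and F1 of predicted links against a gold-standard sample."""
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def linkage_rate_by_subgroup(subgroup_of: dict[str, str], linked_ids: set[str]) -> dict[str, float]:
    """Share of patients in each subgroup who linked at all; large gaps signal linkage bias."""
    totals: dict[str, int] = defaultdict(int)
    linked: dict[str, int] = defaultdict(int)
    for patient_id, group in subgroup_of.items():
        totals[group] += 1
        if patient_id in linked_ids:
            linked[group] += 1
    return {group: linked[group] / totals[group] for group in totals}
```

Report the per-subgroup linkage rate alongside precision and recall to expose bias between linked and unlinked patients. An overall recall can look acceptable while one subgroup links far less often.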
Direct-to-patient (DTP) consent means patient-authorized retrieval or contribution of personal health data. It solves a different problem from tokenized retrospective linkage. In the U.S., HIPAA §164.524 gives individuals a right to access designated-record-set information about themselves, subject to exceptions.[17] In Europe, GDPR Article 20 creates a right to receive personal data in a structured, commonly used, machine-readable format and transmit it to another controller when the legal conditions are met.[18] DTP pipelines can create deterministic, patient-authorized access to longitudinal data, but they introduce consent-funnel drop-off, site workflow friction, privacy review, and cross-border governance complexity.
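Consent-funnel drop-off is simple arithmetic, but reporting it step by step shows where the friction is. The Python sketch below takes a hypothetical funnel of ordered stages with patient counts, for example invited, consented, authorized, and records retrieved. The function and stage names are illustrative.

```python
def funnel_conversion(stages: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """For each stage return (name, conversion from previous stage, conversion from first stage)."""
    if not stages:
        return []
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        step = count / previous if previous else 0.0     # drop-off at this step
        cumulative = count / first if first else 0.0     # share of the original pool remaining
        rows.append((name, step, cumulative))
        previous = count
    return rows
```

The weakest step-level conversion shows where site workflow, portal coverage, or authorization language is losing patients. The top-line count alone does not.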
Claims data add yet another trade-off. Open claims can approach near-real-time availability, yet records may still lag 60 to 90 days or longer. Closed claims can lose longitudinality when patients switch plans.[19] For acute safety monitoring or PMCF event capture, retrospective claims should not be the only sensing layer.
| Evidence sourcing approach | Best use | Main risk | What to document |
|---|---|---|---|
| Tokenized retrospective linkage | Claims/EHR/lab/mortality linkage at scale | False non-matches, duplicate patients, subgroup linkage bias | Token logic, linkage metrics, bias analysis, overlap windows |
| Direct-to-patient authorized retrieval | Longitudinal, patient-connected evidence and study recontact | Consent drop-off, portal coverage gaps, privacy/governance burden | Consent funnel, authorization language, source systems, refresh cadence |
| ePRO/eCOA | Symptoms, functional outcomes, tolerability, device experience | Reporting bias, missingness, device access inequity | Instrument validation, reminders, missing-data plan, accessibility plan |
| Federated EHR network | Large-scale feasibility and regulatory RWE questions | Local coding heterogeneity and limited source visibility | Common data model version, quality checks, phenotype logic |
European federated networks and observational analytics increasingly operate through the OMOP CDM (Observational Medical Outcomes Partnership Common Data Model). EMA states that DARWIN EU data partners standardize data into OMOP so it can be analyzed faster.[4] OHDSI describes OMOP CDM as an open community standard designed to standardize observational health data into a common format so it can be analyzed efficiently.[20]
Regulatory submissions often require a different standardization layer. CDISC SDTM (the Clinical Data Interchange Standards Consortium Study Data Tabulation Model, a standard for organizing and formatting study data for regulatory submission) and related CDISC standards are the review substrate for many regulatory submissions in specified contexts.[22] The architectural insight is that mature RWE organizations should not treat OMOP and CDISC as competing religions. The practical capability is dual mapping: local EHR codes and source variables map into OMOP for network analytics and into CDISC structures for submission readiness, with transformation logic and lineage preserved.
Define source-to-OMOP and source-to-SDTM mappings at study design, not at submission cleanup. The audit trail should show source field, vocabulary mapping, transformation rule, reviewer, timestamp, and version.
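The audit-trail fields listed above map naturally onto an immutable record. The Python sketch below uses a hypothetical `MappingRecord` type and `dual_map` helper. It writes one OMOP mapping and one SDTM mapping for the same source field at design time, and both share the transformation rule, reviewer, timestamp, and version. The source field, reviewer, and version string are invented for the example. The OMOP concept is left as a placeholder rather than guessed. `measurement.value_as_number` (OMOP CDM) and `LB.LBORRES` (SDTM) are real target locations, and `HGB` as the hemoglobin `LBTESTCD` follows CDISC controlled terminology.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MappingRecord:
    """One audited source-to-target mapping; frozen so it cannot be edited after review."""
    source_field: str
    vocabulary_mapping: str
    transformation_rule: str
    target: str          # an OMOP table.column or an SDTM DOMAIN.VARIABLE
    reviewer: str
    timestamp: datetime
    version: str


def dual_map(source_field: str, omop_target: str, omop_vocab: str,
             sdtm_target: str, sdtm_vocab: str, rule: str,
             reviewer: str, version: str) -> tuple[MappingRecord, MappingRecord]:
    """Record OMOP and SDTM mappings for one source field together, sharing rule and version."""
    now = datetime.now(timezone.utc)
    omop = MappingRecord(source_field, omop_vocab, rule, omop_target, reviewer, now, version)
    sdtm = MappingRecord(source_field, sdtm_vocab, rule, sdtm_target, reviewer, now, version)
    return omop, sdtm


omop_rec, sdtm_rec = dual_map(
    source_field="site_lab.hgb_result",                 # hypothetical local field
    omop_target="measurement.value_as_number",
    omop_vocab="LOCAL:HGB -> <OMOP standard concept>",  # placeholder, not a real concept id
    sdtm_target="LB.LBORRES",
    sdtm_vocab="LOCAL:HGB -> LBTESTCD=HGB",
    rule="numeric result in g/dL, no unit conversion",
    reviewer="data.manager@example.org",                # hypothetical reviewer
    version="mapping-spec v1.0",                        # hypothetical version label
)
print(omop_rec.target, "|", sdtm_rec.target)
```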
Flatiron’s oncology example shows what is real: narrow context of use, expert-labeled endpoints, validation metrics, benchmark analyses, and a framework for repeated quality assessment.[10] It does not show that a general-purpose LLM can freely read charts and create submission-ready data.
For GxP use, the standard should be closer to controlled computerized-system validation than to software experimentation. GAMP 5 Second Edition emphasizes risk-based, fit-for-intended-use computerized systems, service-provider oversight, automation, critical thinking, and data integrity. ISPE’s AI guide explicitly frames AI-enabled computerized systems around patient safety, product quality, data integrity, monitoring, and maintenance.[7] FDA’s good documentation training defines documentation as legible, traceable, and reproducible, and summarizes the ALCOA attributes for raw data: attributable, legible, contemporaneous, original, and accurate.[23]
In operational terms, every AI-extracted data point should be treated like a proposed data transformation, not a final fact. The system should retain the source snippet or source-document pointer, model/version, prompt or extraction configuration where applicable, confidence or review flag, human reviewer decision, timestamp, and reason for override. That is the difference between AI-assisted curation and AI-generated evidence.
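That operating rule can be expressed as a small state machine. An extraction starts as a proposal. It becomes usable data only after a named reviewer accepts it or overrides it with a recorded reason. The Python sketch below is illustrative. `ProposedExtraction`, `ReviewStatus`, and every field and example value are hypothetical, and a production system would also need persistence, access control, and an audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    PROPOSED = "proposed"      # model output, not yet usable as data
    ACCEPTED = "accepted"      # reviewer confirmed the proposed value
    OVERRIDDEN = "overridden"  # reviewer replaced the value and recorded why


@dataclass
class ProposedExtraction:
    variable: str
    proposed_value: str
    source_pointer: str        # document ID and location in the source record
    source_snippet: str
    model_version: str
    extraction_config: str     # prompt or configuration identifier, where applicable
    confidence: float
    status: ReviewStatus = ReviewStatus.PROPOSED
    final_value: Optional[str] = None
    reviewer: Optional[str] = None
    reviewed_at: Optional[datetime] = None
    override_reason: Optional[str] = None

    def accept(self, reviewer: str) -> None:
        self._record_review(reviewer)
        self.status = ReviewStatus.ACCEPTED
        self.final_value = self.proposed_value

    def override(self, reviewer: str, corrected_value: str, reason: str) -> None:
        if not reason.strip():
            raise ValueError("An override must record a reason.")
        self._record_review(reviewer)
        self.status = ReviewStatus.OVERRIDDEN
        self.final_value = corrected_value
        self.override_reason = reason

    def _record_review(self, reviewer: str) -> None:
        # A reviewed item is never edited in place; a later correction would be a new record.
        if self.status is not ReviewStatus.PROPOSED:
            raise ValueError("Item already reviewed.")
        self.reviewer = reviewer
        self.reviewed_at = datetime.now(timezone.utc)

    @property
    def usable_as_data(self) -> bool:
        return self.status is not ReviewStatus.PROPOSED


item = ProposedExtraction(
    variable="rw_progression",
    proposed_value="no progression",
    source_pointer="note-0042#para3",
    source_snippet="Stable disease on CT.",
    model_version="extractor-v1.3",
    extraction_config="progression-prompt-v5",
    confidence=0.62,
)
item.override("reviewer_a", "progression", "Later radiology report documents new lesions.")
print(item.status.value, item.final_value)  # overridden progression
```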
Language matters. A historical ECA (external control arm, a comparator group drawn from patients outside the trial who did not receive the treatment under study) is an epidemiological comparator derived from observed patient histories. A generative “digital twin” is a synthetic profile or trajectory produced by a model. They are not the same regulatory concept.
FDA’s external-control draft guidance recognizes that an external control arm may be historical or concurrent and may compare trial participants to people outside the trial who did not receive the same treatment.[3] That is a design and causal-inference problem. A broad claim that generative digital twins can replace randomized controls in common chronic diseases is a much higher bar because model bias, missingness, measurement heterogeneity, and unmeasured confounding can be amplified.
OS Therapies’ OST-HER2 pulmonary metastatic osteosarcoma program is useful only as a bounded example. The company reported 75% two-year overall survival in treated evaluable patients compared with 40% in a historical control group in a non-randomized Phase 2b setting, with overall survival as a secondary endpoint.[24] These results were reported in a company press release ahead of peer-reviewed publication. The program illustrates the external-comparator logic in a rare, high-unmet-need context. It should not be generalized into a claim that generative AI controls are broadly acceptable.
Interoperability is improving, but relying on major health-system APIs alone can create a generalizability problem. A 2025 AHA survey reported by ASTP/ONC found that participation in TEFCA (the Trusted Exchange Framework and Common Agreement, the U.S. framework for nationwide health information exchange among Qualified Health Information Networks) was uneven: 51% among multi-hospital system members, 25% among independent hospitals, and 30% among critical access hospitals. In aggregate, 43% of non-federal acute-care hospitals were aware of TEFCA and currently participating.[25]
This matters because FDA’s June 2024 diversity action plan draft guidance describes required diversity action plans for certain drugs, biologics, and devices.[26] A study architecture that oversamples large academic or integrated delivery systems can look technically modern while under-representing rural, community, and lower-resource settings. The fix is not to abandon EHR integration. Supplement it with community-site workflows, claims, registries, patient-mediated data access, and direct data collection from patients using ePRO/eCOA tools, which capture outcomes from participants regardless of whether their health system has full API coverage.
RWD (real-world data, meaning data on patient health status and care delivery collected from sources such as EHRs, claims databases, registries, and patient-generated data) is collected for care delivery, billing, operations, or public health reporting. It is not collected for the endpoint a sponsor later wants to analyze. If the endpoint requires symptom severity, functional limitation, discontinuation reason, device handling, or radiographic progression, it may live in notes, images, clinician judgment, or patient diaries rather than structured codes. When the target endpoint is not present in structured administrative data, prospective capture through an electronic data capture system is often the most reliable solution.
Patient-governed registries can accelerate trial readiness and natural-history evidence, but they are not commodity data lakes. Sponsors should expect governance review, use-case restrictions, patient-community expectations, consent constraints, data-quality due diligence, and negotiation around publication, exclusivity, and downstream use.
Continuous evidence capture is not the same as continuous clinical insight. MedTech teams need a signal-detection plan that distinguishes routine product experience, adverse events, complaints, device deficiencies, expected disease progression, and user-training issues. The PMCF platform should support event-driven ePRO, clinician follow-up, EHR refresh, and escalation into the quality system.
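A signal-detection plan needs an explicit rule for where each captured event goes. The Python sketch below encodes the six categories named above and routes each one to a queue. The queue names, the `ROUTES` table, and the rule that any event flagged serious bypasses routine trending are hypothetical placeholders. Actual routing must follow the manufacturer's quality-system and vigilance procedures.

```python
from enum import Enum


class EventCategory(Enum):
    ROUTINE_EXPERIENCE = "routine product experience"
    ADVERSE_EVENT = "adverse event"
    COMPLAINT = "complaint"
    DEVICE_DEFICIENCY = "device deficiency"
    EXPECTED_PROGRESSION = "expected disease progression"
    USER_TRAINING = "user-training issue"


# Hypothetical routing table; real destinations come from the manufacturer's QMS procedures.
ROUTES = {
    EventCategory.ADVERSE_EVENT: "safety/vigilance assessment",
    EventCategory.DEVICE_DEFICIENCY: "safety/vigilance assessment",
    EventCategory.COMPLAINT: "complaint handling",
    EventCategory.USER_TRAINING: "labeling and training review",
    EventCategory.EXPECTED_PROGRESSION: "clinical evaluation trending",
    EventCategory.ROUTINE_EXPERIENCE: "PMCF trending",
}


def route(category: EventCategory, serious: bool = False) -> str:
    """Return the queue for a captured event; serious events skip routine routing."""
    if serious:
        # Assumption: anything flagged serious goes straight to quality-system escalation.
        return "quality-system escalation"
    return ROUTES[category]


print(route(EventCategory.DEVICE_DEFICIENCY))  # safety/vigilance assessment
```

Keeping the table exhaustive, with one entry per category, means an event cannot be captured without also having a defined destination.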
Retrospective datasets remain valuable. They are often the fastest way to assess feasibility, understand treatment patterns, estimate event rates, and pressure-test protocol assumptions. But buying a dataset is not the same as controlling an evidence engine.
Renting data means accepting another organization’s collection purpose, missingness pattern, latency, tokenization strategy, vocabulary mapping, and refresh cadence. It is useful for discovery and feasibility, but it can leave the sponsor unable to repair missing endpoints or explain every transformation.
Owning the evidence engine means controlling the operating model: protocol-linked data capture, patient consent, identity resolution, source documentation, AI-assisted abstraction, human review, standards mapping, monitoring, analytics, and governance. The asset is not merely the data. The asset is the repeatable capability to generate inspectable evidence. This describes a design principle applicable across vendors and platforms — the question for any sponsor is whether their current infrastructure supports these components, not whether any single vendor provides all of them.
The 2027 RWE question is not, “Which dataset can we buy?” It is, “Which evidence engine can we defend under inspection?”
Map where your RWE programs sit today: dataset rental, point solution, or defensible evidence engine
| Phase | Track A: BioPharma | Track B: MedTech |
|---|---|---|
| Phase 1: Audit and classify Q3 2026 | Inventory registries, natural-history sources, external-control candidates, AI abstraction workflows, and patient-mediated data flows. Classify each study by endpoint reconstructability, missingness, source traceability, consent/governance, and CDISC readiness. | Segment the portfolio by class, certificate timeline, transition-extension eligibility, significant-change risk, PMS/PMCF maturity, Notified Body feedback, and AI-enabled component risk. |
| Phase 2: Design architecture Q4 2026 | Choose the data sourcing strategy: registry licensing, DTP consent, tokenized linkage, EHR network, prospective collection via ePRO/eCOA, or hybrid. Define source-to-OMOP and source-to-CDISC mapping before data ingestion, not after database lock. | Convert PMCF from periodic documentation to continuous evidence capture. Define event-driven ePRO, clinician follow-up, EHR or claims refresh, and the compliance/governance interface. |
| Phase 3: Validate operations Q1 2027 | Validate AI-assisted abstraction against expert review. Prespecify linkage performance metrics, ECA feasibility criteria, missing-data handling, confounding control, and sensitivity analyses. | Run a PMCF dry run with live data: trigger logic, data-quality cadence, source traceability, human review, safety-signal triage, and Notified Body evidence-pack structure. |
| Phase 4: Submit and operate Q2 2027+ | Package evidence with protocol, statistical analysis plan, data provenance, transformation lineage, validation results, and governance documentation. Keep the evidence engine live for label expansion, post-authorization commitments, and lifecycle evidence. | Use PMCF evidence to support MDR recertification, PMS updates, clinical evaluation reports, periodic safety update reports where applicable, and quality-system actions. Monitor any design/intended-purpose changes against extension eligibility. |
The organizations that win in RWE will not be the ones with the most disconnected data assets. They will be the ones that can combine patient access, real-world data linkage, AI-assisted curation, human validation, and standards-ready exports into one defensible evidence operating model.
FDA clarified how it evaluates whether RWD is of sufficient quality to generate RWE for medical-device regulatory decisions. The key shift: lack of sponsor access to identifiable participant-level data does not necessarily preclude evaluation. The trade-off is rigorous documentation of provenance, traceability, transformations, and data-access limitations.[1]
No. Drug and biologic RWE submissions are governed by a separate RWE program and guidelines. Participant-level data, endpoint reconstructability, confounding control, and prespecified design remain central. The Prograf lung-transplant example used non-interventional RWD from the SRTR with confirmatory evidence from prior trials.[2]
Tokenization is privacy-preserving linkage across datasets. It is retrospective and probabilistic or quasi-deterministic depending on the token strategy and data quality. Direct-to-patient consent is patient-authorized retrieval or contribution of data. It can be more deterministic, but it has consent-funnel and governance friction. They solve different problems and are often used together in a layered evidence architecture.
OMOP is widely used for observational and federated analytics, including DARWIN EU. CDISC SDTM is used for standardized regulatory study-data submissions in specified contexts. Mature programs preserve lineage into both analysis and submission structures, enabling dual-mapping from source data to both standards without duplication of effort at submission time.[4][21]
No. The credible pattern is AI-assisted abstraction inside a validated, human-reviewed workflow. The point is to reduce manual burden and increase scalability while retaining source traceability and reviewer accountability. Flatiron’s VALID framework demonstrates this is achievable in narrow, well-defined contexts with explicit validation — not as a general replacement for expert review.[10]
Legacy-device transition extensions under Regulation (EU) 2023/607 are conditional. A device that undergoes a significant change to design or intended purpose may lose the benefit of the extension, so product updates and labeling changes need regulatory review before execution. This is a particular risk for manufacturers actively developing AI-enabled enhancements to existing devices.[8]