Published on April 09, 2026

There’s a version of the AI-in-clinical-trials conversation that consists mostly of noise. Bold claims, proof-of-concept demos, vendors who say their system handles everything. Then someone asks what the FDA would say, how it fits existing SOPs, or who takes accountability when something goes wrong, and the conversation gets much shorter.

Derk Arts, CEO at Castor, and Alison Bishop, a data management specialist with close to thirty years of experience across small and large clinical research organizations, set out to have a different kind of conversation. One that was, in Derk’s framing, “very specific and very much grounded in reality.” The session ran for about forty-five minutes on March 31 and covered the full data management lifecycle: what AI handles well today, what it is approaching, and where the honest answer is still “not yet.”

The conversation opened where most teams are already starting: document generation. Medical writing, clinical data management plans, database validation protocols. Both Alison and Derk confirmed this is where AI has made the most visible inroads. The technology generates text, so people gravitated toward having it generate text. Alison noted that drafts of the data management plan (DMP), the statistical analysis plan (SAP), and supporting documents are natural starting points, though all require human review before anything goes anywhere near a sponsor. Both agreed this territory is important and getting better, but it’s not where the session was going to spend its time.

That time went to edit checks.

Derk framed the provocation directly: what if organizations moved away from writing programmed edit checks entirely, and instead deployed an intelligent system to flag problematic data in real time? Hard-coded rules replaced by contextual judgment. Alison’s answer was measured and specific:

“I think there are a number of things that we need to make sure that we work through in order to get there. And obviously sponsors are going to want evidence. They’re going to want proof that what they get at the end is as good as if not better than what they would get with a traditional model.”

Alison Bishop, data management specialist

She outlined what that evidence would look like: run both approaches on a study with known edit check history, compare what each system flags, and build a retroactive benchmark. Derk described how Castor applies this with Castor Catalyst, using real source data and patient journeys to validate AI performance against the historical record. For most Phase 2/3 teams, the open question is not whether AI can flag data issues. It can. It is whether you can prove it, to a sponsor, in a way that would hold up on inspection.
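The session stayed at the level of approach rather than implementation, but the bookkeeping behind that benchmark is simple to sketch. A minimal version, where the (subject, form, field) flag structure and record identifiers are illustrative assumptions, not anything from Castor’s tooling:

```python
# Hypothetical sketch of a retroactive edit-check benchmark. Each flag is
# identified here as a (subject_id, form, field) tuple; these structures
# are illustrative assumptions, not Castor's actual data model.

def benchmark(ai_flags: set, historical_flags: set) -> dict:
    """Compare AI-raised flags against a study's known edit check history."""
    agreed = ai_flags & historical_flags       # both systems caught it
    ai_only = ai_flags - historical_flags      # new catches, or noise
    missed = historical_flags - ai_flags       # known issues the AI missed
    recall = len(agreed) / len(historical_flags) if historical_flags else 1.0
    return {
        "recall_vs_history": recall,
        "ai_only_for_review": sorted(ai_only),  # needs human adjudication
        "missed": sorted(missed),               # blockers for sponsor sign-off
    }

historical = {("S001", "AE", "onset_date"), ("S002", "VS", "sbp")}
ai_flagged = {("S001", "AE", "onset_date"), ("S003", "LB", "alt")}
print(benchmark(ai_flagged, historical))
```

The interesting bucket is the AI-only flags: catches the programmed checks never made only count as a win once a human confirms they are real issues rather than noise.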

Query management sits directly downstream of edit checks, and the session moved there next. The same system that identifies a data issue can generate the query. But the more interesting design question is whether it should, and under what conditions. Alison and Derk worked through a risk-stratification model: adverse events and primary endpoints always route through mandatory human review; lower-risk data can move at a different pace once confidence is established. Derk described how confidence scoring in Catalyst serves as the configurable decision point for each query type. Alison put the broader principle in terms that will resonate with anyone who has watched data review backlogs grow:

“AI is taking away the burden of the volume of review… focusing us at the things that are going to have the biggest value.”

Alison Bishop
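Translated into logic, the routing model they described looks something like the sketch below. The domain codes, threshold value, and tier names are placeholders for illustration; the session did not specify how Catalyst implements its confidence scoring.

```python
# Illustrative only: the domain codes and threshold below are invented
# for this sketch, not taken from Castor Catalyst's configuration.

ALWAYS_HUMAN = {"AE", "SAE", "PRIMARY_ENDPOINT"}  # mandatory review, per the session
AUTO_THRESHOLD = 0.95                             # assumed confidence cutoff

def route_query(data_domain: str, confidence: float) -> str:
    """Decide whether an AI-generated query fires directly or waits for a reviewer."""
    if data_domain in ALWAYS_HUMAN:
        return "human_review"        # adverse events and endpoints are always gated
    if confidence >= AUTO_THRESHOLD:
        return "auto_issue"          # low-risk, high-confidence queries go straight out
    return "human_review"            # everything else queues for review

assert route_query("AE", 0.99) == "human_review"
assert route_query("VS", 0.97) == "auto_issue"
```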

The governance question ran through all of it. Regulators will require accountability, system validation, and the ability to explain every AI-driven decision. Alison laid out what a human oversight model would actually need: review gates for the first patients, defined processes for model drift, deviation handling that feeds into existing quality management. The direction the session pointed to is an AI validation plan sitting alongside the standard data management plan, defining upfront which data the system handles autonomously, which routes require human sign-off, and how confidence thresholds were set. Both agreed this is where things are heading.
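No standard template for such a plan exists yet. As a thought experiment, the session’s ingredients could be laid out as a structure like this, where every field name is an assumption rather than an established artifact:

```python
# A hypothetical shape for an AI validation plan; every field name here is
# invented to illustrate the session's ingredients, not an existing standard.
from dataclasses import dataclass

@dataclass
class AIValidationPlan:
    autonomous_domains: list          # data the system may handle without sign-off
    signoff_domains: list             # routes that always require human sign-off
    confidence_thresholds: dict       # per-domain cutoffs, set and justified upfront
    review_gate_patients: int = 10    # full human review for the first N patients
    drift_check_interval_days: int = 30   # scheduled checks for model drift
    deviation_handling: str = "feeds into the existing QMS deviation workflow"

plan = AIValidationPlan(
    autonomous_domains=["VS", "LB"],
    signoff_domains=["AE", "PRIMARY_ENDPOINT"],
    confidence_thresholds={"VS": 0.95, "LB": 0.97},
)
```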

The recording covers material this post doesn’t have room for.

Derk walks through Castor Catalyst’s visual audit trail in detail (37:17): a replay capability that shows what the AI agent did and why, step by step, in terms a human reviewer can follow. He makes a counterintuitive point that’s worth the watch on its own. An AI creates a more complete audit record than a human doing the same work, because a CRA doing source data verification doesn’t log every step in their reasoning. The AI logs all of it.
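That replay idea maps naturally onto a structured, append-only log of agent steps. A toy sketch, with invented field names rather than Catalyst’s actual schema:

```python
# Toy sketch of the step-by-step record a replayable audit trail implies;
# field names are invented, not Catalyst's actual schema.
import json
from datetime import datetime, timezone

audit_trail = []

def log_step(action: str, reasoning: str, evidence: dict) -> None:
    """Record one agent step, in order, with its stated reasoning."""
    audit_trail.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "reasoning": reasoning,   # the part a human SDV pass never writes down
        "evidence": evidence,
    })

log_step("compare_source_to_ecrf",
         "source ALT of 142 vs eCRF entry 14.2 suggests a decimal shift",
         {"subject": "S003", "field": "alt"})
print(json.dumps(audit_trail, indent=2))
```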

The Q&A includes a direct exchange on automation bias (43:11): how you stop human review from becoming a rubber stamp when the AI is right most of the time. It gets into ensemble validation approaches and why the problem isn’t actually new to this industry.

And Alison closes with a use case worth pursuing (42:10): using AI to track clean patients and flag what still needs doing before an interim deliverable. Straightforward in theory. Harder than it sounds when your systems don’t talk to each other.
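The bookkeeping itself is a few lines once the inputs exist. A sketch under the assumption that open queries and pending forms can be pulled into one place, which is exactly the integration problem Alison flagged:

```python
# Minimal sketch of clean-patient tracking before an interim deliverable.
# The input shapes are assumptions; in practice the hard part is extracting
# open queries and pending forms from systems that don't share data.

def clean_status(subjects, open_queries, pending_forms):
    """Return, per subject, whatever still blocks them from counting as clean."""
    report = {}
    for s in subjects:
        blockers = [f"query: {q}" for q in open_queries.get(s, [])]
        blockers += [f"form: {f}" for f in pending_forms.get(s, [])]
        report[s] = blockers or ["clean"]
    return report

print(clean_status(
    ["S001", "S002"],
    open_queries={"S001": ["AE onset date discrepancy"]},
    pending_forms={"S002": ["Week 12 labs"]},
))
```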

Watch the full session on demand

Forty-five minutes on where AI is actually landing in clinical trial data management, built for anyone making technology decisions in the space.

Watch now

Frequently asked questions

Can AI replace programmed edit checks in clinical trials?

Possibly, but not without evidence first. Alison’s position in the session was that sponsors will want proof that an AI-driven approach produces outcomes at least as good as traditional edit checks before they adopt it. The practical path is to run both models in parallel on a study with known edit check history, compare what each system flags, and build a retroactive benchmark. Derk described how Castor applies this with Castor Catalyst, using real source data to validate AI performance against the historical record before asking any sponsor to trust it.

How do you prevent human review of AI output from becoming a rubber stamp?

Automation bias (accepting high-quality AI suggestions without scrutiny) was raised in the Q&A. Derk described one approach Castor uses internally: running a separate AI validation step using a different model or model family to check the output of the first. This creates a consensus check rather than relying on a human reviewer to spot errors in a mostly-correct stream. Risk stratification is the other mechanism: routing the highest-stakes data (adverse events, primary endpoints) through mandatory human review, while lower-risk data moves at a different pace. Derk also made the point that this problem isn’t unique to AI. It’s already present when a CRA does source data verification, and at least with AI you have an audit trail of every decision.
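The mechanics of that consensus check are easy to sketch. The judge callables below stand in for real inference calls against two different model families; nothing here reflects Castor’s actual implementation:

```python
# Placeholder sketch of a cross-model consensus check. The judge callables
# stand in for real inference calls; this is not Castor's implementation.
from typing import Callable

Judge = Callable[[dict], bool]   # a model's yes/no verdict on a finding

def consensus_check(finding: dict, primary: Judge, validator: Judge) -> str:
    """Route a finding based on agreement between two independent models."""
    a, b = primary(finding), validator(finding)   # ideally different model families
    if a and b:
        return "accept"
    if a != b:
        return "escalate_to_human"   # disagreement, not a tired reviewer, flags the miss
    return "reject"

# Toy judges standing in for real model calls:
print(consensus_check({"field": "sbp", "value": 210},
                      primary=lambda f: True,
                      validator=lambda f: False))   # -> "escalate_to_human"
```

The design choice worth noting is that human attention gets spent on disagreements, which is a far smaller and more informative stream than reviewing every AI suggestion.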

Where should clinical data management teams start with AI adoption?

The session framed it as a progression tied to what you can validate and demonstrate to a sponsor or regulator. Document generation (drafting data management plans, SAPs, validation protocols) is where most teams start, with full human review. Edit checks and query management are the next steps, but both require evidence of equivalence or improvement before replacing the traditional approach. The key point Alison made: process redesign matters more than tool selection. Overlaying AI on an existing workflow rarely produces the efficiency gains clinical data management teams are looking for. The gains come from thinking carefully about where human involvement actually needs to be.

References

  1. Castor LinkedIn Live session: “What AI replaces in Phase 2/3 data management — and what it doesn’t.” Recorded March 31, 2026. Featuring Alison Bishop (data management specialist) and Derk Arts (CEO, Castor).