Upload the source.
Skip manual data entry.

Castor Catalyst reads your clinical source data and structures it. You upload a source document, Catalyst extracts the values, a human reviewer approves them, and you get clean records.

 
 
The workflow

Go from upload to EDC without manual transcription.

Upload PDF visit summaries, scanned paper PROs, lab results, EMR screenshots, or even handwritten worksheets. Catalyst handles the de-identification, extraction, and mapping. Your team reviews and approves before anything reaches the EDC.

Data lands in your EDC: the extracted values appear as completed fields in the Castor EDC participant record.

The view from the industry

Manual data entry at the site is over.
Paper is back in fashion.

So let them keep paper. Catalyst does the rest.

Brad Hightower

Founder & CEO, Hightower Clinical

“Sites are better off with paper source… Editable in seconds for builds and amendments.”

From a LinkedIn post

What sites upload

Any document. One pipeline.

Typed, scanned, or handwritten. If the study collects it, Catalyst reads it. Pick one below and watch it become structured data.

Illustrative examples. The confidence percentages shown are illustrative, not measured performance. Catalyst handles the document types your study produces, with the same source-to-EDC pipeline running underneath.

What the numbers looked like in production.

From one live, paper-based Site Upload study, with human review on every value. A single study, not a platform average, but real production data.

Error rate after human-in-the-loop QC, versus a 6.6% manual abstraction baseline
< 2%
Override rate. Reviewers corrected only 0.8% of the AI extractions
0.8%
Submission success rate into the EDC
2%

Defensible by design

Built to be defended in a regulatory inspection.

Site Upload moves sensitive source through Catalyst. The three pillars below cover what regulators, sites, and data managers ask about first.

01

PHI de-identification before processing

  • Redaction boxes drawn over participant identifiers before the file is processed
  • Covers names, MRNs, dates of birth, and addresses
  • Only the redacted version enters the extraction pipeline
  • Original source held in an encrypted, segregated Source Vault with role-based access

02

Visual Audit Trail with HITL review

  • Every extracted value links back to its exact source location (a bounding box on the PDF)
  • Medically trained Castor staff review the values against the source
  • Nothing commits to the EDC until it is approved
  • Audit trail satisfies 21 CFR Part 11 data lineage requirements

03

For source-heavy studies

  • For data that doesn’t fit standard CDISC CDASH: surgical variables, device data, complex endpoints
  • No CRC hand-entering every field
  • Catalyst trains an extraction pipeline tailored to your study (a small set of example files plus an SME walkthrough)
  • Reusable across future studies with the same kind of source

21 CFR Part 11

HIPAA

EU MDR

GxP

CDISC CDASH

AES-256

30-day source purge

Audit trail per field

Where Site Upload fits

Five places where this workflow earns its keep.

USE CASE

Paper PRO migrations

Whole-study back-migrations of paper-based patient-reported outcomes.

USE CASE

Global multi-center studies

Site Upload works in any country, any language. Useful for studies that span US and non-US sites, or studies running entirely outside the US.

USE CASE

Lab-heavy studies

Studies where the bulk of source data is structured lab values and reference ranges, uploaded as PDFs by sites or participants, or wired in directly from a central lab.

USE CASE

Source-heavy studies

Studies that capture lots of source data beyond standard CDISC fields: surgical variables, device data, complex endpoints. Catalyst trains a pipeline on your own example documents, so nobody has to hand-key every field.

USE CASE

Multi-site disease registries

Surgical, oncology, and rare disease registries with chart review. Sites stay in their existing EMR workflow and upload source per the registry’s protocol.

Sangrag Ganguli

General Surgery Resident, University of Chicago

From the SAGES x Castor webinar

Expert perspective

"I think that's where this platform really shines… not only giving you the data, but also showing you where that data was gotten from."

FAQ

What others asked.

Castor Catalyst’s Site Upload workflow lets sites upload any document the study produces directly to Catalyst. Common examples include PDF visit summaries (typed or scanned), scanned paper patient-reported outcomes, and lab results, either PDF panels uploaded by sites or participants, or central-lab feeds wired directly into Catalyst. Sites can also upload handwritten worksheets, operative reports, discharge letters, and other free-text clinical notes.

Once a source is uploaded, Catalyst de-identifies any remaining PHI by drawing redaction boxes over identifiers, then runs the document through an AI extraction pipeline. The extracted values are mapped to CDISC CDASH standards (or to pipelines tailored to your study when the source goes beyond CDISC, like surgical variables), passed through Human-in-the-Loop review by medically trained Castor staff, and committed to your Castor EDC with a Visual Audit Trail back to every source.

Site Upload works globally. Sites in any country, any language, can upload source documents directly. For US studies where participants can retrieve their own EHR records, see the Direct-to-Patient workflow.

PDFs (typed or scanned), photos, paper CRFs, paper PROs, lab panels (PDF from sites or participants, or central-lab feeds direct to Catalyst), surgical reports, discharge letters, and other free-text clinical notes. Handwriting is supported. Mixed multi-document scans are supported. If your source falls outside these types, the team confirms fit before the study starts.

Catalyst draws redaction boxes over participant identifiers before the document enters the extraction pipeline. The original source is stored in a logically segregated, encrypted Source Vault. Only pseudonymized structured data flows into the EDC. Source documents are purged 30 days after study close with a written destruction certificate.

Explore the rest of Catalyst

Looking for a different way in?

Catalyst is one product with two entry points. If Site Upload isn’t quite the right fit for your study, here’s where to go next.

Catalyst overview

About Catalyst

AI source-data extraction for clinical trials. The technology overview, both workflows, and the story in one place.

See the overview →

The other workflow

Direct-to-Patient workflow

Participants retrieve their own records through HIPAA release or FHIR. No site upload. US studies.

See Direct-to-Patient →

Walk through Site Upload on a real source document.

Bring a real document from your study. An operative report, a paper PRO, a lab panel, an EMR screenshot. We’ll process it live and show you the audit trail field by field.

What you'll see

  • Live de-identification on your source document
  • Live extraction with confidence scores
  • The Visual Audit Trail walked through field by field
  • A walkthrough of how Catalyst handles your study's non-standard source data
