Truveta Data
Clinical notes
Largest collection of clinical notes integrated with EHR data
Nearly 80% of data relevant to research is hidden in unstructured notes
The Truveta Language Model extracts data from notes at scale, giving researchers access to data from more than 5 billion free-text notes.
Understand clinical context for patients
With access to complete EHR data (including notes) linked with social drivers of health, mortality, and claims data, researchers can understand the complete patient journey and address previously unanswerable questions.
Identifying key moments in the patient journey
Repeated outpatient visit in January
First dermatology visit in February
Hidradenitis suppurativa diagnosis in November
Primary care visit
Specialist visit
Diagnosis
Redacted patient note showing delays in diagnosis and treatment for a less common condition
Unlock access to any clinical concept of interest
Researchers can access a continually expanding library of clinical concepts spanning diverse therapeutic areas and clinical scenarios.
Cardiovascular
- Tricuspid valve regurgitation
- Echocardiograms: Quantitative results
- Cardiac catheterization reports: Hemodynamic measurements, CAD concepts
- NYHA and KCCQ scores
- QRS duration
- Vessel disease
Neurology
- Seizure frequency
- Migraine frequency & severity
- Migraine treatment response
- Migraine symptoms & triggers
- Migraine treatment (triptans) status and discontinuation reason
Oncology
- Hepatocellular carcinoma: ECOG Performance Status scores, Child-Pugh scores, and Barcelona Clinic Liver Cancer stage
- Colon cancer: Staging, family history, pathology report findings
Rare disease
- HoFH: Confirmation of diagnosis
- OTC deficiency: Symptoms, disease progression, & dietary intake
Metabolics
- GLP-1 status and discontinuation reason
Hepatology
- Fibrosis stage, steatosis, and hepatocyte ballooning
Pulmonology
- Pulmonary function test results
Truveta receives all clinical notes generated during a patient’s care. This includes progress notes, nursing evaluations, procedure/operative reports, referral notes, discharge summaries, imaging reports, and more.
The Truveta Language Model (TLM) extracts clinical concepts from notes at scale and links them to structured data, enabling robust research across therapeutic areas including cardiovascular, metabolic diseases, oncology, neurology, hepatology, pulmonology, and rare diseases. Broader clinical insights, such as nutrition details, apply across disease states.
Detailed example of concepts extracted from echocardiograms
Sampling of normalized echocardiogram data in Truveta Data
Answer novel research questions
Example applications of notes data
Classify disease severity and monitor disease progression to inform R&D
Using echocardiogram data to classify aortic stenosis severity
Assess lifestyle behaviors and symptom prevalence to optimize clinical trial design
Analyzing diet data for a rare disease requiring dietary modifications
Identify potential confounders relevant to comparative effectiveness research
Identifying confounders before head-to-head SGLT2i study
AI enables accuracy at scale
The Truveta Language Model, a large language model trained on medical records data, is designed to identify and structure clinical data from notes and account for nuances such as negation, hypotheticals/conditionals, and family history. The model is continuously evaluated and fine-tuned to ensure clinical accuracy.
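To make those nuances concrete, here is a minimal illustrative sketch of why context matters when a concept is pulled from a note. It is not the Truveta Language Model or its output schema: the class names, field names, and example mentions below are all hypothetical, and the context labels are assigned by hand rather than by a model.

```python
from dataclasses import dataclass
from enum import Enum


class Assertion(Enum):
    AFFIRMED = "affirmed"
    NEGATED = "negated"
    HYPOTHETICAL = "hypothetical"


class Experiencer(Enum):
    PATIENT = "patient"
    FAMILY = "family"


@dataclass(frozen=True)
class ConceptMention:
    """Hypothetical record for one concept found in a note."""
    concept: str
    source_text: str
    assertion: Assertion
    experiencer: Experiencer


def patient_findings(mentions):
    """Return concepts that are affirmed and attributed to the patient."""
    return [
        m.concept
        for m in mentions
        if m.assertion is Assertion.AFFIRMED
        and m.experiencer is Experiencer.PATIENT
    ]


# Hand-labeled, made-up mentions (not real patient data).
mentions = [
    ConceptMention("migraine", "reports weekly migraines",
                   Assertion.AFFIRMED, Experiencer.PATIENT),
    ConceptMention("seizure", "denies seizures",
                   Assertion.NEGATED, Experiencer.PATIENT),
    ConceptMention("chest pain", "return if chest pain develops",
                   Assertion.HYPOTHETICAL, Experiencer.PATIENT),
    ConceptMention("colon cancer", "mother had colon cancer",
                   Assertion.AFFIRMED, Experiencer.FAMILY),
]

print(patient_findings(mentions))  # ['migraine']
```

All four phrases name a clinical concept, but only one describes something the patient actually has. A keyword search would count all four.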
Learn more about the depth of Truveta Data
Complete and clean EHR data
Truveta offers complete, timely, and clean EHR data linked with social drivers of health (SDOH), mortality, and claims data for more than 120M patients representing the full diversity of the US.
Medical images and metadata
Truveta provides access to millions of medical images across all modalities, including MRI, CT, X-ray, ultrasound, mammogram, PET, and nuclear medicine, searchable by modality and protocol.