Truveta Data

Truveta Language Model

Clinically trained AI accurately cleaning billions of EHR data points daily

Healthcare organizations generate vast amounts of data, yet 95% goes unused – largely because it is fragmented, inaccessible, and unstructured.

Truveta Language Model (TLM) is a multimodal large language model that cleans billions of EHR data points daily with industry-leading precision, enabling scientifically rigorous research.

Read the Truveta Language Model whitepaper

Daily updated EHR data cleaned with AI

TLM is trained on the largest collection of complete medical records, covering more than 100 million patients and representing the full diversity of the United States.

TLM can clean all types of EHR data, including semi-structured data such as lab tests and diagnoses, and unstructured data such as the contents of clinical notes and imaging reports.

Raw medical text is normalized to the most appropriate medical ontology with standard units of measurement, ensuring research-ready inputs.

TLM maps clinical concepts to standard medical ontologies

For example, TLM’s data cleaning process is applied to lab test results below. TLM structured two sets of lab test results into four rows of the LabResult table within the Truveta Data Model (TDM). Each test is mapped to a standard medical ontology with standard units of measurement.

TLM mapping lab results to the appropriate standard medical ontology
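To make the idea of normalization concrete, here is a minimal illustrative sketch in Python. It is not TLM's implementation: TLM is a trained language model, whereas this example uses a small hand-written lookup table as a stand-in. The table contents, the `NormalizedLab` fields, and the `normalize_lab` function are all hypothetical names chosen for illustration, and the output does not reflect the actual LabResult schema in TDM. LOINC is assumed here as the target ontology for lab tests.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup: raw test name -> (LOINC code, standard unit).
# A production system would use a learned model, not a hand-written table.
CONCEPTS = {
    "glucose": ("2345-7", "mg/dL"),        # Glucose [Mass/volume] in Serum or Plasma
    "blood glucose": ("2345-7", "mg/dL"),
    "hgb": ("718-7", "g/dL"),              # Hemoglobin [Mass/volume] in Blood
    "hemoglobin": ("718-7", "g/dL"),
}

# Multiplicative factors into the standard unit, keyed by (code, raw unit).
TO_STANDARD = {
    ("2345-7", "mg/dl"): 1.0,
    ("2345-7", "mmol/l"): 18.016,  # glucose molar mass is about 180.16 g/mol
    ("718-7", "g/dl"): 1.0,
    ("718-7", "g/l"): 0.1,
}


@dataclass
class NormalizedLab:
    code: str     # ontology code (LOINC in this sketch)
    value: float  # value expressed in the standard unit
    unit: str     # standard unit of measurement


def normalize_lab(name: str, value: float, unit: str) -> Optional[NormalizedLab]:
    """Map a raw lab entry to a coded concept with a standard unit.

    Returns None when the test name or the unit is not recognized,
    rather than guessing a mapping.
    """
    key = name.strip().lower()
    if key not in CONCEPTS:
        return None
    code, std_unit = CONCEPTS[key]
    factor = TO_STANDARD.get((code, unit.strip().lower()))
    if factor is None:
        return None
    return NormalizedLab(code, round(value * factor, 2), std_unit)
```

For instance, `normalize_lab("HGB", 135, "g/L")` and `normalize_lab("Hemoglobin", 13.5, "g/dL")` both resolve to the same code and unit, which is the property that makes differently recorded results comparable in downstream analysis.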

Accelerate access to data hidden in notes

Customers have benefitted from the breadth and depth of Truveta’s clinical notes for many conditions, including colon cancer, migraines, seizures, NASH, heart failure, vessel disease, hypercholesterolemia, rare diseases, and others.

Truveta Data now includes cardiac catheterization reports within EHR data for medical research to improve cardiovascular care

Example data extracted from a cardiac catheterization report.

TLM helps researchers understand complete clinical context and pursue novel research with more than 7 billion notes.

Truveta receives all clinical notes generated during a patient’s care. This includes progress notes, nursing evaluations, procedure/operative reports, referral notes, discharge summaries, and more.

TLM extracts and normalizes concepts from free-text clinical notes, eliminating the need for manual abstraction to uncover critical data points such as disease staging, adverse events, and the rationale for medication changes.
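As a rough illustration of what extracting a concept from free text means, the sketch below pulls a cancer stage out of a note using a regular expression. This is a deliberately simple rule-based stand-in, not how TLM works; the function name `extract_stages` and the sample note text are hypothetical. Each result keeps the character offsets of the matched text so that the extracted value can be traced back to its source.

```python
import re
from typing import List, Tuple

# Matches phrases such as "stage II", "Stage IIIA", or "stage iv".
# Longer numerals are listed first so "III" is not read as "I".
STAGE_PATTERN = re.compile(r"\bstage\s+(IV|III|II|I|0)([ABC])?\b", re.IGNORECASE)


def extract_stages(note: str) -> List[Tuple[str, int, int]]:
    """Return (normalized_stage, start, end) for each stage mention in a note."""
    results = []
    for match in STAGE_PATTERN.finditer(note):
        numeral = match.group(1).upper()
        suffix = (match.group(2) or "").upper()
        results.append((numeral + suffix, match.start(), match.end()))
    return results


# Hypothetical note text, written for this example only.
note = "Pathology consistent with stage IIIA colon adenocarcinoma."
for stage, start, end in extract_stages(note):
    print(stage, repr(note[start:end]))
```

A pattern like this breaks down quickly on real notes (negation, historical mentions, abbreviations), which is the gap a clinically trained model is meant to close.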

Conduct research confidently with high-quality data

TLM aims to exceed the accuracy of clinical experts reviewing medical records, providing the data transparency and rigor required to be trusted by regulators.

TLM is currently achieving high accuracy on diagnoses, medications, lab results, clinical observations, and more – outperforming GPT-4, LogMap, AML, and others.

General-purpose large language models understand language but are often inaccurate in the medical domain because they are trained on public Internet data. TLM’s specialized training on healthcare data is critical for ensuring the clinical validity of Truveta Data.

TLM normalization capabilities compared to human experts

Learn more about the depth of Truveta Data

Complete and clean EHR data

Truveta offers complete, timely, and clean EHR data linked with SDOH, mortality, and claims data for more than 120M patients representing the full diversity of the US.

Explore Truveta Data

More than 7 billion clinical notes

The Truveta Language Model cleans and structures data from notes at scale, enabling researchers to understand patients’ complete clinical context and answer novel research questions.

Learn more about notes

Interested in learning more?

Ready to accelerate your research with representative, complete, and timely real-world data?