Note
If you use PDD Graph data in your work, please cite the following publication:
We ask that users who download significant portions of the database cite the MIMIC-III paper in any resulting publications.
The latest news
In the new version 1.4, we add detailed information about the prescriptions, including the dosage, the duration, and so on. This information makes it possible to conveniently retrieve the exact adverse drug combinations taken by the corresponding patients.
For a specific example, please refer to the Tutorial, SPARQL Query Example 5.
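As a rough illustration of why duration details matter here, the sketch below flags pairs of interacting drugs whose prescription periods overlap for the same patient. It is a minimal sketch only: the record layout, field names, and sample values are hypothetical and are not taken from the PDD Graph schema, and the actual retrieval is done with SPARQL as described in the Tutorial.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Prescription:
    # Hypothetical record layout, not the PDD Graph schema.
    patient: str
    drug: str
    start: date
    end: date


def overlapping_ddi(prescriptions, ddi_pairs):
    """Return (patient, drug_a, drug_b) for interacting drugs taken in overlapping periods."""
    # Store pairs as frozensets so (a, b) and (b, a) count as the same interaction.
    known = {frozenset(pair) for pair in ddi_pairs}
    hits = set()
    for p, q in combinations(prescriptions, 2):
        same_patient = p.patient == q.patient
        interacts = frozenset((p.drug, q.drug)) in known
        # Two closed date intervals overlap when each starts before the other ends.
        overlaps = p.start <= q.end and q.start <= p.end
        if same_patient and interacts and overlaps:
            hits.add((p.patient, *sorted((p.drug, q.drug))))
    return sorted(hits)


# Made-up sample data for illustration only.
SAMPLE_PRESCRIPTIONS = [
    Prescription("patient_1", "drug_A", date(2020, 1, 1), date(2020, 1, 10)),
    Prescription("patient_1", "drug_B", date(2020, 1, 5), date(2020, 1, 7)),
    Prescription("patient_1", "drug_C", date(2020, 2, 1), date(2020, 2, 3)),
    Prescription("patient_2", "drug_A", date(2020, 1, 1), date(2020, 1, 2)),
]
SAMPLE_DDI = [("drug_B", "drug_A"), ("drug_A", "drug_C")]

print(overlapping_ddi(SAMPLE_PRESCRIPTIONS, SAMPLE_DDI))
```

In this toy data, drug_A and drug_C interact but were never taken at the same time, so only the drug_A and drug_B pair is reported. Without duration information, both pairs would have to be treated as candidates.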
Introduction
The gap between clinical data and biomedical knowledge graphs:
- An EMR database, MIMIC-III: contains multi-format electronic data but remains limited in scope.
- Biomedical KGs: cover basic medical facts, but contain little information about clinical outcomes.
The gap between clinical data and biomedical KGs prevents further exploration of medical entity relationships on either side, as shown in the following figure:
What is PDD Graph (Patient-Disease-Drug Graph):
Electronic medical records (EMRs) contain multi-format electronic medical data that hold an abundance of medical knowledge. Faced with a patient's symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately captures the relationships between symptoms, diagnoses, and treatments. We aim to capture these relationships by constructing a large, high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs.
Specifically, we extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph is accessible on the Web via a SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.
To give a better understanding of the PDD graph, a subgraph is illustrated in the following figure.
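The linking idea can also be sketched in a few lines of code. The example below stores a tiny patient-disease-drug subgraph as subject-predicate-object triples and follows a patient's prescriptions to the linked external entries. All identifiers and predicate names here are hypothetical placeholders, not the vocabulary actually used in PDD Graph.

```python
from collections import defaultdict

# Hypothetical identifiers and predicate names, chosen only for illustration.
TRIPLES = [
    ("patient_1", "diagnosed_with", "disease_X"),
    ("patient_1", "prescribed", "drug_A"),
    ("disease_X", "same_as", "icd9:code_X"),
    ("drug_A", "same_as", "drugbank:entry_A"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def linked_drugs(index, patient):
    """Follow patient -> prescribed drug -> linked external entry."""
    result = set()
    # .get() is used so that lookups of unknown nodes do not add empty entries.
    for drug in index.get(patient, {}).get("prescribed", ()):
        result |= index.get(drug, {}).get("same_as", set())
    return sorted(result)


print(linked_drugs(build_index(TRIPLES), "patient_1"))
```

The same two-hop traversal is what a SPARQL query against the real graph would express with two triple patterns.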
Download
Format 1: N-Triples
RDF data files with the .nt extension.
Format 2: Apache Jena
A dataset stored in Apache Jena TDB format; you can use it through the Jena API.
Friendly Link
Our data builds on the following resources, which we acknowledge here.
- MIMIC-III
- Bio2RDF
- DrugBank
- ICD-9 ontology
License
Contact
Update
- V1.3
Version 1.3 adds DDI (drug-drug interaction) triples. These DDI triples are extracted from DrugBank and make it possible to conveniently retrieve the possible adverse drug combinations taken by the corresponding patients.
For a specific example, please refer to the Tutorial, SPARQL Query Example 5.
- V1.2
Fixed the bugs in "diagnose_icd_information.nt".
This version eliminates an engineering bug introduced during label matching of ICD-9 codes. The bug caused linking failures for 380 diseases in MIMIC-III.
In this PDD version, the overall number of diseases is 6,985, of which 6,983 are connected to the ICD-9 ontology. The only two codes that fail to match are '71970' and 'NULL', which are not included in the ICD-9 ontology.
- V1.1
Added patient BMI data.