Project OREGANO
Introduction
The OREGANO project aims to build a holistic knowledge graph on drugs and to apply link prediction approaches to discover possible drug-target relations for drug repositioning. The knowledge graph integrates freely available databases, covering drugs as well as proteins, genes and diseases.
As a first step, a knowledge graph (data/oregano_knowledge_graph_from_Bio2RDF.tsv) was built from the Bio2RDF resource, but not all of the databases available there are up to date.
We are therefore building a new version of the graph from recent versions of these data sources. This integration work is in progress in Integration.
Statistics
The OREGANO knowledge graph is composed of 11 types of nodes and 19 types of links. The current version of the graph contains 99,651 nodes and 788,425 links.
Choosing labels for the links was more complex than choosing node types, because a label depends on the orientation and meaning of the link. We therefore used the Relation Ontology, which describes relationships and the categories of concepts they connect, and explored it for relationships linking the node categories we had chosen. When the Relation Ontology contained no suitable relationship, we gave the link a unique name of our own.
Each entity has a unique identifier in OREGANO. The identifier is formed from the node type and a number identifying the entity within that type (e.g. "COMPOUND:1").
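As an illustration of the identifier scheme, the following Python sketch splits an identifier such as "COMPOUND:1" into its node type and number. The names `OreganoId` and `parse_oregano_id` are hypothetical and not part of the project code, and the sketch assumes that the type and the number are always separated by a single colon.

```python
from typing import NamedTuple


class OreganoId(NamedTuple):
    """Hypothetical container for the two parts of an OREGANO identifier."""
    node_type: str
    number: int


def parse_oregano_id(identifier: str) -> OreganoId:
    """Split an identifier such as 'COMPOUND:1' into (node type, number).

    Assumption: identifiers always have the form TYPE:NUMBER.
    """
    node_type, separator, number = identifier.partition(":")
    if not separator or not node_type or not number.isdecimal():
        raise ValueError(f"not a TYPE:NUMBER identifier: {identifier!r}")
    return OreganoId(node_type, int(number))


compound = parse_oregano_id("COMPOUND:1")
print(compound.node_type, compound.number)
```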
Number of nodes in the knowledge graph according to the type of node.
Type of node | Number of nodes |
---|---|
Compound | 28,069 |
Target | 21,987 |
Gene | 19,938 |
Disease | 8,862 |
ATC | 4,121 |
Effect | 162 |
Activity | 76 |
Indication | 2,080 |
Side-effect | 5,364 |
Pathway | 2,062 |
Phenotype | 6,930 |
Total | 99,651 |
Number of edges in the knowledge graph according to the relationship.
Labels marked with an asterisk indicate relationships from the Relation Ontology.
Relationship | Number of edges |
---|---|
has_target | 174,046 |
has_indication | 8,225 |
has_side_effect | 112,532 |
has_phenotype* | 104,069 |
is_affecting | 4,830 |
is_substance_that_treats* | 209 |
gene_product_of* | 19,061 |
acts_within* | 43,501 |
causes_condition* | 13,260 |
has_activity | 11,751 |
increase_activity | 9,909 |
decrease_activity | 3,154 |
has_effect | 16,203 |
increase_effect | 16,046 |
decrease_effect | 252 |
increase_efficacy | 45,454 |
decrease_efficacy | 198,592 |
has_code | 3,224 |
subclass_of | 4,107 |
Total | 788,425 |
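Per-relationship counts like those in the table above can be recomputed from a file of triples. The sketch below is one possible way to do so in Python; it assumes a tab-separated file with exactly three columns (subject, predicate, object) and no header row, and the function name `count_relationships` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_relationships(lines: Iterable[str]) -> Counter:
    """Count how many triples use each predicate.

    Assumption: each line is 'subject<TAB>predicate<TAB>object'.
    Lines that do not have exactly three fields are ignored.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            counts[fields[1]] += 1
    return counts


# Small in-memory sample standing in for a real triples file.
sample = [
    "COMPOUND:1\thas_target\tTARGET:7\n",
    "COMPOUND:1\thas_side_effect\tSIDE_EFFECT:3\n",
    "COMPOUND:2\thas_target\tTARGET:9\n",
]
for relationship, number_of_edges in count_relationships(sample).most_common():
    print(relationship, number_of_edges)
```

For a file on disk, the same function can be applied to an open file object, for example `count_relationships(open(path, encoding="utf-8"))`, where `path` points to a triples file.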
Integration files
BINDER
A binder is a file that binds databases together in a large table containing all of the external links for the entities of a given database.
WRAPPER
A wrapper is a file that extracts data from a specific database file, formats it into triples and saves them to a TSV file.
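The general shape of a wrapper can be sketched as follows. This is not the project's code: the input rows, the column names `drug` and `target`, and the functions `to_triples` and `write_triples` are all hypothetical, chosen only to show the extract, format and save steps described above.

```python
import csv
import io
from typing import Iterable, Iterator, Optional, TextIO

Triple = tuple[str, str, str]


def to_triples(
    rows: Iterable[dict[str, Optional[str]]],
    subject_key: str,
    predicate: str,
    object_key: str,
) -> Iterator[Triple]:
    """Turn source records into (subject, predicate, object) triples.

    Rows with a missing or empty subject or object are skipped.
    """
    for row in rows:
        subject = (row.get(subject_key) or "").strip()
        obj = (row.get(object_key) or "").strip()
        if subject and obj:
            yield (subject, predicate, obj)


def write_triples(triples: Iterable[Triple], out: TextIO) -> int:
    """Write triples as tab-separated lines and return how many were written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for triple in triples:
        writer.writerow(triple)
        written += 1
    return written


# Hypothetical records, as they might come out of a parsed database file.
rows = [
    {"drug": "COMPOUND:1", "target": "TARGET:7"},
    {"drug": "COMPOUND:2", "target": ""},  # incomplete, will be skipped
]
buffer = io.StringIO()
written = write_triples(to_triples(rows, "drug", "has_target", "target"), buffer)
print(written, repr(buffer.getvalue()))
```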
Extracted Triples
These files contain all of the triples extracted from the data. Each triple is formatted as: subject [tab] predicate [tab] object. The predicate is the relation between the subject and the object.
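A minimal Python sketch for reading such a file is shown below. It assumes one triple per line with no header row, and the function name `read_triples` is hypothetical; it fails loudly on lines that do not have exactly three fields so that malformed rows are noticed early.

```python
from typing import Iterable, Iterator


def read_triples(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) from tab-separated lines.

    Blank lines are skipped; any other line without exactly three
    tab-separated fields raises ValueError with its line number.
    """
    for line_number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue
        fields = stripped.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        subject, predicate, obj = fields
        yield subject, predicate, obj


example_lines = ["COMPOUND:1\thas_target\tTARGET:7\n", "\n"]
for subject, predicate, obj in read_triples(example_lines):
    print(subject, predicate, obj)
```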
Application
This GitHub repository also contains several beta versions of an application that will be used in the validation step of the project. The goal of this application is to search medical databases for articles and data about a relation between a subject and an object. It will serve as a checkpoint for the OREGANO prediction step.
SPARQL Endpoint
The endpoint of the graph is available at http://91.121.148.199:8889/bigdata/#query
The following example query retrieves "has_target" links between compounds and targets.
```sparql
PREFIX oregano: <http://erias.fr/oregano/#>
SELECT ?compound ?target
WHERE {
  ?compound oregano:has_target ?target .
}
LIMIT 100
```
The following query retrieves the targets connected to Ivacaftor, a drug used to treat cystic fibrosis patients:
```sparql
PREFIX oregano: <http://erias.fr/oregano/#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema/#>
SELECT DISTINCT ?name_target
WHERE {
  ?compound rdfs:label "db:ivacaftor" .
  ?compound oregano:has_target ?target .
  ?target rdfs:label ?name_target .
}
```
The following query illustrates how to find the names of drugs that treat HIV infection:
```sparql
PREFIX oregano: <http://erias.fr/oregano/#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema/#>
SELECT DISTINCT ?compound ?name_compound
WHERE {
  ?disease rdfs:label "pgkb:hiv infections" .
  ?compound oregano:is_substance_that_treats ?disease .
  ?compound rdfs:label ?name_compound .
}
```
The following query finds the natural compound "baclofen" in the knowledge graph and retrieves its side effects (natural compounds in the knowledge graph have an NPASS code, which can be obtained using the "oregano:npass" relationship):
```sparql
PREFIX oregano: <http://erias.fr/oregano/#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema/#>
SELECT DISTINCT ?side_effect ?side_effect_label
WHERE {
  ?natural_compound rdfs:label "np:baclofen" .
  ?natural_compound oregano:has_side_effect ?side_effect .
  ?side_effect rdfs:label ?side_effect_label .
}
LIMIT 5
```
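Queries like the ones above can also be sent from a script. The Python sketch below uses only the standard library and the standard SPARQL Protocol (a GET request with a `query` parameter and JSON results). The function names are hypothetical, and the URL to pass as `endpoint_url` is an assumption to verify: this README only gives the address of the web query page, and the machine-readable service URL may differ.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint_url}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def extract_rows(payload: dict) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result into one {variable: value} dict per row."""
    return [
        {variable: cell["value"] for variable, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(endpoint_url: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query and return its rows (requires network access)."""
    request = build_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_rows(json.load(response))


# Offline demonstration on a hand-written response in SPARQL JSON format.
sample_payload = {
    "head": {"vars": ["compound", "target"]},
    "results": {"bindings": [
        {"compound": {"type": "uri", "value": "http://erias.fr/oregano/compound/1052"},
         "target": {"type": "uri", "value": "http://erias.fr/oregano/target/1"}},
    ]},
}
print(extract_rows(sample_payload))
```

The target URI in `sample_payload` is made up for the demonstration; only the compound URI appears elsewhere in this README.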
URI description
Entity URIs have the format: http://erias.fr/oregano/[TYPE_OF_ENTITY]/[ID_OF_THIS_ENTITY]
For example, the URI of the compound Atazanavir, which is compound number 1052, is: http://erias.fr/oregano/compound/1052
Relationship URIs have the format: http://erias.fr/oregano/#[TYPE_OF_RELATION]
For example, the URI of the relationship "has_target" is: http://erias.fr/oregano/#has_target
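The two URI formats can be expressed as small helper functions. The Python sketch below is illustrative only: the function names are hypothetical, and lower-casing the entity type is an assumption based on the single "compound" example above.

```python
BASE_URI = "http://erias.fr/oregano/"


def entity_uri(entity_type: str, entity_id: int) -> str:
    """Build an entity URI of the form BASE/[TYPE_OF_ENTITY]/[ID].

    Assumption: the type appears in lower case, as in '.../compound/1052'.
    """
    return f"{BASE_URI}{entity_type.lower()}/{entity_id}"


def relationship_uri(relation: str) -> str:
    """Build a relationship URI of the form BASE/#[TYPE_OF_RELATION]."""
    return f"{BASE_URI}#{relation}"


print(entity_uri("compound", 1052))    # Atazanavir, per the example above
print(relationship_uri("has_target"))
```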
Publications
- Boudin, Marina. “Computational Approaches for Drug Repositioning: Towards a Holistic Perspective Based on Knowledge Graphs.” Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, 2020, pp. 3225–28, doi:10.1145/3340531.3418510.