- From: Jerven Bolleman <me@jerven.eu>
- Date: Mon, 19 Dec 2016 22:48:18 +0100
- To: Muhammad Saleem <saleem.muhammd@gmail.com>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Dear Muhammad,
While I am not allowed to share any sparql.uniprot.org query logs, I can share a few examples from other places, e.g. presentations and public help requests.
I think it would be nice for the other people on the list to see this as well, so I am replying publicly.
Mostly because, whatever the real flaws of federated querying, it is still extremely useful!
I hope these 7 are interesting enough ;) Note that quite a few of them use FILTER clauses to avoid over-communication, i.e. sending values to the remote endpoint that can never bind.
It would be nice if an extension to the SPARQL 1.1 Service Description and/or VoID could be used by query engines to avoid sending IRIs that will never bind.
It could even be something like a namespace plus a Bloom filter definition, making it even easier to avoid useless HTTP requests.
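To make the namespace plus Bloom filter idea concrete, here is a minimal Python sketch of how a query engine might consult such a summary before shipping bindings to a SERVICE. This is not an existing Service Description or VoID feature; the `EndpointSummary` class, `bindings_worth_sending`, and the filter sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (sizes are illustrative)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class EndpointSummary:
    """Hypothetical per-endpoint summary: IRI namespaces plus a Bloom filter of subject IRIs."""

    def __init__(self, namespaces, subjects) -> None:
        self.namespaces = tuple(namespaces)
        self.bloom = BloomFilter()
        for s in subjects:
            self.bloom.add(s)

    def may_bind(self, iri: str) -> bool:
        # Cheap namespace check first, then the Bloom filter.
        return iri.startswith(self.namespaces) and self.bloom.might_contain(iri)


def bindings_worth_sending(summary: EndpointSummary, iris):
    """Keep only the IRIs the remote SERVICE could possibly match."""
    return [iri for iri in iris if summary.may_bind(iri)]


if __name__ == "__main__":
    uniprot = EndpointSummary(
        namespaces=["http://purl.uniprot.org/uniprot/"],
        subjects=["http://purl.uniprot.org/uniprot/P05067"],
    )
    candidates = [
        "http://purl.uniprot.org/uniprot/P05067",
        "http://identifiers.org/uniprot/P05067",
    ]
    print(bindings_worth_sending(uniprot, candidates))
    # ['http://purl.uniprot.org/uniprot/P05067']
```

A Bloom filter fits this job because it never gives false negatives: a false positive only costs one unnecessary request, while an IRI it rejects can safely be skipped.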
Regards,
Jerven
1. UniProt to Wikidata (https://query.wikidata.org/): finds Wikidata items using UniProt identifiers.
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?uniprot ?id WHERE {
?uniprot up:reviewed true ;
up:organism taxon:9606 .
#Convert the UniProt IRI to just the accession
BIND (SUBSTR(STR(?uniprot),33) AS ?id)
SERVICE <http://query.wikidata.org/sparql>{
?protein wdt:P352 ?id .
}
}
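The offset 33 in the BIND works because SPARQL's SUBSTR counts positions from 1 and the `http://purl.uniprot.org/uniprot/` namespace is 32 characters long. A small Python sketch of the same conversion; the helper names are made up for illustration.

```python
from typing import Optional

UNIPROT_NS = "http://purl.uniprot.org/uniprot/"


def sparql_substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Mimic SPARQL 1.1 SUBSTR for start >= 1: positions are 1-based, counted in characters."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def accession_from_iri(iri: str) -> str:
    """Same result as BIND(SUBSTR(STR(?uniprot), 33) AS ?id) for UniProtKB entry IRIs."""
    if not iri.startswith(UNIPROT_NS):
        raise ValueError(f"not a UniProtKB entry IRI: {iri}")
    # The namespace is 32 characters long, so the accession starts at position 33.
    return sparql_substr(iri, len(UNIPROT_NS) + 1)


print(len(UNIPROT_NS))  # 32
print(accession_from_iri("http://purl.uniprot.org/uniprot/P05067"))  # P05067
```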
2. BioModels (https://www.ebi.ac.uk/rdf/services/biomodels/sparql) and UniProt
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT DISTINCT ?model ?uniprot ?taxonomy ?ec WHERE {
?model rdf:type sbmlrdf:SBMLModel .
?model ?linksToElement ?modelelement .
?idorgannot owl:sameAs ?uniprot.
?modelelement ?qualifier ?idorgannot.
?modelelement rdf:type ?elementType
FILTER(contains(str(?uniprot), "purl.uniprot.org/uniprot/"))
SERVICE<http://sparql.uniprot.org/sparql>{
?uniprot up:organism ?taxonomy .
?uniprot up:enzyme ?ec .
}
}
3. UniProt and Ensembl: find the lengths of the exons encoding a protein described in UniProt
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX faldo:<http://biohackathon.org/resource/faldo#>
PREFIX core:<http://purl.uniprot.org/core/>
PREFIX uniprotkb:<http://purl.uniprot.org/uniprot/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX ensemblprotein:<http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms:<http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX sio:<http://semanticscience.org/resource/>
SELECT ?protein ?transcript ?exon ?order ?length
{
BIND(uniprotkb:P05067 as ?protein)
?protein rdfs:seeAlso ?transcript .
?transcript core:database <http://purl.uniprot.org/database/Ensembl> .
SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/>{
?transcript obo:SO_translates_to ?peptide .
?peptide a ensemblterms:protein .
?transcript obo:SO_has_part ?exon;
sio:SIO_000974 ?orderedPart .
?orderedPart sio:SIO_000628 ?exon .
?exon faldo:location ?location .
?location faldo:begin ?bf . ?bf faldo:position ?begin .
?location faldo:end ?ef . ?ef faldo:position ?end .
?orderedPart sio:SIO_000300 ?order .
}
BIND(ABS(?end - ?begin) as ?length)
}
4. neXtProt (https://snorql.nextprot.org/) and UniProt
PREFIX up:<http://purl.uniprot.org/core/>
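PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX :<http://nextprot.org/rdf#>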
SELECT DISTINCT ?entry where {
?entry :isoform ?iso.
{
SERVICE <http://sparql.uniprot.org/sparql> {
SELECT DISTINCT ?viralinteractor WHERE # get viral proteins with an IntAct xref
{
?viralinteractor a up:Protein .
?viralinteractor rdfs:seeAlso/up:database <http://purl.uniprot.org/database/IntAct> .
?viralinteractor up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
}
}
?entry :isoform / :binaryInteraction / :interactant ?interactant. # NeXtprot entries with an IntAct binary interaction
?interactant skos:exactMatch ?viralinteractor . # interactant must be in the uniprot service result set to select the entry
}
UNION
{
SERVICE <http://sparql.uniprot.org/sparql> {
SELECT DISTINCT ?humprotein WHERE # get human proteins that share a PDB xref with a viral protein (same PDB id)
{
?humprotein a up:Protein .
?humprotein up:organism taxon:9606 .
?humprotein rdfs:seeAlso ?db .
?db up:database <http://purl.uniprot.org/database/PDB> .
?viralprotein a up:Protein ;
rdfs:seeAlso ?db ;
up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
}
}
BIND (IRI(CONCAT("http://nextprot.org/rdf/entry/NX_",substr(str(?humprotein),33))) as ?entry) # cast result to neXtProt entry
}
}
ORDER BY ?entry
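The IRI cast in the second branch rewrites a UniProtKB IRI into a neXtProt entry IRI. A minimal Python sketch of that mapping, assuming the `NX_` + accession pattern used in the query; the function name is made up. It keeps the entire accession after the namespace, so 10-character accessions (e.g. the format A0A024RBG1) map as well as 6-character ones.

```python
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"
NEXTPROT_ENTRY_NS = "http://nextprot.org/rdf/entry/NX_"


def nextprot_entry_iri(uniprot_iri: str) -> str:
    """Map a UniProtKB entry IRI to a neXtProt entry IRI (NX_ + accession)."""
    if not uniprot_iri.startswith(UNIPROT_NS):
        raise ValueError(f"not a UniProtKB entry IRI: {uniprot_iri}")
    # Keep the whole accession, whether it is 6 or 10 characters long.
    accession = uniprot_iri[len(UNIPROT_NS):]
    return NEXTPROT_ENTRY_NS + accession


print(nextprot_entry_iri("http://purl.uniprot.org/uniprot/P05067"))
# http://nextprot.org/rdf/entry/NX_P05067
print(nextprot_entry_iri("http://purl.uniprot.org/uniprot/A0A024RBG1"))
# http://nextprot.org/rdf/entry/NX_A0A024RBG1
```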
5. DisGeNET (http://rdf.disgenet.org/sparql/) and UniProt: find cancer-related genes in DisGeNET that are known to be disease related in UniProt
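PREFIX up:<http://purl.uniprot.org/core/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX ncit:<http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
SELECT ?protein ?comment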
WHERE {
?protein a ncit:C17021; skos:exactMatch ?uniprot .
FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
# Query UniProt for proteins with disease annotation
SERVICE <http://sparql.uniprot.org/sparql> {
?uniprot up:annotation ?annotation .
?annotation a up:Disease_Annotation ;
rdfs:comment ?comment .
}
}
LIMIT 10
6. ChEMBL (https://www.ebi.ac.uk/rdf/services/chembl/) and UniProt: retrieve UniProt protein names to make sense of ChEMBL chemical assay results.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
SELECT ?activity ?assay ?target ?targetcmpt ?uniprot ?fullName
WHERE {
?activity a cco:Activity ;
cco:hasMolecule chembl_molecule:CHEMBL941 ;
cco:hasAssay ?assay .
?assay cco:hasTarget ?target .
?target cco:hasTargetComponent ?targetcmpt .
?targetcmpt cco:targetCmptXref ?uniprot .
?uniprot a cco:UniprotRef .
SERVICE <http://sparql.uniprot.org/sparql/>{
?uniprot up:recommendedName ?name .
?name up:fullName ?fullName .
}
}
7. Identifiers.org, for quick IRI translation. It is a different kind of endpoint altogether, but it really helps make federated queries work seamlessly in the life sciences field by auto-translating IRIs based on patterns.
identifiers.org/services/sparql
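The pattern-based IRI translation described in item 7 can be sketched in a few lines of Python. The namespace table below is illustrative only; it is an assumption for this sketch, not identifiers.org's registry and not how its SPARQL service is implemented.

```python
from typing import List

# Hypothetical equivalence groups: namespaces in a group share the same local identifier.
EQUIVALENT_NAMESPACES = [
    ("http://purl.uniprot.org/uniprot/", "http://identifiers.org/uniprot/"),
    ("http://rdf.ebi.ac.uk/resource/ensembl.protein/", "http://identifiers.org/ensembl/"),
]


def translate(iri: str) -> List[str]:
    """Return the other IRIs for the same record, or [] if no namespace pattern matches."""
    for group in EQUIVALENT_NAMESPACES:
        for ns in group:
            if iri.startswith(ns):
                local_id = iri[len(ns):]
                return [other + local_id for other in group if other != ns]
    return []


print(translate("http://purl.uniprot.org/uniprot/P05067"))
# ['http://identifiers.org/uniprot/P05067']
```

With such a table, an engine could match a query written with purl.uniprot.org IRIs against a dataset that uses identifiers.org IRIs, and vice versa.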
> On 19 Dec 2016, at 21:34, Muhammad Saleem <saleem.muhammd@gmail.com> wrote:
>
> Dear all,
>
> Do you have federated SPARQL queries you use today or have used in the past?
> Please e-mail them to us (via saleem.muhammd@gmail.com), as text files or in any way you prefer, together with the URLs of the endpoints on which you’ve executed them.
>
> Motivated by the FEASIBLE [1] SPARQL benchmark generation framework, we want to design a customizable (in terms of query types, number of queries, number of datasets, etc.) federated SPARQL benchmark from real query logs and use cases. To this end, we are currently collecting real federated SPARQL queries from different domains, applications, and endpoints for analysis.
>
> Your help in this endeavour would be highly appreciated.
> We'll be more than happy to collaborate and/or provide you with the detailed results of the study if you contribute.
>
> Your federated queries will also be published as RDF as part of the extended LSQ: The Linked SPARQL Queries Dataset [2,3].
>
> Best Regards,
>
> Muhammad Saleem
> Ruben Verborgh
> Claus Stadler
> Miel Vander Sande
> Axel-Cyrille Ngonga Ngomo
> Carlos Buil Aranda
>
> [1] https://svn.aksw.org/papers/2015/ISWC_FEASIBLE/public.pdf
> [2] http://svn.aksw.org/papers/2015/ISWC_LSQ/public.pdf
> [3] http://aksw.github.io/LSQ/
>
>
Received on Monday, 19 December 2016 21:48:58 UTC