Re: Real Federated SPARQL Queries Required and Possible Collaboration

Dear Muhammad,

While I am not allowed to share any sparql.uniprot.org query logs, I can share a few examples from other places, e.g. presentations and public help requests.

I think it would be nice for the other people on the list to see this as well, so I am replying publicly.
Mostly because, for all the real flaws that exist in federated querying, it is still extremely useful!

I hope these 7 are interesting enough ;) Note that quite a few have FILTER clauses, so that values which can never bind at the remote endpoint are not sent there in the first place.
It would be nice if an extension to the SPARQL 1.1 Service Description and/or VoID could be used by query engines to avoid sending IRIs that will never bind.
It could even be something like a namespace plus a Bloom filter definition, to make it even easier to avoid useless HTTP requests.
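
To make the namespace plus Bloom filter idea concrete, here is a minimal Python sketch of the check a federated query engine could run before contacting a remote endpoint. Everything in it is an assumption for illustration: no such extension to the Service Description or VoID exists, and the names BloomFilter, EndpointSummary and may_bind, as well as the filter size and hash scheme, are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class EndpointSummary:
    """Hypothetical summary an endpoint could publish:
    one namespace plus a Bloom filter over the local names it knows."""

    def __init__(self, namespace, local_names):
        self.namespace = namespace
        self.filter = BloomFilter()
        for name in local_names:
            self.filter.add(name)

    def may_bind(self, iri):
        # An IRI outside the namespace can never bind: skip it.
        if not iri.startswith(self.namespace):
            return False
        # Inside the namespace, ask the Bloom filter about the local name.
        return self.filter.might_contain(iri[len(self.namespace):])


summary = EndpointSummary("http://purl.uniprot.org/uniprot/", ["P05067", "P04637"])
candidates = [
    "http://purl.uniprot.org/uniprot/P05067",
    "http://identifiers.org/uniprot/P05067",
]
# Only IRIs that might bind are worth an HTTP request.
to_send = [iri for iri in candidates if summary.may_bind(iri)]
```

The namespace test alone does what the FILTER clauses in the queries below do by hand. The Bloom filter would additionally prune IRIs that are in the right namespace but unknown to the endpoint, at the cost of an occasional false positive, which only means one wasted request.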

Regards,
Jerven


1. UniProt to Wikidata (https://query.wikidata.org/), finds Wikidata items using UniProt identifiers.

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?uniprot ?id WHERE {
  ?uniprot up:reviewed true ;
           up:organism taxon:9606 .
  BIND (SUBSTR(STR(?uniprot),33) AS ?id)
  #Convert IRI of UniProt to just the accession
  SERVICE <http://query.wikidata.org/sparql>{
    ?protein wdt:P352 ?id .
  }
}
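
The offset 33 in the SUBSTR call above follows from the length of the UniProt entry namespace. A small Python check of that arithmetic, as a sketch (the names UNIPROT_NS and accession_from_iri are mine, not part of any library):

```python
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"

# SPARQL's SUBSTR is 1-based, so the first character after the
# namespace sits at position len(namespace) + 1.
SPARQL_START = len(UNIPROT_NS) + 1


def accession_from_iri(iri: str) -> str:
    """Return the accession part of a UniProt entry IRI."""
    if not iri.startswith(UNIPROT_NS):
        raise ValueError(f"not a UniProt entry IRI: {iri}")
    return iri[len(UNIPROT_NS):]


accession = accession_from_iri("http://purl.uniprot.org/uniprot/P05067")
```

Here SPARQL_START is 33 and accession is "P05067". Query 4 below uses the same offset.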

2. BioModels (https://www.ebi.ac.uk/rdf/services/biomodels/sparql) and UniProt

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>

PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT DISTINCT ?model ?uniprot ?taxonomy ?ec WHERE {
  ?model rdf:type sbmlrdf:SBMLModel .
  ?model ?linkstoellement ?modelelement .
  ?idorgannot owl:sameAs ?uniprot .
  ?modelelement ?qualifier ?idorgannot .
  ?modelelement rdf:type ?elementType .
  FILTER(contains(str(?uniprot), "purl.uniprot.org/uniprot/"))
  SERVICE <http://sparql.uniprot.org/sparql> {
    ?uniprot up:organism ?taxonomy .
    ?uniprot up:enzyme ?ec .
  }
}

3. UniProt and Ensembl, find the lengths of the exons coding for a protein described in UniProt

PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX faldo:<http://biohackathon.org/resource/faldo#> 
PREFIX core:<http://purl.uniprot.org/core/>
PREFIX uniprotkb:<http://purl.uniprot.org/uniprot/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX ensemblprotein:<http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms:<http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX sio:<http://semanticscience.org/resource/>
SELECT ?protein ?transcript ?exon ?order ?length 
{
  BIND(uniprotkb:P05067 as ?protein)
  ?protein rdfs:seeAlso ?transcript .
  ?transcript core:database <http://purl.uniprot.org/database/Ensembl> .
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/> {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon .
    ?exon faldo:location ?location .
    ?location faldo:begin ?bf . ?bf faldo:position ?begin .
    ?location faldo:end ?ef . ?ef faldo:position ?end .
    ?orderedPart sio:SIO_000300 ?order .
  }
  BIND(ABS(?end - ?begin) as ?length)
}

4. neXtProt (https://snorql.nextprot.org/) and UniProt 

PREFIX up:<http://purl.uniprot.org/core/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>

SELECT DISTINCT ?entry WHERE {
  ?entry :isoform ?iso .
  {
    SERVICE <http://sparql.uniprot.org/sparql> {
      # get viral proteins with an IntAct xref
      SELECT DISTINCT ?viralinteractor WHERE {
        ?viralinteractor a up:Protein .
        ?viralinteractor rdfs:seeAlso/up:database <http://purl.uniprot.org/database/IntAct> .
        ?viralinteractor up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    # neXtProt entries with an IntAct binary interaction
    ?entry :isoform / :binaryInteraction / :interactant ?interactant .
    # interactant must be in the UniProt service result set to select the entry
    ?interactant skos:exactMatch ?viralinteractor .
  }
  UNION
  {
    SERVICE <http://sparql.uniprot.org/sparql> {
      # get human proteins that share a PDB xref with a viral protein (same PDB id)
      SELECT DISTINCT ?humprotein WHERE {
        ?humprotein a up:Protein .
        ?humprotein up:organism taxon:9606 .
        ?humprotein rdfs:seeAlso/up:database <http://purl.uniprot.org/database/PDB> .
        ?viralprotein a up:Protein ;
                      rdfs:seeAlso ?db ;
                      up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    # cast result to neXtProt entry
    BIND (IRI(CONCAT("http://nextprot.org/rdf/entry/NX_",substr(str(?humprotein),33,6))) as ?entry)
  }
}
ORDER BY ?entry

5. DisGeNET (http://rdf.disgenet.org/sparql/) and UniProt, find cancer-related genes in DisGeNET that are known to be disease-related in UniProt

SELECT ?protein ?comment 
WHERE {
    ?protein a ncit:C17021; skos:exactMatch ?uniprot .
    FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
    # Query UniProt for proteins with disease annotation
    SERVICE <http://sparql.uniprot.org/sparql> {
        ?uniprot up:annotation ?annotation .
        ?annotation a up:Disease_Annotation ;
            rdfs:comment ?comment .
    }
}
LIMIT 10

6. ChEMBL (https://www.ebi.ac.uk/rdf/services/chembl/) and UniProt, retrieve UniProt protein names to make sense of ChEMBL chemical assay results.


PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>

SELECT ?activity ?assay ?target ?targetcmpt ?uniprot ?fullName
WHERE {
  ?activity a cco:Activity ;
            cco:hasMolecule chembl_molecule:CHEMBL941 ;
            cco:hasAssay ?assay .
  ?assay cco:hasTarget ?target .
  ?target cco:hasTargetComponent ?targetcmpt .
  ?targetcmpt cco:targetCmptXref ?uniprot .
  ?uniprot a cco:UniprotRef .
  SERVICE <http://sparql.uniprot.org/sparql/> {
    ?uniprot up:recommendedName ?name .
    ?name up:fullName ?fullName .
  }
}

7. Identifiers.org, for quick IRI translation. A different kind of endpoint altogether, but it really helps make federated queries work seamlessly in the life sciences (LS) field (auto-translating IRIs based on patterns).

identifiers.org/services/sparql
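
To show what pattern-based IRI translation means here, a rough Python sketch follows. It is not how the Identifiers.org service is implemented; the prefix table and the translate function are illustrative assumptions only.

```python
# Illustrative table: each tuple lists IRI prefixes assumed to name
# the same collection of records. A real service would have many more.
EQUIVALENT_PREFIXES = [
    ("http://purl.uniprot.org/uniprot/", "http://identifiers.org/uniprot/"),
    ("http://purl.uniprot.org/taxonomy/", "http://identifiers.org/taxonomy/"),
]


def translate(iri):
    """Return the IRIs the table treats as equivalent to iri, excluding iri itself."""
    results = []
    for group in EQUIVALENT_PREFIXES:
        for prefix in group:
            if iri.startswith(prefix):
                local = iri[len(prefix):]
                # Swap the matched prefix for every other prefix in the group.
                results.extend(p + local for p in group if p != prefix)
    return results


other_iris = translate("http://purl.uniprot.org/uniprot/P05067")
```

With the table above, other_iris holds the single IRI http://identifiers.org/uniprot/P05067. A federated query can use such a translation step to join two endpoints that name the same record under different IRI schemes.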


> On 19 Dec 2016, at 21:34, Muhammad Saleem <saleem.muhammd@gmail.com> wrote:
> 
> Dear all,
> 
> Do you have federated SPARQL queries you use today or have used in the past?
> Please e-mail them to us (via saleem.muhammd@gmail.com), as text files or in any way you prefer, together with the URLs of the endpoints on which you’ve executed them.
> 
> Motivated by FEASIBLE [1] SPARQL benchmark generation framework, we want to design a customizable (in terms of queries types, number of queries, number of datasets etc.) federated SPARQL benchmark out of real queries log/use-cases. To this end, we are currently collecting real federated SPARQL queries from different domains, applications, and endpoints for analysis. 
> 
> Your help in this endeavour would be highly appreciated.
> We'll be more than happy to collaborate and/or provide you with the detailed results of the study if you contribute. 
> 
> Your federated queries will also be published as RDF as part of the extended LSQ: The Linked SPARQL Queries Dataset [2,3]. 
> 
> Best Regards,
> 
> Muhammad Saleem
> Ruben Verborgh
> Claus Stadler
> Miel Vander Sande
> Axel-Cyrille Ngonga Ngomo
> Carlos Buil Aranda
> 
> [1] https://svn.aksw.org/papers/2015/ISWC_FEASIBLE/public.pdf
> [2] http://svn.aksw.org/papers/2015/ISWC_LSQ/public.pdf
> [3] http://aksw.github.io/LSQ/
> 
> 

Received on Monday, 19 December 2016 21:48:58 UTC