
Re: Real Federated SPARQL Queries Required and Possible Collaboration

From: Jerven Bolleman <me@jerven.eu>
Date: Mon, 19 Dec 2016 22:48:18 +0100
Cc: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Message-Id: <C36FEA07-6ABE-4397-97FA-E0B3B3CCEAEC@jerven.eu>
To: Muhammad Saleem <saleem.muhammd@gmail.com>
Dear Muhammad,

While I am not allowed to share any sparql.uniprot.org query logs, I can share a few examples from other places, e.g. presentations and public help requests.

I think it would be nice for the other people on the list to see this as well, so I am replying publicly.
Mostly because, for all the real flaws federated querying has, it is still extremely useful!

Hope these 7 are interesting enough ;) Something to note is that quite a few of them have FILTER clauses to avoid sending the remote endpoint bindings that can never match there.
It would be nice if an extension to SPARQL 1.1 Service Description and/or VoID could be used by query engines to avoid sending IRIs that will never bind.
It could even be something like a namespace plus a Bloom filter definition, to make it even easier to avoid useless HTTP requests.
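To make the namespace + Bloom filter idea concrete, here is a minimal Python sketch. It assumes a hypothetical per-endpoint summary (a namespace string plus a Bloom filter over the IRIs the endpoint actually holds); neither SPARQL 1.1 Service Description nor VoID defines such a thing today, and the class name, the 8192-bit size, the 4 hash functions and the salted SHA-256 hashing are all illustrative choices. A Bloom filter can give false positives but never false negatives, so filtering with it never drops a binding that would really match.

```python
import hashlib


class IriBloomFilter:
    """Hypothetical endpoint summary: a namespace plus a Bloom filter of IRIs."""

    def __init__(self, namespace: str, size_bits: int = 8192, num_hashes: int = 4):
        self.namespace = namespace
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, iri: str):
        # Derive num_hashes bit positions from salted SHA-256 digests of the IRI.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{iri}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, iri: str) -> None:
        for pos in self._positions(iri):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_bind(self, iri: str) -> bool:
        # Cheap namespace check first, then the Bloom filter membership test.
        if not iri.startswith(self.namespace):
            return False
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(iri))


def bindings_worth_sending(iris, summary: IriBloomFilter):
    """Keep only the IRIs the remote endpoint could possibly match."""
    return [iri for iri in iris if summary.might_bind(iri)]


summary = IriBloomFilter("http://purl.uniprot.org/uniprot/")
summary.add("http://purl.uniprot.org/uniprot/P05067")
candidates = ["http://purl.uniprot.org/uniprot/P05067", "http://identifiers.org/uniprot/P05067"]
print(bindings_worth_sending(candidates, summary))
# ['http://purl.uniprot.org/uniprot/P05067']
```

If no candidate survives, an engine using such a summary could skip the HTTP request to the SERVICE entirely; a false positive only costs an occasional wasted request.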


1. UniProt to Wikidata (https://query.wikidata.org/): find Wikidata items using UniProt identifiers.

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?uniprot ?id WHERE {
  ?uniprot up:reviewed true ;
           up:organism taxon:9606 .
  # Convert the UniProt IRI to just the accession
  BIND (SUBSTR(STR(?uniprot), 33) AS ?id)
  SERVICE <http://query.wikidata.org/sparql> {
    ?protein wdt:P352 ?id .
  }
}
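The `SUBSTR(STR(?uniprot), 33)` trick works because the UniProt entry namespace `http://purl.uniprot.org/uniprot/` is exactly 32 characters long and SPARQL string positions start at 1, so what remains is the bare accession string that Wikidata stores under P352 (UniProt protein ID). A small Python sketch of the same conversion (the function name is made up for illustration), with an explicit namespace check instead of relying only on the offset:

```python
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"


def accession_from_iri(iri: str) -> str:
    """Mirror SUBSTR(STR(?uniprot), 33): drop the 32-character UniProt namespace."""
    if not iri.startswith(UNIPROT_NS):
        raise ValueError(f"not a UniProt entry IRI: {iri}")
    # SPARQL SUBSTR is 1-based, so position 33 corresponds to Python index 32.
    return iri[len(UNIPROT_NS):]


assert len(UNIPROT_NS) == 32
print(accession_from_iri("http://purl.uniprot.org/uniprot/P05067"))  # P05067
```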

2. BioModels (https://www.ebi.ac.uk/rdf/services/biomodels/sparql) and UniProt

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>

PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT DISTINCT ?model ?uniprot ?taxonomy ?ec WHERE {
  ?model rdf:type sbmlrdf:SBMLModel .
  ?model ?linkstoellement ?modelelement .
  ?idorgannot owl:sameAs ?uniprot .
  ?modelelement ?qualifier ?idorgannot .
  ?modelelement rdf:type ?elementType .
  FILTER(contains(str(?uniprot), "purl.uniprot.org/uniprot/"))
  SERVICE <http://sparql.uniprot.org/sparql> {
    ?uniprot up:organism ?taxonomy .
    ?uniprot up:enzyme ?ec .
  }
}

3. UniProt and Ensembl: find the lengths of the exons coding a protein described in UniProt

PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX faldo:<http://biohackathon.org/resource/faldo#> 
PREFIX core:<http://purl.uniprot.org/core/>
PREFIX uniprotkb:<http://purl.uniprot.org/uniprot/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX obo:<http://purl.obolibrary.org/obo/>
PREFIX ensemblprotein:<http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms:<http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX sio:<http://semanticscience.org/resource/>
SELECT ?protein ?transcript ?exon ?order ?length WHERE {
  BIND(uniprotkb:P05067 AS ?protein)
  ?protein rdfs:seeAlso ?transcript .
  ?transcript core:database <http://purl.uniprot.org/database/Ensembl> .
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/> {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon .
    ?exon faldo:location ?location .
    ?location faldo:begin ?bf . ?bf faldo:position ?begin .
    ?location faldo:end ?ef . ?ef faldo:position ?end .
    ?orderedPart sio:SIO_000300 ?order .
  }
  BIND(ABS(?end - ?begin) AS ?length)
}

4. neXtProt (https://snorql.nextprot.org/) and UniProt 

PREFIX :<http://nextprot.org/rdf#>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?entry WHERE {
  {
    ?entry :isoform ?iso .
    SERVICE <http://sparql.uniprot.org/sparql> {
      SELECT DISTINCT ?viralinteractor WHERE { # get viral proteins with an IntAct xref
        ?viralinteractor a up:Protein .
        ?viralinteractor rdfs:seeAlso/up:database <http://purl.uniprot.org/database/IntAct> .
        ?viralinteractor up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    ?entry :isoform / :binaryInteraction / :interactant ?interactant . # neXtProt entries with an IntAct binary interaction
    ?interactant skos:exactMatch ?viralinteractor . # interactant must be in the UniProt service result set to select the entry
  }
  UNION
  {
    SERVICE <http://sparql.uniprot.org/sparql> {
      SELECT DISTINCT ?humprotein WHERE { # get human proteins that share a PDB xref with a viral protein (same PDB id)
        ?humprotein a up:Protein ;
          up:organism taxon:9606 ;
          rdfs:seeAlso ?db .
        ?db up:database <http://purl.uniprot.org/database/PDB> .
        ?viralprotein a up:Protein ;
          rdfs:seeAlso ?db ;
          up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    BIND (IRI(CONCAT("http://nextprot.org/rdf/entry/NX_", SUBSTR(STR(?humprotein), 33, 6))) AS ?entry) # cast result to neXtProt entry
  }
}
ORDER BY ?entry
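The final BIND in query 4 turns a UniProt entry IRI into a neXtProt entry IRI by string slicing. A Python equivalent, as a sketch (the function and constant names are made up), shows the offset arithmetic. Like the query, it assumes six-character accessions; a ten-character UniProt accession would be silently truncated to its first six characters.

```python
NEXTPROT_ENTRY_PREFIX = "http://nextprot.org/rdf/entry/NX_"


def nextprot_entry_iri(uniprot_iri: str) -> str:
    """Python equivalent of IRI(CONCAT(".../NX_", SUBSTR(STR(?humprotein), 33, 6)))."""
    # SPARQL SUBSTR(s, 33, 6) is 1-based: 6 characters starting at position 33,
    # i.e. the Python slice s[32:38], just after the 32-character UniProt namespace.
    accession = uniprot_iri[32:38]
    return NEXTPROT_ENTRY_PREFIX + accession


print(nextprot_entry_iri("http://purl.uniprot.org/uniprot/P05067"))
# http://nextprot.org/rdf/entry/NX_P05067
```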

5. DisGeNET (http://rdf.disgenet.org/sparql/) and UniProt: find cancer-related genes in DisGeNET that are known to be disease-related in UniProt

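PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?protein ?comment WHERE {
    ?protein a ncit:C17021 ; skos:exactMatch ?uniprot .
    FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
    # Query UniProt for proteins with disease annotation
    SERVICE <http://sparql.uniprot.org/sparql> {
        ?uniprot up:annotation ?annotation .
        ?annotation a up:Disease_Annotation ;
            rdfs:comment ?comment .
    }
}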
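Why the FILTER in query 5 saves traffic: one common way for an engine to evaluate a SERVICE clause is to ship the locally bound values to the remote endpoint, for example in a VALUES block, batch by batch (often called a "bound join"). The Python sketch below is a simplified, hypothetical illustration of that strategy, not how any particular engine is implemented; the function name, template and batch size of 50 are arbitrary. It applies the same `strstarts` test before building the remote queries, so non-UniProt IRIs are never sent.

```python
from typing import Iterable, Iterator

# Same prefix as the FILTER(strstarts(...)) in query 5.
UNIPROT_PREFIX = "http://purl.uniprot.org/uniprot"

REMOTE_TEMPLATE = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uniprot ?comment WHERE {{
  VALUES ?uniprot {{ {values} }}
  ?uniprot up:annotation ?annotation .
  ?annotation a up:Disease_Annotation ;
              rdfs:comment ?comment .
}}"""


def remote_queries(local_iris: Iterable[str], batch_size: int = 50) -> Iterator[str]:
    """Yield one remote query per batch of local ?uniprot bindings."""
    # Drop IRIs that can never bind remotely, exactly as the FILTER does.
    wanted = [iri for iri in local_iris if iri.startswith(UNIPROT_PREFIX)]
    for start in range(0, len(wanted), batch_size):
        batch = wanted[start:start + batch_size]
        yield REMOTE_TEMPLATE.format(values=" ".join(f"<{iri}>" for iri in batch))


local = ["http://purl.uniprot.org/uniprot/P05067", "http://identifiers.org/ncbigene/351"]
for query in remote_queries(local):
    print(query)
```

Without the filter, every non-UniProt IRI would still travel to sparql.uniprot.org only to match nothing.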

6. ChEMBL (https://www.ebi.ac.uk/rdf/services/chembl/) and UniProt: retrieve UniProt protein names to make sense of ChEMBL chemical assay results.

PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>

SELECT ?activity ?assay ?target ?targetcmpt ?uniprot ?fullName WHERE {
  ?activity a cco:Activity ;
    cco:hasMolecule chembl_molecule:CHEMBL941 ;
    cco:hasAssay ?assay .
  ?assay cco:hasTarget ?target .
  ?target cco:hasTargetComponent ?targetcmpt .
  ?targetcmpt cco:targetCmptXref ?uniprot .
  ?uniprot a cco:UniprotRef .
  SERVICE <http://sparql.uniprot.org/sparql/> {
    ?uniprot up:recommendedName ?name .
    ?name up:fullName ?fullName .
  }
}

7. Identifiers.org, for quick IRI translation. A different kind of endpoint altogether, but it really helps make federated queries work seamlessly in the life sciences (LS) field by automatically translating IRIs based on patterns.
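A minimal Python sketch of what pattern-based IRI translation means. The one-entry translation table is a hypothetical, hand-written stand-in for the identifiers.org registry, not its actual API or data; the regular expression follows the UniProt accession format. The idea is that an IRI is only rewritten when both the source namespace and the local identifier pattern match.

```python
import re

# Hypothetical translation table: source namespace -> (target namespace, identifier pattern).
TRANSLATIONS = {
    "http://purl.uniprot.org/uniprot/": (
        "http://identifiers.org/uniprot/",
        re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"),
    ),
}


def translate_iri(iri: str):
    """Return the identifiers.org form of iri, or None if no rule applies."""
    for source_ns, (target_ns, pattern) in TRANSLATIONS.items():
        if iri.startswith(source_ns):
            local_id = iri[len(source_ns):]
            # Only translate identifiers that look valid for this namespace.
            if pattern.fullmatch(local_id):
                return target_ns + local_id
    return None  # unknown namespace or malformed identifier


print(translate_iri("http://purl.uniprot.org/uniprot/P05067"))
# http://identifiers.org/uniprot/P05067
```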


> On 19 Dec 2016, at 21:34, Muhammad Saleem <saleem.muhammd@gmail.com> wrote:
> Dear all,
> Do you have federated SPARQL queries you use today or have used in the past?
> Please e-mail them to us (via saleem.muhammd@gmail.com), as text files or in any way you prefer, together with the URLs of the endpoints on which you’ve executed them.
> Motivated by the FEASIBLE [1] SPARQL benchmark generation framework, we want to design a customizable (in terms of query types, number of queries, number of datasets, etc.) federated SPARQL benchmark out of real query logs/use cases. To this end, we are currently collecting real federated SPARQL queries from different domains, applications, and endpoints for analysis.
> Your help in this endeavour would be highly appreciated.
> We'll be more than happy to collaborate and/or provide you with the detailed results of the study if you contribute. 
> Your federated queries will also be published as RDF as part of the extended LSQ: The Linked SPARQL Queries Dataset [2,3]. 
> Best Regards,
> Muhammad Saleem
> Ruben Verborgh
> Claus Stadler
> Miel Vander Sande
> Axel-Cyrille Ngonga Ngomo
> Carlos Buil Aranda
> [1] https://svn.aksw.org/papers/2015/ISWC_FEASIBLE/public.pdf
> [2] http://svn.aksw.org/papers/2015/ISWC_LSQ/public.pdf
> [3] http://aksw.github.io/LSQ/
Received on Monday, 19 December 2016 21:48:55 UTC
