- From: Jerven Bolleman <me@jerven.eu>
- Date: Mon, 19 Dec 2016 22:48:18 +0100
- To: Muhammad Saleem <saleem.muhammd@gmail.com>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Dear Muhammad,

While I am not allowed to share any sparql.UniProt.org query logs, I can share a few examples from other places, e.g. presentations and public help requests. I think it would be nice for the other people on the list to see these as well, so I am replying publicly. Mostly because all of the real flaws that exist in federated querying do not mean that it is not extremely useful!

Hope these 7 are interesting enough ;)

Something to note is that quite a few have FILTER clauses to avoid over-communicating clauses that will never bind. It would be nice if an extension to the SPARQL 1.1 Service Description and/or VoID could be used by query engines to avoid sending IRIs that will never bind. It could even be something like a namespace plus a Bloom filter definition, to make it even easier to avoid useless HTTP requests.

Regards,
Jerven

1. UniProt to Wikidata (https://query.wikidata.org/), finds Wikidata items using UniProt identifiers.

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?uniprot ?id
WHERE {
  ?uniprot up:reviewed true ;
           up:organism taxon:9606 .
  # Convert the UniProt IRI to just the accession
  BIND (SUBSTR(STR(?uniprot),33) AS ?id)
  SERVICE <http://query.wikidata.org/sparql> {
    ?protein wdt:P352 ?id .
  }
}

2. BioModels (https://www.ebi.ac.uk/rdf/services/biomodels/sparql) and UniProt

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT DISTINCT ?model ?uniprot ?taxonomy ?ec
WHERE {
  ?model rdf:type sbmlrdf:SBMLModel .
  ?model ?linkstoellement ?modelelement .
  ?idorgannot owl:sameAs ?uniprot .
  ?modelelement ?qualifier ?idorgannot .
  ?modelelement rdf:type ?elementType .
  FILTER(contains(str(?uniprot), "purl.uniprot.org/uniprot/"))
  SERVICE <http://sparql.uniprot.org/sparql> {
    ?uniprot up:organism ?taxonomy .
    ?uniprot up:enzyme ?ec .
  }
}

3. UniProt and Ensembl, find the length of the exons coding for a protein described in UniProt

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX core: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT ?protein ?transcript ?exon ?order ?length
{
  BIND(uniprotkb:P05067 as ?protein)
  ?protein rdfs:seeAlso ?transcript .
  ?transcript core:database <http://purl.uniprot.org/database/Ensembl> .
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/> {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon .
    ?exon faldo:location ?location .
    ?location faldo:begin ?bf .
    ?bf faldo:position ?begin .
    ?location faldo:end ?ef .
    ?ef faldo:position ?end .
    ?orderedPart sio:SIO_000300 ?order .
  }
  BIND(ABS(?end - ?begin) as ?length)
}

4. neXtProt (https://snorql.nextprot.org/) and UniProt

PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT DISTINCT ?entry where {
  ?entry :isoform ?iso.
  {
    SERVICE <http://sparql.uniprot.org/sparql> {
      # get viral proteins with an IntAct xref
      SELECT DISTINCT ?viralinteractor WHERE
      {
        ?viralinteractor a up:Protein .
        ?viralinteractor rdfs:seeAlso/up:database <http://purl.uniprot.org/database/IntAct> .
        ?viralinteractor up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    # NeXtprot entries with an IntAct binary interaction
    ?entry :isoform / :binaryInteraction / :interactant ?interactant.
    # interactant must be in the uniprot service result set to select the entry
    ?interactant skos:exactMatch ?viralinteractor .
  }
  UNION
  {
    SERVICE <http://sparql.uniprot.org/sparql> {
      # get human proteins that share a PDB xref with a viral protein (same PDB id)
      SELECT DISTINCT ?humprotein WHERE
      {
        ?humprotein a up:Protein .
        ?humprotein up:organism taxon:9606 .
        ?humprotein rdfs:seeAlso/up:database <http://purl.uniprot.org/database/PDB> .
        ?viralprotein a up:Protein ;
                      rdfs:seeAlso ?db ;
                      up:organism/rdfs:subClassOf/rdfs:subClassOf taxon:10239 .
      }
    }
    # cast result to NeXtprot entry
    BIND (IRI(CONCAT("http://nextprot.org/rdf/entry/NX_",substr(str(?humprotein),33,6))) as ?entry)
  }
}
ORDER BY ?entry

5. DisGenet (http://rdf.disgenet.org/sparql/) and UniProt, find cancer-related genes in DisGenet that are known to be disease-related in UniProt

SELECT ?protein ?comment
WHERE {
  ?protein a ncit:C17021 ;
           skos:exactMatch ?uniprot .
  FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
  # Query UniProt for proteins with disease annotation
  SERVICE <http://sparql.uniprot.org/sparql> {
    ?uniprot up:annotation ?annotation .
    ?annotation a up:Disease_Annotation ;
                rdfs:comment ?comment .
  }
}
LIMIT 10

6. ChEMBL (https://www.ebi.ac.uk/rdf/services/chembl/) and UniProt, retrieve UniProt proteins to make sense of ChEMBL chemical assay results.

PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
SELECT ?activity ?assay ?target ?targetcmpt ?uniprot ?fullName
WHERE {
  ?activity a cco:Activity ;
            cco:hasMolecule chembl_molecule:CHEMBL941 ;
            cco:hasAssay ?assay .
  ?assay cco:hasTarget ?target .
  ?target cco:hasTargetComponent ?targetcmpt .
  ?targetcmpt cco:targetCmptXref ?uniprot .
  ?uniprot a cco:UniprotRef .
  SERVICE <http://sparql.uniprot.org/sparql/> {
    ?uniprot up:recommendedName ?name .
    ?name up:fullName ?fullName .
  }
}

7. Identifiers.org, for quick IRI translation. A different kind of endpoint altogether, but it really helps make federated queries work seamlessly in the LS field (auto-translating IRIs based on patterns): identifiers.org/services/sparql

> On 19 Dec 2016, at 21:34, Muhammad Saleem <saleem.muhammd@gmail.com> wrote:
>
> Dear all,
>
> Do you have federated SPARQL queries you use today or have used in the past?
> Please e-mail them to us (via saleem.muhammd@gmail.com), as text files or in any way you prefer, together with the URLs of the endpoints on which you’ve executed them.
>
> Motivated by FEASIBLE [1] SPARQL benchmark generation framework, we want to design a customizable (in terms of queries types, number of queries, number of datasets etc.) federated SPARQL benchmark out of real queries log/use-cases. To this end, we are currently collecting real federated SPARQL queries from different domains, applications, and endpoints for analysis.
>
> Your help in this endeavour would be highly appreciated.
> We'll be more than happy to collaborate and/or provide you with the detailed results of the study if you contribute.
>
> Your federated queries will also be published as RDF as part of the extended LSQ: The Linked SPARQL Queries Dataset [2,3].
>
> Best Regards,
>
> Muhammad Saleem
> Ruben Verborgh
> Claus Stadler
> Miel Vander Sande
> Axel-Cyrille Ngonga Ngomo
> Carlos Buil Aranda
>
> [1] https://svn.aksw.org/papers/2015/ISWC_FEASIBLE/public.pdf
> [2] http://svn.aksw.org/papers/2015/ISWC_LSQ/public.pdf
> [3] http://aksw.github.io/LSQ/
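
The namespace plus Bloom filter idea raised in the message could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the class name IriBloomFilter, its methods, and the sizing parameters are hypothetical, and no SPARQL Service Description or VoID vocabulary for publishing such a filter is assumed to exist. The point is that a query engine holding this small structure can skip sending an IRI to a remote endpoint when the filter says it cannot bind there.

```python
import hashlib
import math


class IriBloomFilter:
    """Hypothetical pre-filter: a namespace plus a Bloom filter of the IRIs in it."""

    def __init__(self, namespace, expected_items, false_positive_rate=0.01):
        self.namespace = namespace
        # Standard Bloom filter sizing: m = -n ln(p) / (ln 2)^2 bits,
        # k = (m / n) ln 2 hash functions.
        self.size = max(8, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, iri):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(iri.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, iri):
        for pos in self._positions(iri):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_bind(self, iri):
        # Cheap namespace check first; IRIs outside the namespace never bind.
        if not iri.startswith(self.namespace):
            return False
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(iri))


# The endpoint side would build the filter from the IRIs it actually holds.
uniprot_filter = IriBloomFilter("http://purl.uniprot.org/uniprot/", expected_items=100)
uniprot_filter.add("http://purl.uniprot.org/uniprot/P05067")

# The query engine side would test each candidate before making an HTTP request.
candidates = [
    "http://purl.uniprot.org/uniprot/P05067",
    "http://www.wikidata.org/entity/Q42",
]
worth_sending = [iri for iri in candidates if uniprot_filter.might_bind(iri)]
```

A Bloom filter never gives false negatives, so an IRI the endpoint really holds is never dropped; the cost is an occasional false positive, which only means one HTTP request that returns nothing.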
Received on Monday, 19 December 2016 21:48:58 UTC