Re: Oracle Uniprot RDF data set and benchmarks from Eric Jain on 2006-02-09 (public-semweb-lifesci@w3.org from February 2006)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Thu, 09 Feb 2006 01:27:31 +0100
To: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <43EA8C73.5040100@isb-sib.ch>

Jim Hendler wrote:
> I love this idea, but I would go a bit further - be even nicer for us 
> non-biologists if it also included some example queries to run (and 
> maybe even the correct answer sets) - I think if that existed, we could 
> push some of the triple store developers to use it as a benchmark, which 
> would help both communities...

In addition to the queries mentioned in the Oracle paper, the following 
queries may be interesting to test some of the more advanced capabilities, 
if present:

1. Filter some query by taxonomic kingdom, e.g. "Bacteria". This tests how 
efficiently inference is handled: Taxa are linked to each other through 
rdfs:subClassOf, and a Protein directly references only its most specific 
Taxon.

2. Group-by query to determine the most frequent keyword or GO term 
(referenced through classifiedAs) in a set of Proteins.

3. Output a mapping from one database to another via UniProt, e.g. EMBL to 
MIM. Both are referenced from UniProt through rdfs:seeAlso; a "database" 
property on each referenced resource gives the name of the database in 
which it is located. This query isn't very complicated, but could produce a 
large result set.
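
To make the three queries above a little more concrete, here is a minimal 
Python sketch that evaluates them over a handful of made-up triples. It is 
only an illustration of what each query is meant to compute, not a 
benchmark implementation: every identifier in the toy data is invented, and 
the "organism" property linking a Protein to its Taxon is an assumed name 
(the message does not say what that property is called). Only 
rdfs:subClassOf, classifiedAs, rdfs:seeAlso and "database" come from the 
descriptions above.

```python
from collections import Counter

# Toy (subject, predicate, object) triples. All identifiers are invented,
# and "organism" is an assumed property name for the Protein -> Taxon link.
TRIPLES = [
    ("taxon:Proteobacteria", "rdfs:subClassOf", "taxon:Bacteria"),
    ("taxon:EscherichiaColi", "rdfs:subClassOf", "taxon:Proteobacteria"),
    ("taxon:HomoSapiens", "rdfs:subClassOf", "taxon:Eukaryota"),
    ("protein:P1", "organism", "taxon:EscherichiaColi"),
    ("protein:P2", "organism", "taxon:HomoSapiens"),
    ("protein:P3", "organism", "taxon:Proteobacteria"),
    ("protein:P1", "classifiedAs", "keyword:Transport"),
    ("protein:P2", "classifiedAs", "keyword:Transport"),
    ("protein:P2", "classifiedAs", "keyword:Kinase"),
    ("protein:P3", "classifiedAs", "keyword:Transport"),
    ("protein:P1", "rdfs:seeAlso", "embl:X2"),
    ("protein:P2", "rdfs:seeAlso", "embl:X1"),
    ("protein:P2", "rdfs:seeAlso", "mim:M1"),
    ("protein:P2", "rdfs:seeAlso", "mim:M2"),
    ("embl:X1", "database", "EMBL"),
    ("embl:X2", "database", "EMBL"),
    ("mim:M1", "database", "MIM"),
    ("mim:M2", "database", "MIM"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def descendants(taxon):
    """The taxon itself plus everything below it via rdfs:subClassOf."""
    found, stack = set(), [taxon]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(
                s for s, p, o in TRIPLES
                if p == "rdfs:subClassOf" and o == current
            )
    return found


def proteins_in(taxon):
    """Query 1: proteins whose most specific taxon lies under `taxon`."""
    taxa = descendants(taxon)
    return sorted(s for s, p, o in TRIPLES if p == "organism" and o in taxa)


def most_frequent_classification(proteins):
    """Query 2: the most frequent classifiedAs target in a protein set."""
    counts = Counter(
        o for s, p, o in TRIPLES if p == "classifiedAs" and s in proteins
    )
    return counts.most_common(1)[0]


def map_databases(source_db, target_db):
    """Query 3: pairs of cross-references that share a protein."""
    pairs = set()
    for protein in {s for s, p, o in TRIPLES if p == "rdfs:seeAlso"}:
        refs = objects(protein, "rdfs:seeAlso")
        sources = [r for r in refs if source_db in objects(r, "database")]
        targets = [r for r in refs if target_db in objects(r, "database")]
        pairs.update((s, t) for s in sources for t in targets)
    return sorted(pairs)


if __name__ == "__main__":
    print(proteins_in("taxon:Bacteria"))
    print(most_frequent_classification({"protein:P1", "protein:P2", "protein:P3"}))
    print(map_databases("EMBL", "MIM"))
```

The linear scans over the triple list are deliberate, to keep the sketch 
short. A real triple store would use indexes, and for the first query 
either materialise the subclass closure or expand it at query time, which 
is exactly the cost that query is meant to expose.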

Received on Thursday, 9 February 2006 00:31:14 UTC