Re: Oracle Uniprot RDF data set and benchmarks

Jim Hendler wrote:
> I love this idea, but I would go a bit further - be even nicer for us 
> non-biologists if it also included some example queries to run (and 
> maybe even the correct answer sets) - I think if that existed, we could 
> push some of the triple store developers to use it as a benchmark, which 
> would help both communities...

In addition to the queries mentioned in the Oracle paper, the following 
queries may be interesting to test some of the more advanced capabilities, 
if present:

1. Filter some query by taxonomic kingdom, e.g. "Bacteria". This tests how 
efficiently inference is handled: Taxa are related to each other through 
rdfs:subClassOf, and only the most specific Taxon is directly referenced 
from a Protein, so the store has to follow the subclass chain up to the 
kingdom.

2. Run a group-by query to determine the most frequent keyword or GO term 
(referenced through classifiedAs) in a set of Proteins.

3. Output a mapping from one database to another via UniProt, e.g. EMBL to 
MIM. Both are referenced from UniProt through rdfs:seeAlso, and a "database" 
property on each referenced resource gives the name of the database in which 
it is located. This query isn't very complicated, but could produce a large 
result set.
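
To make the three queries a bit more concrete, here is a rough sketch in 
plain Python over a handful of made-up triples. It is only meant to show the 
shape of each query, not how a triple store would evaluate it. All resource 
identifiers and the "organism" property linking a Protein to its Taxon are 
assumptions for illustration; only rdfs:subClassOf, classifiedAs, 
rdfs:seeAlso and "database" come from the descriptions above.

```python
from collections import Counter

# Toy triples. Every identifier here is invented for illustration and is not
# real UniProt data; "organism" is an assumed name for the Protein -> Taxon link.
TRIPLES = [
    ("taxon:Proteobacteria", "rdfs:subClassOf", "taxon:Bacteria"),
    ("taxon:Ecoli", "rdfs:subClassOf", "taxon:Proteobacteria"),
    ("taxon:Human", "rdfs:subClassOf", "taxon:Eukaryota"),
    ("protein:P1", "organism", "taxon:Ecoli"),
    ("protein:P2", "organism", "taxon:Human"),
    ("protein:P3", "organism", "taxon:Proteobacteria"),
    ("protein:P1", "classifiedAs", "keyword:Kinase"),
    ("protein:P3", "classifiedAs", "keyword:Kinase"),
    ("protein:P3", "classifiedAs", "keyword:Membrane"),
    ("protein:P1", "rdfs:seeAlso", "embl:X1"),
    ("protein:P1", "rdfs:seeAlso", "mim:100"),
    ("protein:P3", "rdfs:seeAlso", "embl:X3"),
    ("embl:X1", "database", "EMBL"),
    ("embl:X3", "database", "EMBL"),
    ("mim:100", "database", "MIM"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for (s, p, o) in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in TRIPLES."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == obj]


def descendants(taxon):
    """The taxon itself plus everything below it via rdfs:subClassOf."""
    seen = {taxon}
    stack = [taxon]
    while stack:
        current = stack.pop()
        for child in subjects("rdfs:subClassOf", current):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def proteins_in(taxon):
    """Query 1: proteins whose most specific taxon falls under `taxon`."""
    taxa = descendants(taxon)
    return sorted({s for (s, p, o) in TRIPLES if p == "organism" and o in taxa})


def keyword_counts(proteins):
    """Query 2: count classifiedAs targets over a set of proteins."""
    wanted = set(proteins)
    return Counter(
        o for (s, p, o) in TRIPLES if p == "classifiedAs" and s in wanted
    )


def map_databases(source_db, target_db):
    """Query 3: pairs of cross-references that share a protein."""
    pairs = set()
    for protein in {s for (s, p, o) in TRIPLES if p == "rdfs:seeAlso"}:
        refs = objects(protein, "rdfs:seeAlso")
        sources = [r for r in refs if source_db in objects(r, "database")]
        targets = [r for r in refs if target_db in objects(r, "database")]
        pairs.update((a, b) for a in sources for b in targets)
    return sorted(pairs)


if __name__ == "__main__":
    bacterial = proteins_in("taxon:Bacteria")
    print(bacterial)
    print(keyword_counts(bacterial).most_common(1))
    print(map_databases("EMBL", "MIM"))
```

The first function is where the inference cost shows up: the subclass 
closure has to be computed (or precomputed) before the Protein filter can be 
applied. The third is a plain join on the shared Protein, which is why it is 
simple to write but can return a very large result on the full data set.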

Received on Thursday, 9 February 2006 00:31:14 UTC