Re: Oracle Uniprot RDF data set and benchmarks

Jim Hendler said the following on 2/8/2006 2:29 PM:
> I love this idea, but I would go a bit further - be even nicer for us 
> non-biologists if it also included some example queries to run (and 
> maybe even the correct answer sets) - I think if that existed, we could 
> push some of the triple store developers to use it as a benchmark, which 
> would help both communities...

Agreed. The Oracle paper outlines 6 different queries, which is 
a good starting point. It would be ideal to incorporate all of 
this into a test harness, though. Similar efforts are underway 
at the SIMILE project, which I have been loosely involved with 
through Vineet Sinha.

Another similar project, which I haven't seen mentioned before 
but have found useful, is here:

For anyone who has not read the Oracle paper, I have copied their 
query table into an ASCII-friendly format below:

Description                                                               | Pattern                          | Projection | Result limit
Q1: Display the ranges of transmembrane regions                           | 6 triples, 5 vars                | 3 vars     | 15000 rows
Q2: List proteins with publications by authors with matching names        | 5 triples, 5 vars, 1 LIKE pred.  | 3 vars     | 10 rows
Q3: Count the number of times a publication by a specific author is cited | 3 triples, 2 vars                | 0 vars     | 32 rows
Q4: List resources that are related to proteins annotated with a specific | 3 triples, 2 vars                | 1 var      | 3000 rows
Q5: List genes associated with human diseases                             | 7 triples, 5 vars                | 3 vars     | 750 rows
Q6: List recently modified entries                                        | 2 triples, 2 vars, 1 range pred. | 2 vars     | 8000 rows

Q1 (the only actual query provided)
SELECT AVG(LENGTH(protein)), AVG(LENGTH(begin)),
    '(?p       rdf:type      up:Protein)
     (?p       up:annotation  ?a)
     (?a       rdf:type
     (?a       up:range      ?range)
     (?range    up:begin      ?begin)
     (?range    up:end        ?end)'
    RDFModels('UniProt'), NULL, NULL))
WHERE rownum <= 15000;
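
To make the test harness idea a bit more concrete, here is a rough 
Python sketch. It is not tied to Oracle or to any particular triple 
store, and nothing in it comes from the paper: the BenchmarkCase and 
BenchmarkResult names, the execute callable, and the fake_store 
stand-in with its canned rows are all made up for illustration. The 
idea is simply to time each query, cap the rows at the result limit, 
and compare against a known answer set when one is available.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class BenchmarkCase:
    """One benchmark query. All field names are hypothetical."""
    name: str
    query: str
    row_limit: int
    expected: Optional[frozenset] = None  # known-correct answer set, if any


@dataclass
class BenchmarkResult:
    name: str
    rows: int
    seconds: float
    correct: Optional[bool]  # None when no expected answer set was given


def run_case(case: BenchmarkCase,
             execute: Callable[[str], Iterable[tuple]]) -> BenchmarkResult:
    """Time one query, cap its rows, and check them against the expected set."""
    start = time.perf_counter()
    rows = list(execute(case.query))[: case.row_limit]
    elapsed = time.perf_counter() - start
    correct = None
    if case.expected is not None:
        # Order-insensitive comparison of the (possibly capped) rows.
        correct = frozenset(rows) == case.expected
    return BenchmarkResult(case.name, len(rows), elapsed, correct)


def run_suite(cases: List[BenchmarkCase],
              execute: Callable[[str], Iterable[tuple]]) -> List[BenchmarkResult]:
    """Run every case against the same store and collect the results."""
    return [run_case(case, execute) for case in cases]


def fake_store(query: str) -> Iterable[tuple]:
    """Stand-in for a real triple store client; the rows are invented."""
    canned = {"demo-query": [("protein-a", 10, 30), ("protein-b", 5, 25)]}
    return canned.get(query, [])


if __name__ == "__main__":
    cases = [
        BenchmarkCase(
            name="demo",
            query="demo-query",
            row_limit=15000,
            expected=frozenset({("protein-a", 10, 30), ("protein-b", 5, 25)}),
        )
    ]
    for result in run_suite(cases, fake_store):
        print(result.name, result.rows, f"{result.seconds:.6f}s", result.correct)
```

A real harness would swap fake_store for a function that sends the 
query to the store under test. Note that comparing against an answer 
set only makes sense when the row limit does not cut the result short, 
or when the store returns rows in a defined order.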


Received on Wednesday, 8 February 2006 22:59:51 UTC