Re: Oracle Uniprot RDF data set and benchmarks

From: Ian Wilson <Ian.Wilson@uchsc.edu>
Date: Wed, 08 Feb 2006 15:58:21 -0700
Message-ID: <43EA778D.6070202@uchsc.edu>
To: Jim Hendler <hendler@cs.umd.edu>
CC: Susie Stephens <susie.stephens@oracle.com>, public-semweb-lifesci@w3.org

Jim Hendler said the following on 2/8/2006 2:29 PM:
> I love this idea, but I would go a bit further - be even nicer for us 
> non-biologists if it also included some example queries to run (and 
> maybe even the correct answer sets) - I think if that existed, we could 
> push some of the triple store developers to use it as a benchmark, which 
> would help both communities...
> 

Agreed. The Oracle paper outlines six different queries, which 
is a good starting point. It would be ideal, though, to 
incorporate all of this into a test harness. Similar efforts 
are underway at the SIMILE project, which I have been loosely 
involved with through Vineet Sinha.

http://simile.mit.edu/repository/shootout/trunk/shootout/ 
http://simile.mit.edu/repository/shootout/trunk/shootout-core/

Another similar project, which I haven't seen mentioned before 
but have found useful, is here:
http://tripletest.sourceforge.net/

For anyone who has not read the Oracle paper, I have copied their 
query table into an ASCII-friendly format below:

Description                  | Pattern        | Projection | Result limit
-------------------------------------------------------------------------
Q1: Display the ranges of    | 6 triples      | 3 vars     | 15000 rows
    transmembrane regions    | 5 vars         |            |
                             |                |            |
Q2: List proteins with       | 5 triples      | 3 vars     | 10 rows
    publications by authors  | 5 vars         |            |
    with matching names      | 1 LIKE pred.   |            |
                             |                |            |
Q3: Count the number of      | 3 triples      | 0 vars     | 32 rows
    times a publication by a | 2 vars         |            |
    specific author is cited |                |            |
                             |                |            |
Q4: List resources that      | 3 triples      | 1 var      | 3000 rows
    are related to proteins  | 2 vars         |            |
    annotated with a         |                |            |
    specific keyword         |                |            |
                             |                |            |
Q5: List genes associated    | 7 triples      | 3 vars     | 750 rows
    with human diseases      | 5 vars         |            |
                             |                |            |
Q6: List recently            | 2 triples      | 2 vars     | 8000 rows
    modified entries         | 2 vars         |            |
                             | 1 range pred.  |            |
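
As a rough illustration of how the table above could seed a test 
harness, here is a minimal Python sketch. It encodes only the 
figures listed in the table; the names BenchmarkQuery, QUERIES, 
and run_benchmark, and the execute callback that would actually 
talk to a triple store, are hypothetical and not taken from the 
Oracle paper or from any of the projects linked above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class BenchmarkQuery:
    """One row of the query table (hypothetical structure)."""
    name: str
    description: str
    triples: int            # triple patterns in the graph pattern
    pattern_vars: int       # variables used in the pattern
    projection_vars: int    # variables returned to the caller
    result_limit: int       # row limit listed in the table
    filter_pred: Optional[str] = None  # e.g. "LIKE" or "range"


# Figures copied from the table; nothing else is assumed about the queries.
QUERIES: List[BenchmarkQuery] = [
    BenchmarkQuery("Q1", "Display the ranges of transmembrane regions",
                   6, 5, 3, 15000),
    BenchmarkQuery("Q2", "List proteins with publications by authors "
                         "with matching names", 5, 5, 3, 10, "LIKE"),
    BenchmarkQuery("Q3", "Count the number of times a publication by a "
                         "specific author is cited", 3, 2, 0, 32),
    BenchmarkQuery("Q4", "List resources that are related to proteins "
                         "annotated with a specific keyword", 3, 2, 1, 3000),
    BenchmarkQuery("Q5", "List genes associated with human diseases",
                   7, 5, 3, 750),
    BenchmarkQuery("Q6", "List recently modified entries",
                   2, 2, 2, 8000, "range"),
]


def run_benchmark(
    queries: Iterable[BenchmarkQuery],
    execute: Callable[[BenchmarkQuery], Iterable[tuple]],
) -> List[dict]:
    """Time each query using a caller-supplied `execute` callback.

    `execute` is a placeholder for store-specific code: it receives a
    BenchmarkQuery and returns an iterable of result rows.
    """
    report = []
    for query in queries:
        start = time.perf_counter()
        rows = list(execute(query))
        elapsed = time.perf_counter() - start
        report.append({
            "name": query.name,
            "rows": len(rows),
            "seconds": elapsed,
            "within_limit": len(rows) <= query.result_limit,
        })
    return report
```

Each triple store would supply its own execute callback, so the 
same query list and timing loop could be reused across stores and 
the row counts compared against agreed answer sets.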

---------------------------------------
Q1 (the only actual query provided)
---------------------------------------
-- Pattern variables (?protein, ?begin, ?end) are selected as columns.
SELECT AVG(LENGTH(protein)), AVG(LENGTH(begin)),
        AVG(LENGTH(end))
FROM TABLE(RDF_MATCH(
    '(?protein rdf:type      up:Protein)
     (?protein up:annotation  ?a)
     (?a       rdf:type
                up:Transmembrane_Annotation)
     (?a       up:range      ?range)
     (?range    up:begin      ?begin)
     (?range    up:end        ?end)',
    RDFModels('UniProt'), NULL, NULL))
WHERE rownum <= 15000;


Ian
Received on Wednesday, 8 February 2006 22:59:51 GMT
