- From: Ian Wilson <Ian.Wilson@uchsc.edu>
- Date: Wed, 08 Feb 2006 03:23:20 -0700
- To: Susie Stephens <susie.stephens@oracle.com>
- CC: Sean Martin <sjmm@us.ibm.com>, public-semweb-lifesci@w3.org
Hi Susie,

Thanks for the discussion regarding Oracle's RDF support.

One of the topics of conversation at the recent F2F was the desire to benchmark with real-world data sets rather than the LUBM graphs. I know others raised this as well, but I specifically recall my conversations with Sean Martin.

In Oracle's recent VLDB paper [1], the authors mention that a subset of Uniprot consisting of 80 million triples was used for benchmarking. I was not able to find a pointer to this data, though. Is this graph available for download? Since the paper also proposes several queries for the Uniprot graph, I think it would make sense to build on this work for future benchmarks.

Depending on how the Uniprot subgraph is derived, a lot of variability can be introduced into the results (e.g. by minimizing literals). The Uniprot RDF is also updated as the Uniprot database changes, so it is a moving target. We will therefore want to maintain a local copy of the extract (on the wiki?) so that changes to the graph don't change the benchmarking results.

I think the entire Uniprot graph is probably not practical for most of us; so far I have been unable to load the complete graph of even the main file alone (~300 million triples). I am currently using my own extract of the Uniprot data, ~125 million triples, to benchmark several triplestores in main memory, but I would rather our community share one common extract for benchmarking. Since your group has already published on the capabilities of the 10g product, this seemed a logical starting point.

Curious what others think.

Thanks,
Ian

[1] http://www.oracle.com/technology/tech/semantic_technologies/pdf/vldb_2005.pdf
Received on Wednesday, 8 February 2006 10:24:51 UTC