Re: Is there a benchmark of triple-stores with a "bias" to Life Sciences ? from Adrian Paschke on 2009-02-13 (public-semweb-lifesci@w3.org from February 2009)

From: Adrian Paschke <adrian.paschke@gmx.de>
Date: Fri, 13 Feb 2009 13:28:06 +0100
To: "'Melanie Courtot'" <mcourtot@gmail.com>, "'Nigam Shah'" <nigam@stanford.edu>
Cc: "'Kei Cheung'" <kei.cheung@yale.edu>, "'Andrea Splendiani'" <andrea.splendiani@bbsrc.ac.uk>, "'public-semweb-lifesci hcls'" <public-semweb-lifesci@w3.org>
Message-ID: <00fb01c98dd6$8641b260$92c51720$@paschke@gmx.de>

Hi Andrea,

 

We are hosting parts of the W3C HCLS KB on an AllegroGraph triple store here
in Berlin:

 

http://www.corporate-semantic-web.de/hcls.html

 

Besides the Berlin SPARQL benchmarks, you might take a look at the LUBM
benchmarks for AllegroGraph:

 

http://agraph.franz.com/allegrograph/agraph_bench_lubm50.lhtml

 

 

Cheers, Adrian

 

From: public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Melanie Courtot
Sent: Thursday, 12 February 2009 19:58
To: Nigam Shah
Cc: Kei Cheung; Andrea Splendiani; public-semweb-lifesci hcls
Subject: Re: Is there a benchmark of triple-stores with a "bias" to Life
Sciences ?

However, I'm not aware of a review/benchmark of these systems covering both
performance and features.
I've seen a few links like:

http://esw.w3.org/topic/LargeTripleStores

or

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html

But I would like to know how these systems scale with large knowledge bases
(load/query).

Here is a rather old one: http://simile.mit.edu/reports/stores/stores.pdf. A
more current report is:
http://www.springerlink.com/content/m14k476lr726x1g2/



NCBO has done some benchmarks on the use of multiple triple stores for storing
ontologies in BioPortal. The report is not ready for prime time yet, but we can
share it with specific people who are okay with reading an in-progress
document.

 

I'd be interested in this as well. In our case we are considering options on
how to deal with Flow Cytometry data files and their associated metadata,
and it would be helpful to have some background information on the different
systems.

 

Thanks,

Melanie

I would also like to get some rough intuition on how much sense it makes to
store data such as sequences and microarray values in them, and how usable
SPARQL is for querying based on these values.

Is there anyone that can provide me with some good pointers ?

Or is this some area that you think needs more exploration ?

To me, semantic web/ontology has the potential to help facilitate
meta-analysis of microarray data by helping researchers to identify
comparable datasets if the metadata describing the samples/experiments are
richly captured. Using semantic web to represent large tables of measurement
values might be overkill. Also, it's difficult to compete with all the
commercial and public tools that already exist for large-scale
microarray data querying and analysis. Just my personal 2 cents.


Absolutely. Storing large matrices in triplestores doesn't make much sense.
Processing the metadata describing samples/experiments and then storing the
semantically tagged metadata -- for identifying datasets of interest (e.g.
all skin tumor datasets treated with a certain kind of agent) -- is the way
to go. NCBO is actively pursuing this in our OBR (Open Biomedical Resources)
work (http://www.bioontology.org/tools/obr.html). Right now we do not store
the resulting semantically tagged metadata in triplestores for performance
reasons but plan to do so in the future (hence the benchmark study mentioned
above).

Regards,
Nigam.

Received on Friday, 13 February 2009 12:28:50 UTC