- From: Adrian Paschke <adrian.paschke@gmx.de>
- Date: Fri, 13 Feb 2009 13:28:06 +0100
- To: "'Melanie Courtot'" <mcourtot@gmail.com>, "'Nigam Shah'" <nigam@stanford.edu>
- Cc: "'Kei Cheung'" <kei.cheung@yale.edu>, "'Andrea Splendiani'" <andrea.splendiani@bbsrc.ac.uk>, "'public-semweb-lifesci hcls'" <public-semweb-lifesci@w3.org>
- Message-ID: <00fb01c98dd6$8641b260$92c51720$@paschke@gmx.de>
Hi Andrea,

We are hosting parts of the W3C HCLS KB on an AllegroGraph triple store here in Berlin:
http://www.corporate-semantic-web.de/hcls.html

Besides the Berlin SPARQL Benchmark, you might take a look at the LUBM benchmarks for AllegroGraph:
http://agraph.franz.com/allegrograph/agraph_bench_lubm50.lhtml

Cheers,
Adrian

From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Melanie Courtot
Sent: Thursday, 12 February 2009 19:58
To: Nigam Shah
Cc: Kei Cheung; Andrea Splendiani; public-semweb-lifesci hcls
Subject: Re: Is there a benchmark of triple-stores with a "bias" to Life Sciences ?

However, I'm not aware of a review/benchmark of these systems, regarding both performance and features. I've seen a few links like:
http://esw.w3.org/topic/LargeTripleStores
or
http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html
But I would like to know how these systems scale with large knowledge bases (load/query).

Here is a rather old one: http://simile.mit.edu/reports/stores/stores.pdf. A more current report is: http://www.springerlink.com/content/m14k476lr726x1g2/

NCBO has done some benchmarks on the use of multiple triple stores for storing ontologies in BioPortal. The report is not ready for prime time yet, but it can be shared with specific people who are okay with reading an in-progress document.

I'd be interested in this as well. In our case we are considering options for how to deal with Flow Cytometry data files and their associated metadata, and it would be helpful to have some background information on the different systems.

Thanks,
Melanie

I would also like to get some rough intuition on how much sense it makes to store data such as sequences and microarray values in them, and how usable SPARQL is for querying based on these values. Is there anyone who can provide me with some good pointers? Or is this an area that you think needs more exploration?
To me, semantic web/ontology has the potential to help facilitate meta-analysis of microarray data by helping researchers to identify comparable datasets, provided the metadata describing the samples/experiments are richly captured. Using semantic web to represent large tables of measurement values might be overkill. Also, it's difficult to compete with all the commercial and public tools that already exist for large-scale microarray data querying and analysis. Just my personal 2 cents.

Absolutely. Storing large matrices in triplestores doesn't make much sense. Processing the metadata describing samples/experiments and then storing the semantically tagged metadata, so that datasets of interest can be identified (e.g. all skin tumor datasets treated with a certain kind of agent), is the way to go. NCBO is actively pursuing this in our OBR (Open Biomedical Resources) work (http://www.bioontology.org/tools/obr.html). Right now we do not store the resulting semantically tagged metadata in triplestores for performance reasons, but we plan to do so in the future (hence the benchmark study mentioned above).

Regards,
Nigam.
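
As a rough illustration of the metadata-only approach discussed above, here is a minimal sketch of finding datasets of interest from semantically tagged metadata while the measurement matrices stay outside the store. It is a toy in-memory pattern match, not a real triple store and not how OBR is implemented. The dataset identifiers, the predicate names (hasTissue, hasDisease, treatedWith) and the agent names are all made up for the example.

```python
# Toy metadata triples (subject, predicate, object).
# All identifiers and predicate names are hypothetical, for illustration only.
TRIPLES = {
    ("dataset:1", "hasTissue", "skin"),
    ("dataset:1", "hasDisease", "tumor"),
    ("dataset:1", "treatedWith", "agentA"),
    ("dataset:2", "hasTissue", "skin"),
    ("dataset:2", "hasDisease", "tumor"),
    ("dataset:2", "treatedWith", "agentB"),
    ("dataset:3", "hasTissue", "liver"),
    ("dataset:3", "hasDisease", "tumor"),
    ("dataset:3", "treatedWith", "agentA"),
}


def subjects_matching(triples, required):
    """Return, sorted, the subjects that carry every (predicate, object) pair."""
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if all((s, p, o) in triples for p, o in required)
    )


# "All skin tumor datasets treated with a certain agent"
hits = subjects_matching(
    TRIPLES,
    [("hasTissue", "skin"), ("hasDisease", "tumor"), ("treatedWith", "agentA")],
)
print(hits)
```

In a real triple store the same question would be a SPARQL query with one triple pattern per required annotation. Only the small annotation graph needs to be stored for this kind of lookup.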
Received on Friday, 13 February 2009 12:28:50 UTC