Re: Is there a benchmark of triple-stores with a "bias" to Life Sciences ? from Melanie Courtot on 2009-02-12 (public-semweb-lifesci@w3.org from February 2009)

From: Melanie Courtot <mcourtot@gmail.com>
Date: Thu, 12 Feb 2009 10:58:20 -0800
To: Nigam Shah <nigam@stanford.edu>
Cc: Kei Cheung <kei.cheung@yale.edu>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-Id: <DC78BF5E-2662-4C12-9767-817FCE6F331B@gmail.com>

> However, I'm not aware of a review/benchmark of these systems,
> regarding both performance and features.
> I've seen a few links like:
>
> http://esw.w3.org/topic/LargeTripleStores
>
> or
>
> http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html
>
> But I would like to know how these systems scale with large
> knowledge bases (load/query).
> Here is a rather old one: http://simile.mit.edu/reports/stores/stores.pdf
> A more current report is: http://www.springerlink.com/content/m14k476lr726x1g2/
>
>
> NCBO has done some benchmarks on the use of multiple triple stores
> for storing ontologies in BioPortal. The report is not ready for
> prime time yet, but we can share it with specific people who are
> okay with reading an in-progress document.

I'd be interested in this as well. In our case we are considering  
options on how to deal with Flow Cytometry data files and their  
associated metadata, and it would be helpful to have some background  
information on the different systems.

Thanks,
Melanie


>
>
> I would also like to get some rough intuition on how much it makes
> sense to store data such as sequences and microarray values in them,
> and how usable SPARQL is for querying based on these values.
>
> Is there anyone that can provide me with some good pointers ?
>
> Or is this some area that you think needs more exploration ?
> To me, semantic web/ontology has the potential to help facilitate
> meta-analysis of microarray data by helping researchers to identify
> comparable datasets if the metadata describing the
> samples/experiments are richly captured. Using semantic web to
> represent large tables of measurement values might be overkill.
> Also, it's difficult to compete with all the commercial and public
> tools that already exist for large-scale microarray data querying
> and analysis. Just my personal 2 cents.
>
> Absolutely. Storing large matrices in triplestores doesn't make much
> sense. Processing the metadata describing samples/experiments and then
> storing the semantically tagged metadata -- for identifying datasets
> of interest (e.g. all skin tumor datasets treated with a certain
> kind of agent) -- is the way to go. NCBO is actively pursuing this
> in our OBR (Open Biomedical Resources) work
> (http://www.bioontology.org/tools/obr.html). Right now we do not store
> the resulting semantically tagged metadata in triplestores for
> performance reasons but plan to do so in the future (hence the
> benchmark study mentioned above).
>
> Regards,
> Nigam.

Received on Thursday, 12 February 2009 22:43:32 UTC