Re: Is there a benchmark of triple-stores with a "bias" to Life Sciences ? from Nigam Shah on 2009-02-12 (public-semweb-lifesci@w3.org from February 2009)

From: Nigam Shah <nigam@stanford.edu>
Date: Thu, 12 Feb 2009 10:34:00 -0800
To: Kei Cheung <kei.cheung@yale.edu>
Cc: Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-ID: <352cf0db0902121034rfb01151g7d90d65d647d4f7a@mail.gmail.com>

>
>> However, I'm not aware of a review/benchmark of these systems covering
>> both performance and features.
>> I've seen a few links like:
>>
>> http://esw.w3.org/topic/LargeTripleStores
>>
>> or
>>
>> http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html
>>
>> But I would like to know how these systems scale with large knowledge bases
>> (both loading and querying).
>>
> Here is a rather old one: http://simile.mit.edu/reports/stores/stores.pdf.
> A more recent report is:
> http://www.springerlink.com/content/m14k476lr726x1g2/
>


NCBO has run some benchmarks of several triple stores for storing the
ontologies in BioPortal. The report is not ready for prime time yet, but we
can share it with specific people who are okay with reading an in-progress
document.
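For readers wondering what "how these systems scale (load/query)" looks like as a measurement, here is a minimal sketch of a scaling harness. It is not the NCBO benchmark: `ToyTripleStore` is a hypothetical in-memory index standing in for a real triplestore, and the `ex:` triples are synthetic. A real study (such as the Berlin SPARQL Benchmark linked above) would load data into the actual systems and time real SPARQL queries.

```python
import time
from collections import defaultdict


class ToyTripleStore:
    """Hypothetical in-memory store indexed by (predicate, object).

    A stand-in for a real triplestore, used only to show the shape of a
    load/query scaling measurement.
    """

    def __init__(self):
        self.index = defaultdict(set)  # (predicate, object) -> set of subjects
        self.count = 0                 # number of distinct triples stored

    def add(self, s, p, o):
        subjects = self.index[(p, o)]
        if s not in subjects:          # ignore duplicate triples
            subjects.add(s)
            self.count += 1

    def subjects(self, p, o):
        # .get() avoids creating an empty entry in the defaultdict.
        return self.index.get((p, o), set())


def synthetic_triples(n, n_terms=50):
    # Hypothetical annotations: dataset i is tagged with term (i mod n_terms).
    for i in range(n):
        yield (f"ex:dataset{i}", "ex:annotatedWith", f"ex:term{i % n_terms}")


def benchmark(sizes, n_terms=50):
    """Time loading n synthetic triples, then one lookup query, for each n."""
    results = []
    for n in sizes:
        store = ToyTripleStore()
        start = time.perf_counter()
        for s, p, o in synthetic_triples(n, n_terms):
            store.add(s, p, o)
        load_s = time.perf_counter() - start

        start = time.perf_counter()
        hits = store.subjects("ex:annotatedWith", "ex:term0")
        query_s = time.perf_counter() - start

        results.append({"triples": store.count, "hits": len(hits),
                        "load_s": load_s, "query_s": query_s})
    return results


if __name__ == "__main__":
    for row in benchmark([1_000, 10_000, 100_000]):
        print(row)
```

The point is the shape of the experiment (same workload at growing sizes, load and query timed separately); the absolute timings from a toy index say nothing about any real system.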


>> I would also like to get some rough intuition on how much sense it makes to
>> store data such as sequences and microarray values in them, and how usable
>> SPARQL is for querying based on these values.
>>
>> Can anyone provide me with some good pointers?
>>
>> Or is this an area that you think needs more exploration?
>>
> To me, semantic web/ontology has the potential to facilitate meta-analysis
> of microarray data by helping researchers identify comparable datasets, if
> the metadata describing the samples/experiments are richly captured. Using
> semantic web technologies to represent large tables of measurement values
> might be overkill. Also, it's difficult to compete with all the commercial
> and public tools that already exist for large-scale microarray data
> querying and analysis. Just my 2 cents.
>

Absolutely. Storing large matrices in triplestores doesn't make much sense.
The way to go is to process the metadata describing the samples/experiments
and then store the semantically tagged metadata, which can be used to identify
datasets of interest (e.g. all skin tumor datasets treated with a certain kind
of agent). NCBO is actively pursuing this in our OBR (Open Biomedical
Resources) work (http://www.bioontology.org/tools/obr.html). Right now we do
not store the resulting semantically tagged metadata in triplestores, for
performance reasons, but we plan to do so in the future (hence the benchmark
study mentioned above).
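To make the split concrete, here is a minimal sketch, assuming hypothetical `ex:` identifiers and a naive in-memory pattern matcher in place of a real triplestore and SPARQL engine. It is not OBR's implementation. Sample/experiment metadata is held as triples, a SPARQL-like conjunctive pattern selects the datasets (skin tumor, treated with an agent of a given kind), and only then are the matching measurement matrices fetched from separate storage.

```python
# Hypothetical identifiers throughout (ex:ds1, ex:SkinTumor, ex:KindK, ...).

# Metadata only: sample/experiment annotations expressed as (s, p, o) triples.
metadata = [
    ("ex:ds1", "ex:disease", "ex:SkinTumor"),
    ("ex:ds1", "ex:treatedWith", "ex:AgentX"),
    ("ex:ds2", "ex:disease", "ex:SkinTumor"),
    ("ex:ds2", "ex:treatedWith", "ex:AgentY"),
    ("ex:ds3", "ex:disease", "ex:LungTumor"),
    ("ex:ds3", "ex:treatedWith", "ex:AgentX"),
    ("ex:AgentX", "ex:agentKind", "ex:KindK"),
    ("ex:AgentY", "ex:agentKind", "ex:KindJ"),
]

# Measurement matrices stay outside the triple store (here just a dict of
# rows, standing in for whatever array/database tool is already in use).
matrices = {
    "ex:ds1": [[7.1, 8.3], [5.2, 6.0]],
    "ex:ds2": [[6.4, 6.9], [4.8, 5.5]],
    "ex:ds3": [[9.0, 7.7], [3.1, 4.2]],
}


def _unify(term, value, binding):
    # A constant must equal the value; a variable must be unbound or
    # already bound to that same value.
    if not term.startswith("?"):
        return term == value
    return binding.setdefault(term, value) == value


def match(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every (s, p, o) pattern.

    Terms starting with '?' are variables. This is a naive backtracking
    join, roughly what a SPARQL basic graph pattern expresses.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # fresh copy so failed matches leave no trace
        if all(_unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(triples, rest, candidate)


# Roughly: SELECT ?ds WHERE { ?ds ex:disease ex:SkinTumor .
#                            ?ds ex:treatedWith ?agent .
#                            ?agent ex:agentKind ex:KindK }
query = [
    ("?ds", "ex:disease", "ex:SkinTumor"),
    ("?ds", "ex:treatedWith", "?agent"),
    ("?agent", "ex:agentKind", "ex:KindK"),
]

selected = sorted({b["?ds"] for b in match(metadata, query)})
# Only now pull the (potentially large) matrices for the selected datasets.
arrays = {ds: matrices[ds] for ds in selected}
print(selected)  # ['ex:ds1']
```

Only the small metadata graph is queried; the numeric data is touched just once, for the datasets that survived the selection.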

Regards,
Nigam.

Received on Thursday, 12 February 2009 18:34:42 UTC