Re: Triples storage from Emanuele D'Arrigo on 2007-09-26 (public-owl-dev@w3.org from July to September 2007)

From: Emanuele D'Arrigo <manu3d@gmail.com>
Date: Wed, 26 Sep 2007 12:33:22 +0100
To: "Semantic Web Interest Group" <semantic-web@w3.org>, "public-owl-dev@w3.org" <public-owl-dev@w3.org>
Message-ID: <915dc91d0709260433m1ff228c5jbdda5e974d1dee60@mail.gmail.com>

On 9/26/07, Renato Golin <renato@ebi.ac.uk> wrote:
> I'm quite interested in triplet storages but what I found is that there
> is no consensus nor standard for anything in that area.

In hindsight, I believe my question was a bit hopeless: as Arjohn
says, it's unlikely that a consensus will ever develop on the matter.
There will be various options with pros and cons. My bad. =)

I guess my question really should have been more along the lines of:
are there two or three architectures for triplet storage to choose
from?

> There are several storage engines, but each one does it its own way.
> Also, the support for query languages is quite random.

Indeed, I seem to have noticed that. Is the field still so novel
that there are no full implementations of RQL?
Yet the specification seems to have been stable for some time... =?

> Given the amount of data you can have the hash table might not fit in
> any computer and even if it fits, I/O will become a huge problem.
> It is a common misconception that hash tables are always faster than
> lists but that's not true, especially when you have bigger hash tables
> than your memory can hold (not that difficult).

Well, I didn't consider even for a second the possibility of holding all
the data in memory. Whatever the storage architecture is, I'm assuming
the data is in a database à la MySQL, even if it's just a long list of
triplets.
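
To make "a long list of triplets in a database" a bit more concrete,
here is a minimal sketch of what I have in mind. It uses Python's
built-in sqlite3 module instead of MySQL purely so that it runs on its
own; the table name, column names, index names and the helper functions
(open_store, add_triple, match) are all made up for illustration and
are not taken from any existing triple store.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a one-table triple store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    # The "long list of triplets": one row per (subject, predicate, object).
    # The primary key doubles as an index for subject-first lookups.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"
        " PRIMARY KEY (subject, predicate, object))"
    )
    # Extra on-disk indexes so predicate-first and object-first lookups
    # do not have to scan the whole list.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pos"
        " ON triples (predicate, object, subject)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_osp"
        " ON triples (object, subject, predicate)"
    )
    return conn


def add_triple(conn, s, p, o):
    """Insert one triple; an exact duplicate is silently ignored."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))


def match(conn, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, predicate, object"
    return conn.execute(sql, params).fetchall()
```

Something like match(conn, p="ex:locatedIn") would then return every
triple with that predicate. The point is only that the data and its
indexes live on disk and the database decides what to pull into memory;
whether this layout holds up for large datasets is exactly what I'm
unsure about.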

> The only way to have an efficient and still powerful storage engine is
> to mix standards. For very local queries, hashes can be a good solution.
> For locally distributed queries, lists and binary indexes might perform
> better. But for truly distributed queries (outside of your domain) you
> need an adaptive indexing system.
> The more distributed you go, the slower it is, but that's acceptable when you
> reckon the quality of your data will be higher that way.

What do you mean by local/distributed queries? The use I have in
mind is local to a company but geographically distributed because
the company has facilities in various continents.

> More alternatives than standards... see the Wiki pages to learn more:
> http://esw.w3.org/topic/FrontPage
> http://esw.w3.org/topic/SemanticWebTools
> http://esw.w3.org/topic/CommercialProducts
> http://esw.w3.org/topic/Semantic_Bioinformatics (storage at the end)

Thank you, I'll check them out!

Ciao!

Manu

Received on Wednesday, 26 September 2007 11:33:33 UTC