
Re: [BioRDF] Scalability

From: Danny Ayers <danny.ayers@gmail.com>
Date: Tue, 4 Apr 2006 22:13:30 +0200
Message-ID: <1f2ed5cd0604041313u360b71f0m9a1a46936a55e3cf@mail.gmail.com>
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
Cc: public-semweb-lifesci@w3.org

On 4/4/06, Cutler, Roger (RogerCutler) <RogerCutler@chevron.com> wrote:

My feeling is that until there are scalability issues that can be
analysed, it's rather premature to try and solve them. Having said
that -

> 3 - Limit the amount of information that is actually put into RDF to
> some sort of descriptive metadata and keep pointers to the real data,
> which is in some other format.

A contract I'm currently working on involves a large amount of
geospatial data. Ideally this would all be represented in RDF, the
data structures being highly irregular. But before I joined the
project some feasibility studies had been done and it was decided
(with good reason) that this wasn't a realistic option at this point
in time because of the quantity of data. The overall strategy, I
suppose, is to exploit proven tech wherever possible, in the interests
of "just make it work". Incoming raw data will be handled as
(streamed) XML, with a cluster of relational DBs for storage.
RDF-everywhere will be approximated by layering: raw data
(XML/relational); metadata (RDF(S)/OWL and XML Schema); meta-metadata
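
As a rough illustration of that kind of layering (and of option 3 quoted above), here is a minimal Python sketch using only the standard library. Raw XML is parsed incrementally with xml.etree.ElementTree.iterparse into an SQLite table standing in for the relational cluster, and a thin metadata layer of (subject, predicate, object) tuples records where the rows live rather than copying them. The feed format, the ex: vocabulary, the example.org URIs and the "db:<table>" pointer convention are all hypothetical, and plain tuples stand in for a real RDF store.

```python
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical raw-data feed; element and attribute names are illustrative only.
RAW_XML = b"""<survey id="s1">
  <point id="p1" lat="51.5" lon="-0.12" elevation="35"/>
  <point id="p2" lat="51.6" lon="-0.10" elevation="41"/>
</survey>"""

EX = "http://example.org/geo#"  # hypothetical metadata vocabulary


def ingest(source, conn):
    """Raw-data layer: stream <point> elements from XML into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS point "
        "(id TEXT PRIMARY KEY, lat REAL, lon REAL, elevation REAL)"
    )
    count = 0
    # iterparse yields each element as soon as its end tag is read, so a large
    # file object can be processed without building the whole tree up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "point":
            conn.execute(
                "INSERT INTO point VALUES (?, ?, ?, ?)",
                (elem.get("id"), float(elem.get("lat")),
                 float(elem.get("lon")), float(elem.get("elevation"))),
            )
            count += 1
            elem.clear()  # drop the stored element's content to reduce memory use
    conn.commit()
    return count


def describe(dataset_uri, table, count):
    """Metadata layer: RDF-style (subject, predicate, object) triples.

    ex:rawData holds a pointer to where the rows live (a hypothetical
    "db:<table>" string), not the rows themselves.
    """
    return [
        (dataset_uri, EX + "rawData", "db:" + table),
        (dataset_uri, EX + "recordCount", count),
    ]


def resolve(triples, subject, conn):
    """Follow the ex:rawData pointer from the metadata back into the relational store."""
    for s, p, o in triples:
        if s == subject and p == EX + "rawData":
            table = o.split(":", 1)[1]
            if not table.isidentifier():  # table names can't be bound as SQL parameters
                raise ValueError(f"unexpected table name {table!r}")
            return conn.execute(
                f"SELECT id, lat, lon, elevation FROM {table} ORDER BY id"
            ).fetchall()
    return []


conn = sqlite3.connect(":memory:")
n = ingest(BytesIO(RAW_XML), conn)
meta = describe("http://example.org/data/s1", "point", n)
print(n, resolve(meta, "http://example.org/data/s1", conn)[0])
# prints: 2 ('p1', 51.5, -0.12, 35.0)
```

Queries that only need to find data run against the small metadata layer; only the final dereference touches the bulk store, which is the point of keeping RDF to descriptions plus pointers.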

As an aside, this has thrown up some interesting problems relating to
validation at the cusp of XML and RDF/OWL. Essentially, XML validation
doesn't cover enough; the notion of validation doesn't make much sense
in the context of RDF(S); OWL consistency checking is pretty much a
non-starter for performance reasons (and probably quite strange
modelling would be needed to get the requisite checks). Right now I'm
looking at a bit of a Frankenstein setup; rules are probably going to
figure prominently. If anyone has pointers to related material, I'd be

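To make the rules idea a little more concrete, here is a minimal Python sketch of closed-world checks run over plain (subject, predicate, object) tuples. The vocabulary (ex:Survey, ex:crs, ex:rawData), the "db:" pointer strings and the known_stores set standing in for the raw-data layer are all hypothetical. Each rule treats the triples present as complete, so a missing property or a dangling pointer is reported as a violation. Under the open-world reading of RDF(S) the same absence is merely unknown, and rdfs:domain/rdfs:range declarations license new inferences rather than rejecting data; that gap is what the rules cover here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/geo#"  # hypothetical vocabulary


def objects(triples, s, p):
    """All objects of (s, p, ?) among the triples."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def instances(triples, cls):
    """All subjects explicitly typed as cls (no RDFS inference is applied)."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}


def rule_one_crs(triples, context):
    """Closed-world cardinality check: each ex:Survey needs exactly one ex:crs."""
    for s in sorted(instances(triples, EX + "Survey")):
        n = len(objects(triples, s, EX + "crs"))
        if n != 1:
            yield f"{s}: expected exactly 1 ex:crs, found {n}"


def rule_pointer_resolves(triples, context):
    """Cross-layer check: ex:rawData must point at a store the raw-data layer knows."""
    for s, p, o in triples:
        if p == EX + "rawData" and o not in context["known_stores"]:
            yield f"{s}: ex:rawData points to unknown store {o!r}"


def validate(triples, rules, context):
    """Run every rule and collect its violation messages."""
    errors = []
    for rule in rules:
        errors.extend(rule(triples, context))
    return errors


S1, S2 = "http://example.org/data/s1", "http://example.org/data/s2"
data = [
    (S1, RDF_TYPE, EX + "Survey"),
    (S1, EX + "crs", "EPSG:4326"),
    (S1, EX + "rawData", "db:point"),
    (S2, RDF_TYPE, EX + "Survey"),       # no ex:crs given
    (S2, EX + "rawData", "db:missing"),  # dangling pointer
]
RULES = [rule_one_crs, rule_pointer_resolves]
for message in validate(data, RULES, {"known_stores": {"db:point"}}):
    print(message)
```

In practice a rule engine or queries against a real triple store would do this work; the plain functions only show the shape of the checks.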


Received on Tuesday, 4 April 2006 20:13:39 UTC
