Re: [BioRDF] Scalability

On 4/4/06, Cutler, Roger (RogerCutler) <RogerCutler@chevron.com> wrote:

My feeling is that until there are scalability issues that can be
analysed, it's rather premature to try to solve them. Having said
that -

> 3 - Limit the amount of information that is actually put into RDF to
> some sort of descriptive metadata and keep pointers to the real data,
> which is in some other format.

A contract I'm currently working on involves a large amount of
geospatial data. Ideally this would all be represented in RDF, the
data structures being highly irregular. But before I joined the
project some feasibility studies had been done and it was decided
(with good reason) that this wasn't a realistic option at this point
in time because of the quantity of data. The overall strategy, I
suppose, is to exploit proven tech wherever possible, in the interests
of "just make it work". Incoming raw data will be handled as
(streamed) XML, with a cluster of relational DBs for storage.
RDF-everywhere will be approximated by layering: raw data
(XML/relational); metadata (RDF(S)/OWL and XML Schema); meta-metadata
(RDF(S)/OWL).
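
To make the layering a bit more concrete, here is a minimal sketch in
Python. It assumes a single in-memory SQLite table standing in for the
relational cluster and plain tuples standing in for RDF triples. The
table name, the ex: vocabulary and the db: pointer scheme are all made
up for illustration; none of them come from the actual project.

```python
import sqlite3

# Raw layer: relational storage (one in-memory table stands in for the cluster).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, lat REAL, lon REAL, value REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(1, 51.5, -0.1, 3.2), (2, 48.9, 2.3, 4.7)],
)

# Metadata layer: triples that describe a dataset and point at the raw rows
# instead of copying them. "ex:" and "db:" are hypothetical prefixes.
triples = [
    ("ex:dataset1", "rdf:type", "ex:GeoDataset"),
    ("ex:dataset1", "ex:hasRecord", "db:readings/1"),
    ("ex:dataset1", "ex:hasRecord", "db:readings/2"),
]


def resolve(pointer):
    """Follow a db:<table>/<id> pointer from the metadata into the raw store."""
    table, row_id = pointer[len("db:"):].split("/")
    if table != "readings":  # only the one illustrative table is known
        raise ValueError(f"unknown table: {table}")
    return conn.execute(
        "SELECT lat, lon, value FROM readings WHERE id = ?", (int(row_id),)
    ).fetchone()


def records_of(dataset):
    """Return the raw rows that the metadata says belong to a dataset."""
    return [resolve(o) for s, p, o in triples if s == dataset and p == "ex:hasRecord"]
```

Queries go through the small metadata layer first and only touch the
bulk store when a pointer is resolved, so the amount of RDF stays
proportional to the number of descriptions rather than the number of
data points.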

As an aside, this has thrown up some interesting problems relating to
validation at the cusp of XML and RDF/OWL. Essentially XML validation
doesn't cover enough; the notion of validation doesn't make much sense
in the context of RDF(S); OWL consistency checking is pretty much a
non-starter for performance reasons (and would probably need quite
strange modelling to get the requisite checks). Right now I'm looking
at a bit of a Frankenstein setup, in which rules are probably going to
figure highly. If anyone has pointers to related material I'd be
grateful.
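
For what it's worth, the kind of rule I have in mind looks roughly
like the Python sketch below: a closed-world "instances of this class
must carry these properties" check run over plain triples. The class
and property names are hypothetical, and this is only an illustration
of the idea, not the setup we'll end up with.

```python
# Hypothetical rules: each class maps to the properties its instances must have.
REQUIRED = {"ex:GeoDataset": ["ex:hasRecord", "ex:crs"]}


def violations(triples):
    """Return (subject, missing_property) pairs for every unmet rule."""
    types = [(s, o) for s, p, o in triples if p == "rdf:type"]
    present = {(s, p) for s, p, o in triples}
    return [
        (s, prop)
        for s, cls in types
        for prop in REQUIRED.get(cls, [])
        if (s, prop) not in present
    ]


sample = [
    ("ex:d1", "rdf:type", "ex:GeoDataset"),
    ("ex:d1", "ex:hasRecord", "db:readings/1"),
]
missing = violations(sample)  # ex:d1 has no ex:crs statement
```

A missing property is reported as an error here, whereas under the
open-world reading of RDF(S)/OWL its absence alone wouldn't be an
inconsistency, which is part of why the checks end up in a rules layer.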

Cheers,
Danny.

--

http://dannyayers.com

Received on Tuesday, 4 April 2006 20:13:39 UTC