[BioRDF] Scalability

Somewhere down near the bottom of the lengthy thread that started with a
query about ontology editors, someone casually mentioned that 53 MB of
data that was "imported" -- from which I infer it was not binary,
compressed data but some sort of text format -- turned into over 800
MB of RDF.  Frankly, a factor of 15 in size, possibly starting from a
format that is fairly large to begin with, worries me.  There have
since been some comments suggesting that people intend to deal with
this by generating RDF only on the fly, as needed.  Given the
networked nature of RDF, it seems to me that this is likely to have
problems of its own.  None of the solutions I am aware of that are
actually in operation work this way, but I will freely admit that my
experience level here is pretty low.
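
To make the blow-up concrete, here is a minimal sketch in Python using
rdflib (the namespace, field names, and record below are all made up
for illustration): one short tab-delimited record becomes a handful of
triples, and every field ends up wrapped in full URIs and markup,
which is where the multiplier comes from.  Serializing the same graph
in a terser, prefix-based syntax shows that the choice of format also
matters.

    # Minimal sketch of how a compact text record inflates into RDF.
    # Assumes rdflib; the namespace and field names are hypothetical.
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/bio/")   # hypothetical namespace

    record = "P12345\tQ9H0H5\t0.87"             # one tab-delimited line
    acc, target, score = record.split("\t")

    g = Graph()
    g.bind("ex", EX)
    g.add((EX[acc], EX["interactsWith"], EX[target]))
    g.add((EX[acc], EX["confidence"], Literal(float(score))))

    xml = g.serialize(format="xml")     # verbose RDF/XML
    n3 = g.serialize(format="n3")       # terser, prefix-based syntax
    print(len(record), "bytes of text")
    print(len(xml), "bytes as RDF/XML,", len(n3), "bytes as N3")

Even the terser syntax comes out several times the size of the raw
record, so I doubt that the choice of serialization alone solves the
problem.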

It seems to me that there are at least three ways that one might try to
cope with this issue:

1 - Generate the RDF on-the-fly (as I said, I'm personally dubious about
this one).

2 - Make the RDF smaller somehow (maybe by making the URIs shorter, à
la tinyurl?).

3 - Limit the amount of information that is actually put into RDF to
some sort of descriptive metadata and keep pointers to the real data,
which stays in some other format (a rough sketch follows this list).
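
For what it's worth, here is a rough sketch of what I mean by the
third approach, again in Python with rdflib.  All of the predicate
names, URLs, and file details are hypothetical; the point is just that
the graph carries a few descriptive statements plus a pointer, while
the bulky data stays in its native format and is fetched only when
someone actually wants it.

    # Rough sketch of approach 3: descriptive metadata in RDF plus a
    # pointer to the real data, which stays in its native format.
    # Assumes rdflib; all names and URLs are made up for illustration.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC, RDF

    EX = Namespace("http://example.org/bio/")   # hypothetical namespace

    g = Graph()
    dataset = URIRef("http://example.org/datasets/run42")
    g.add((dataset, RDF.type, EX["Dataset"]))
    g.add((dataset, DC["title"], Literal("Expression run 42")))
    g.add((dataset, DC["format"], Literal("tab-delimited text")))
    # The pointer: the 53 MB file itself never enters the triple store.
    g.add((dataset, EX["dataLocation"],
           URIRef("ftp://example.org/data/run42.tab.gz")))

    print(g.serialize(format="n3"))

The triple store then stays small no matter how large the underlying
files get; the trade-off is that queries can only see whatever the
descriptive metadata happens to capture.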

I think that the third approach is what I have seen done, but I get
the impression that people in this group may not be thinking along
these lines.

I've prefaced this with [BioRDF] because there has already been some
discussion of scalability in that context, and I believe that this
issue has recently been upgraded in the deliverables of that subgroup.

Incidentally, what happened to the BioRDF telcons on Monday?  I was on
vacation for a while, and when I came back they didn't seem to be
happening.
