Re: [BioRDF] Scalability from Ora Lassila on 2006-04-04 (public-semweb-lifesci@w3.org from April 2006)

From: Ora Lassila <ora.lassila@nokia.com>
Date: Tue, 04 Apr 2006 15:07:56 -0400
To: "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
CC: <public-semweb-lifesci@w3.org>
Message-ID: <C0583C4C.18080%ora.lassila@nokia.com>

Roger,

I guess I am not similarly worried about data size. Not anymore, that is:
some RDF folks may remember that I championed a very different syntax for
RDF in the early days of Semantic Web work. As for your suggestions, I have
the following comments:

1 - Doable, for sure. We have built applications and systems that do this.
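
To illustrate what generating RDF on the fly could look like, here is a
minimal sketch, not a description of any particular system mentioned above.
It assumes tabular source records and emits N-Triples lazily, so the full RDF
form never has to be stored. The namespace `http://example.org/` and the
function names `rows_to_ntriples` and `escape_literal` are hypothetical.

```python
from typing import Dict, Iterable, Iterator

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(
    rows: Iterable[Dict[str, str]], id_key: str = "id"
) -> Iterator[str]:
    """Yield one N-Triples line per non-id column of each row, on demand."""
    for row in rows:
        subject = f"<{BASE}record/{row[id_key]}>"
        for column, value in row.items():
            if column == id_key:
                continue
            yield f'{subject} <{BASE}prop/{column}> "{escape_literal(value)}" .'


if __name__ == "__main__":
    source = [{"id": "1", "name": "BRCA1"}, {"id": "2", "name": "TP53"}]
    for triple in rows_to_ntriples(source):
        print(triple)
```

Because the function is a generator, a consumer can stream triples from the
compact source data and discard them as it goes.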

2 - This -- IMHO -- is misplaced optimization; that is, let's use on-the-fly
compression/decompression instead, because these types of techniques
typically yield much better results and work across all of the data, rather
than shrinking just one aspect of it (such as the URIs).
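
A rough way to compare the two approaches is sketched below. It is only an
illustration on synthetic, highly repetitive data, so real datasets will
behave differently. It builds the same triples once with a long namespace and
once with a shortened one, then applies general-purpose gzip compression to
the long form. Both namespaces and the helper `make_ntriples` are
hypothetical.

```python
import gzip


def make_ntriples(n: int, base: str) -> bytes:
    """Build n synthetic N-Triples lines under the given namespace."""
    lines = [
        f'<{base}record/{i}> <{base}prop/name> "gene{i}" .' for i in range(n)
    ]
    return "\n".join(lines).encode("utf-8")


# Same triples, long URIs versus artificially shortened URIs.
long_form = make_ntriples(10000, "http://example.org/some/long/namespace/path/")
short_form = make_ntriples(10000, "http://x.y/")

# General-purpose compression applied to the long-URI form.
compressed = gzip.compress(long_form)

if __name__ == "__main__":
    print("long URIs, uncompressed: ", len(long_form), "bytes")
    print("short URIs, uncompressed:", len(short_form), "bytes")
    print("long URIs, gzip:         ", len(compressed), "bytes")
```

Decompressing with `gzip.decompress(compressed)` returns the original bytes
unchanged, so the full URIs stay intact and readable.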

3 - I don't see how this actually alleviates the problem. How do you then
fetch the "actual" data? Don't you run into the same problem then? Or are
you actually suggesting abandoning the use of RDF there?

Regards,

    - Ora

On 2006-04-04 12:34, "Cutler, Roger (RogerCutler)" <RogerCutler@chevron.com>
wrote:

> 
> Somewhere down near the bottom of the lengthy thread that started with a
> query about ontology editors, someone casually mentioned that 53 MB of
> data that was "imported" -- from which I infer it was not binary,
> compressed data but in some sort of text format -- turned into over 800
> MB of RDF.  Frankly, a factor of 15 in size, possibly from a format
> that is fairly large to start out with, worries me.  There have since
> been some comments that sound like people think that they are going to
> deal with this by generating RDF only on-the-fly, as needed.  It seems
> to me, given the networked nature of RDF, that this is likely to have
> its own problems.  None of the solutions of which I am aware that
> actually are in operation work this way, but I will freely admit that my
> experience level here is pretty low.
> 
> It seems to me that there are at least three ways that one might try to
> cope with this issue:
> 
> 1 - Generate the RDF on-the-fly (as I said, I'm personally dubious about
> this one).
> 
> 2 - Make the RDF smaller somehow (maybe by making the URIs shorter, a
> la tinyurl???)
> 
> 3 - Limit the amount of information that is actually put into RDF to
> some sort of descriptive metadata and keep pointers to the real data,
> which is in some other format.
> 
> I think that the third approach is what I have seen done, but I get the
> impression that people may not be thinking in this way in this group.
> 
> I've prefaced this [BioRDF] because there has already been some
> discussion of scalability in that context and I believe that this issue
> has recently been upgraded in the deliverables of this subgroup.
> 
> Incidentally, what happened to the BioRDF telcons on Monday?  I was on
> vacation for a while and when I came back they didn't seem to be there.
> 
>

Received on Tuesday, 4 April 2006 19:09:13 UTC