Re: About computer-optimized RDF format. from Sandro Hawke on 2008-07-23 (semantic-web@w3.org from July 2008)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 23 Jul 2008 18:35:05 -0400
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: Damian Steer <pldms@mac.com>, Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <20366.1216852505@ubuhebe>

> > For large numbers of triples, in my limited experience, the things  
> > that affect RDF load speed
> 
> Ooo, I got a bit sidetracked by the parsing bit.
> 
> > are:
> >
> > The speed of your disk.
> > The size of your memory.
> > Building indexes.
> > Duplicate suppression (triple, node, whatever).
> > BNode handling.
> > IRI and datatype checks (if you do them).
> > Parsing.
> >
> > Now parsing is a factor, but it's fairly minor compared with the  
> > basic business of storing the triples.
> 
> Indeed.
> 
> > Stores would probably get more benefit from simple processing  
> > instructions like 'this contains no dupes' and 'my bnode ids are  
> > globally unique'.
> 
> SWI Prolog had, IIRC, a mode to dump its internal structures so you  
> would avoid all that overhead (kinda like an image in Smalltalk or  
> Lisp). Obviously databases do this as well.
> 
> Hard to see that a common format would make a *ton* of sense. I guess  
> you could suppress dups, reconcile bnodes, and a few other things.  
> Indexes? I don't think so. That seems entirely proprietary and  
> appropriately so.
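
The two processing instructions suggested in the quoted message ('this contains no dupes', 'my bnode ids are globally unique') would each let a loader skip real work. Here is a minimal sketch. It assumes hypothetical `no_duplicates` and `global_bnode_ids` flags (the thread does not specify a syntax for them) and a bulk load of one document into an empty store. Without the hints, the loader must rename document-scoped blank-node labels and check every triple against a set of those already seen. With the hints, both steps are skipped and the sender is simply trusted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class LoadHints:
    """Hypothetical processing instructions a sender could attach to a dump."""
    no_duplicates: bool = False     # "this contains no dupes"
    global_bnode_ids: bool = False  # "my bnode ids are globally unique"


def bulk_load(source: Iterable[Triple], hints: LoadHints) -> List[Triple]:
    """Load one document into an empty store, skipping work the hints rule out."""
    fresh = count(1)
    bnode_map: Dict[str, str] = {}  # document-local label -> store-wide label
    seen: Set[Triple] = set()       # only used when duplicates are possible
    out: List[Triple] = []

    def rename(term: str) -> str:
        # Blank-node labels ("_:x") are normally scoped to one document,
        # so each distinct label gets a fresh store-wide label.
        if term.startswith("_:"):
            if term not in bnode_map:
                bnode_map[term] = f"_:b{next(fresh)}"
            return bnode_map[term]
        return term

    for s, p, o in source:
        if not hints.global_bnode_ids:
            # Predicates cannot be blank nodes in RDF, so only s and o are renamed.
            s, o = rename(s), rename(o)
        triple = (s, p, o)
        if not hints.no_duplicates:
            if triple in seen:
                continue
            seen.add(triple)
        out.append(triple)  # with no_duplicates set, the sender is simply trusted
    return out
```

For example, loading `[("_:a", "ex:p", "ex:o"), ("_:a", "ex:p", "ex:o")]` with default hints yields the single triple `("_:b1", "ex:p", "ex:o")`. With both hints set, both copies are kept, because a loader that trusts the hints cannot catch a sender that gets them wrong.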

I can imagine a demand for an RDF exchange format that is actually a
position-independent/architecture-independent memory image of an indexed
quad store.  The sender could include the indexes it thinks will be
useful; the receiver could drop/regenerate indexes as needed.
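
The exchange format imagined here can be sketched under some loudly stated assumptions. The layout is hypothetical and does not correspond to any existing format. It uses a magic tag `RDFQ`, a length-prefixed UTF-8 term dictionary, quads stored as four little-endian 32-bit term IDs, and each index stored as a sorted permutation of quad row numbers tagged with a key order such as `SPOG`. Every reference is an array index or byte offset, so the image does not depend on where it is loaded. Fixing the byte order makes it architecture-independent. A real implementation would map the bytes and read them in place rather than copying them into Python lists as this sketch does.

```python
import struct
from typing import Dict, List, Tuple

Quad = Tuple[int, int, int, int]  # (subject, predicate, object, graph) term IDs

# Hypothetical layout. Every integer is a little-endian uint32, so the image
# reads the same on any architecture, and every reference is an array index
# or byte offset rather than a pointer, so it does not depend on load address.
HEADER = struct.Struct("<4sIII")  # magic, term count, quad count, index count


def build_index(quads: List[Quad], order: str) -> List[int]:
    """Quad row numbers sorted by a key order such as 'SPOG' or 'POSG'."""
    if sorted(order) != sorted("SPOG"):
        raise ValueError(f"bad index order: {order!r}")
    pos = ["SPOG".index(c) for c in order]
    return sorted(range(len(quads)), key=lambda i: tuple(quads[i][k] for k in pos))


def dump(terms: List[str], quads: List[Quad], index_orders: List[str]) -> bytes:
    parts = [HEADER.pack(b"RDFQ", len(terms), len(quads), len(index_orders))]
    for t in terms:  # term dictionary: length-prefixed UTF-8
        raw = t.encode("utf-8")
        parts.append(struct.pack("<I", len(raw)) + raw)
    parts.append(struct.pack(f"<{4 * len(quads)}I", *(x for q in quads for x in q)))
    for order in index_orders:  # whichever indexes the sender thinks useful
        rows = build_index(quads, order)
        parts.append(order.encode("ascii") + struct.pack(f"<{len(rows)}I", *rows))
    return b"".join(parts)


def load(image: bytes, keep: Tuple[str, ...] = ()):
    """Return (terms, quads, indexes), keeping only the index orders in `keep`."""
    magic, n_terms, n_quads, n_idx = HEADER.unpack_from(image, 0)
    if magic != b"RDFQ":
        raise ValueError("not a quad image")
    off = HEADER.size
    terms: List[str] = []
    for _ in range(n_terms):
        (n,) = struct.unpack_from("<I", image, off)
        terms.append(image[off + 4:off + 4 + n].decode("utf-8"))
        off += 4 + n
    flat = struct.unpack_from(f"<{4 * n_quads}I", image, off)
    quads = [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]
    off += 16 * n_quads  # four 4-byte IDs per quad
    indexes: Dict[str, List[int]] = {}
    for _ in range(n_idx):
        order = image[off:off + 4].decode("ascii")
        if order in keep:  # the receiver drops any index it did not ask for
            indexes[order] = list(struct.unpack_from(f"<{n_quads}I", image, off + 4))
        off += 4 + 4 * n_quads
    return terms, quads, indexes
```

A receiver that wants an access path the sender did not ship, or that dropped one, can regenerate it locally with `build_index(quads, "POSG")`.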

This would make sense for the fairly rare applications where
network and memory speeds outstrip CPU speed, that is, where parsing time
(and similar processing) is the real bottleneck. From time to time I read
that the bandwidth improvement curve is steeper than the CPU improvement
curve, so we'll all be there eventually. I'm not sure I believe it. If
we do get there, the demand for this kind of RDF format will grow. For
now, I don't see much demand.

     -- Sandro

Received on Wednesday, 23 July 2008 22:35:40 UTC