Re: About computer-optimized RDF format.

> > For large numbers of triples, in my limited experience, the things  
> > that affect RDF load speed
> 
> Ooo, I got a bit side tracked by the parsing bit.
> 
> > are:
> >
> > The speed of your disk.
> > The size of your memory.
> > Building indexes.
> > Duplicate suppression (triple, node, whatever).
> > BNode handling.
> > IRI and datatype checks (if you do them).
> > Parsing.
> >
> > Now parsing is a factor, but it's fairly minor compared with the  
> > basic business of storing the triples.
> 
> Indeed.
> 
> > Stores would probably get more benefit from simple processing  
> > instructions like 'this contains no dupes' and 'my bnode ids are  
> > globally unique'.
> 
> SWI-Prolog had, IIRC, a mode to dump its internal structures so you  
> would avoid all that overhead (kinda like an image in Smalltalk or  
> Lisp). Obviously databases do this as well.
> 
> Hard to see that a common format would make a *ton* of sense. I guess  
> you could suppress dups, reconcile bnodes, and a few other things.  
> Indexes? I don't think so. That seems entirely proprietary and  
> appropriately so.

I can imagine a demand for an RDF exchange format that is actually a
position-independent/architecture-independent memory image of an indexed
quad store.  The sender could include the indexes it thinks will be
useful; the receiver could drop/regenerate indexes as needed.

This would make sense for the fairly rare applications where
network/memory speed outstrips CPU speed -- where parsing time (and
such) is the real bottleneck.  From time to time I read that the
bandwidth improvement curve is steeper than the CPU improvement curve,
so we'll all be there eventually.  I'm not sure I believe it.  If we
are, the demand for this kind of RDF format will grow.  For now, I
don't see much demand.

     -- Sandro

Received on Wednesday, 23 July 2008 22:35:40 UTC