- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 23 Jul 2008 18:35:05 -0400
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- Cc: Damian Steer <pldms@mac.com>, Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
> > For large numbers of triples, in my limited experience, the things
> > that affect RDF load speed
>
> Ooo, I got a bit side tracked by the parsing bit.
>
> > are:
> >
> > The speed of your disk.
> > The size of your memory.
> > Building indexes.
> > Duplicate suppression (triple, node, whatever).
> > BNode handling.
> > IRI and datatype checks (if you do them).
> > Parsing.
> >
> > Now parsing is a factor, but it's fairly minor compared with the
> > basic business of storing the triples.
>
> Indeed.
>
> > Stores would probably get more benefit from simple processing
> > instructions like 'this contains no dupes' and 'my bnode ids are
> > globally unique'.
>
> SWI Prolog had, IIRC, a mode to dump its internal structures so you
> would avoid all that overhead (kinda like an image in Smalltalk or
> lisp). Obviously databases do this as well.
>
> Hard to see that a common format would make a *ton* of sense. I guess
> you could suppress dups, reconcile bnodes, and a few other things.
> Indexes? I don't think so. That seems entirely proprietary and
> appropriately so.
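
Here is a sketch of the work that the 'this contains no dupes' and 'my bnode ids are globally unique' processing instructions would let a loader skip. It assumes a toy term encoding in which any string starting with "_:" is a blank node. The flag names no_dupes and global_bnodes are hypothetical, and the code does not describe any existing parser or store.

```python
import itertools
from typing import Iterable, Iterator

# Toy term encoding (an assumption of this sketch): plain strings, where any
# string starting with "_:" is a blank node label.
Triple = tuple[str, str, str]


def load_document(
    triples: Iterable[Triple],
    fresh_ids: Iterator[int],
    no_dupes: bool = False,       # hypothetical flag: "this contains no dupes"
    global_bnodes: bool = False,  # hypothetical flag: "my bnode ids are globally unique"
) -> list[Triple]:
    """Load one document, suppressing duplicates and relabeling blank nodes
    unless the sender's flags say that work is unnecessary."""
    bnode_map: dict[str, str] = {}

    def relabel(term: str) -> str:
        # Trusted-global labels and non-bnode terms pass through untouched.
        if global_bnodes or not term.startswith("_:"):
            return term
        # Same label within one document -> same node; fresh_ids is shared
        # across documents so labels from different loads never collide.
        if term not in bnode_map:
            bnode_map[term] = f"_:b{next(fresh_ids)}"
        return bnode_map[term]

    out: list[Triple] = []
    seen: set[Triple] = set()
    for s, p, o in triples:
        t = (relabel(s), p, relabel(o))
        if no_dupes:
            out.append(t)  # skip the duplicate check (and the set it needs)
        elif t not in seen:
            seen.add(t)
            out.append(t)
    return out


if __name__ == "__main__":
    ids = itertools.count(1)
    doc = [("_:x", "ex:p", "ex:o"), ("_:x", "ex:p", "ex:o"), ("ex:s", "ex:q", "_:y")]
    print(load_document(doc, ids))
    # [('_:b1', 'ex:p', 'ex:o'), ('ex:s', 'ex:q', '_:b2')]
```

With both flags set, the loader skips both the duplicate lookups and the relabeling. In a real store, the savings would mostly come from skipping index probes rather than Python set operations.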

I can imagine a demand for an RDF exchange format that is actually a
position-independent/architecture-independent memory image of an indexed
quad store. The sender could include the indexes it thinks will be
useful; the receiver could drop/regenerate indexes as needed.
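
Here is one way such a memory image could look, as a minimal sketch. Terms go into a dictionary, and each quad becomes a row of four fixed-width little-endian integer IDs. The image therefore contains no pointers and reads the same on any architecture. Sorting the rows in SPOG order gives a primary index for free. The sender may append extra indexes as row permutations, and the receiver keeps the ones it wants and rebuilds the rest. The "RDFI" magic bytes, the index names, and the layout are all hypothetical. Sorted lists stand in for whatever structures a real store would use.

```python
import struct

MAGIC = b"RDFI"  # hypothetical 4-byte tag for this sketch's format
Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph)

# Key orders an index may use; each number is a position in the quad row.
ORDERS = {"SPOG": (0, 1, 2, 3), "POSG": (1, 2, 0, 3), "OSPG": (2, 0, 1, 3)}


def build_index(rows: list, name: str) -> list[int]:
    """Row positions sorted by the key order `name`."""
    order = ORDERS[name]
    return sorted(range(len(rows)), key=lambda i: tuple(rows[i][k] for k in order))


def dump(quads: list[Quad], indexes: tuple[str, ...] = ("POSG",)) -> bytes:
    """Encode quads as a pointer-free, little-endian image."""
    terms = sorted({t for q in quads for t in q})
    term_id = {t: i for i, t in enumerate(terms)}
    # Unique quads as ID tuples, sorted in SPOG order (the primary index).
    rows = sorted({tuple(term_id[t] for t in q) for q in quads})
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(terms))
    for t in terms:
        data = t.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    out += struct.pack("<I", len(rows))
    for r in rows:
        out += struct.pack("<4I", *r)
    # Optional secondary indexes, each a 4-byte name plus a row permutation.
    out += struct.pack("<I", len(indexes))
    for name in indexes:
        out += name.encode("ascii")
        out += struct.pack(f"<{len(rows)}I", *build_index(rows, name))
    return bytes(out)


def load(image: bytes, keep: frozenset = frozenset({"POSG"})):
    """Decode an image, keeping wanted indexes and rebuilding missing ones."""
    if image[:4] != MAGIC:
        raise ValueError("not an RDFI image")
    pos = 4

    def u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", image, pos)
        pos += 4
        return value

    terms = []
    for _ in range(u32()):
        n = u32()
        terms.append(image[pos:pos + n].decode("utf-8"))
        pos += n
    nrows = u32()
    rows = [struct.unpack_from("<4I", image, pos + 16 * i) for i in range(nrows)]
    pos += 16 * nrows
    indexes = {}
    for _ in range(u32()):
        name = image[pos:pos + 4].decode("ascii")
        pos += 4
        perm = list(struct.unpack_from(f"<{nrows}I", image, pos))
        pos += 4 * nrows
        if name in keep:
            indexes[name] = perm  # reuse the sender's index as-is
    for name in [n for n in keep if n not in indexes]:
        indexes[name] = build_index(rows, name)  # regenerate locally
    return terms, rows, indexes
```

Dropping an index costs nothing beyond skipping its bytes, and rebuilding one is a sort over the row array. That is the drop/regenerate trade-off in miniature.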

This would make sense for the fairly rare applications where network
and memory speeds outstrip CPU speed, that is, where parsing time (and
similar processing) is the real bottleneck. From time to time I read
that the bandwidth improvement curve is steeper than the CPU improvement
curve, so we'll all be there eventually. I'm not sure I believe it. If
we do get there, demand for this kind of RDF format will grow. For now,
I don't see much demand.

-- Sandro
Received on Wednesday, 23 July 2008 22:35:40 UTC