- From: Sandro Hawke <sandro@w3.org>
- Date: Wed, 23 Jul 2008 18:35:05 -0400
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- Cc: Damian Steer <pldms@mac.com>, Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
> > For large numbers of triples, in my limited experience, the things
> > that affect RDF load speed
>
> Ooo, I got a bit sidetracked by the parsing bit.
>
> > are:
> >
> > The speed of your disk.
> > The size of your memory.
> > Building indexes.
> > Duplicate suppression (triple, node, whatever).
> > BNode handling.
> > IRI and datatype checks (if you do them).
> > Parsing.
> >
> > Now parsing is a factor, but it's fairly minor compared with the
> > basic business of storing the triples.
>
> Indeed.
>
> > Stores would probably get more benefit from simple processing
> > instructions like 'this contains no dupes' and 'my bnode ids are
> > globally unique'.
>
> SWI Prolog had, IIRC, a mode to dump its internal structures so you
> would avoid all that overhead (kinda like an image in Smalltalk or
> lisp). Obviously databases do this as well.
>
> Hard to see that a common format would make a *ton* of sense. I guess
> you could suppress dups, reconcile bnodes, and a few other things.
> Indexes? I don't think so. That seems entirely proprietary and
> appropriately so.

I can imagine a demand for an RDF exchange format that is actually a
position-independent/architecture-independent memory image of an
indexed quad store. The sender could include the indexes it thinks
will be useful; the receiver could drop/regenerate indexes as needed.

This would make sense for the fairly-rare applications where
network/memory speed outstrips CPU speed -- where parsing time (and
such) are the real bottleneck. From time to time I read that the
bandwidth improvement curve is steeper than the CPU improvement curve,
so we'll all be there eventually. I'm not sure I believe it. If we
are, the demand for this kind of RDF format will grow.

For now, I don't see much demand.

    -- Sandro
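
To make the "memory image" idea above concrete, here is a minimal sketch of what such an exchange format might look like. Everything in it is an assumption made for illustration: the `QIMG` tag, the layout, and the names `pack_image` and `unpack_image` are hypothetical and do not correspond to any existing RDF format or store. Terms are treated as plain strings, fixed-width little-endian integers stand in for "architecture-independent", and row numbers instead of pointers stand in for "position-independent".

```python
import struct

MAGIC = b"QIMG"   # hypothetical 4-byte tag, invented for this sketch
FIELDS = "spog"   # subject, predicate, object, graph


def pack_image(quads, indexes=("spog",)):
    """Serialize quads as a term dictionary, an ID table, and optional indexes.

    Each index name is a 4-letter permutation of "spog"; the index itself is
    the list of row numbers sorted by that field order.
    """
    terms = sorted({t for q in quads for t in q})
    ids = {t: i for i, t in enumerate(terms)}
    # Duplicates are suppressed once, by the sender.
    rows = sorted({tuple(ids[t] for t in q) for q in quads})

    out = [MAGIC, struct.pack("<III", len(terms), len(rows), len(indexes))]
    for t in terms:
        raw = t.encode("utf-8")
        out.append(struct.pack("<I", len(raw)) + raw)
    for r in rows:
        out.append(struct.pack("<4I", *r))
    for name in indexes:
        order = [FIELDS.index(c) for c in name]
        perm = sorted(range(len(rows)),
                      key=lambda i: tuple(rows[i][k] for k in order))
        out.append(name.encode("ascii"))
        out.append(struct.pack(f"<{len(perm)}I", *perm))
    return b"".join(out)


def unpack_image(buf, keep=None):
    """Read an image back; indexes whose names are not in `keep` are dropped."""
    if buf[:4] != MAGIC:
        raise ValueError("not a QIMG image")
    n_terms, n_rows, n_idx = struct.unpack_from("<III", buf, 4)
    pos = 16  # 4-byte tag plus three 4-byte counts

    terms = []
    for _ in range(n_terms):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        terms.append(buf[pos:pos + length].decode("utf-8"))
        pos += length

    rows = [struct.unpack_from("<4I", buf, pos + 16 * i) for i in range(n_rows)]
    pos += 16 * n_rows

    indexes = {}
    for _ in range(n_idx):
        name = buf[pos:pos + 4].decode("ascii")
        pos += 4
        perm = struct.unpack_from(f"<{n_rows}I", buf, pos)
        pos += 4 * n_rows
        if keep is None or name in keep:
            indexes[name] = perm
    return terms, rows, indexes
```

A receiver that only wants the predicate-first ordering could call `unpack_image(buf, keep={"posg"})` and rebuild any other index locally. Note that this sketch still decodes the bytes into Python objects; a real image format in the spirit of the message would presumably be mapped into memory and used in place, which is where the saved parsing time would come from.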
Received on Wednesday, 23 July 2008 22:35:40 UTC