Re: About computer-optimized RDF format. from Bijan Parsia on 2008-07-23 (semantic-web@w3.org from July 2008)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 23 Jul 2008 21:06:34 +0100
To: Damian Steer <pldms@mac.com>
Cc: Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
Message-Id: <48955A34-559B-45AC-920F-5DAEA946895A@cs.man.ac.uk>

Hi Damian.

On 23 Jul 2008, at 20:13, Damian Steer wrote:
[snip]
> For large numbers of triples, in my limited experience, the things  
> that affect RDF load speed

Ooo, I got a bit sidetracked by the parsing bit.

> are:
>
> The speed of your disk.
> The size of your memory.
> Building indexes.
> Duplicate suppression (triple, node, whatever).
> BNode handling.
> IRI and datatype checks (if you do them).
> Parsing.
>
> Now parsing is a factor, but it's fairly minor compared with the  
> basic business of storing the triples.

Indeed.

> Stores would probably get more benefit from simple processing  
> instructions like 'this contains no dupes' and 'my bnode ids are  
> globally unique'.

SWI-Prolog had, IIRC, a mode to dump its internal structures so you  
would avoid all that overhead (kinda like an image in Smalltalk or  
Lisp). Obviously databases do this as well.
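The "image" idea can be sketched, very loosely, in Python. A toy store pays for duplicate suppression and indexing once, then pickles its structures so that a reload skips parsing, deduplication and re-indexing. ToyStore, dump_image and load_image are hypothetical names; this is not how SWI-Prolog or any particular database actually does it, and pickle should only be used on files you trust.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory triple store with two simple indexes."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)

    def add(self, s, p, o):
        t = (s, p, o)
        if t in self.triples:  # duplicate suppression
            return
        self.triples.add(t)
        self.by_subject[s].add(t)  # index maintenance
        self.by_predicate[p].add(t)


def dump_image(store, path):
    # Write the deduplicated, already-indexed structures as they are.
    with open(path, "wb") as f:
        pickle.dump(store, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_image(path):
    # Reload without re-parsing, re-deduplicating, or rebuilding indexes.
    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


store = ToyStore()
store.add("ex:a", "ex:p", "ex:b")
store.add("ex:a", "ex:p", "ex:b")  # dropped as a duplicate
image_path = os.path.join(tempfile.mkdtemp(), "toy_store.img")
dump_image(store, image_path)
restored = load_image(image_path)
print(len(restored.triples))  # 1
```

The dump format here is tied to the store's internal layout, which is exactly why such images tend to stay implementation-specific.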

Hard to see that a common format would make a *ton* of sense. I guess  
you could suppress dups, reconcile bnodes, and a few other things.  
Indexes? I don't think so. That seems entirely proprietary and  
appropriately so.
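Damian's two processing instructions ("this contains no dupes", "my bnode ids are globally unique") could let a loader skip exactly the work mentioned above. A minimal sketch, assuming a toy list-based store and N-Triples-style labels: LoadHints, load and the "_:g" relabeling scheme are all hypothetical, and the hints are trusted rather than verified.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class LoadHints:
    """Hypothetical producer-declared hints; not part of any RDF syntax."""
    no_duplicates: bool = False           # "this contains no dupes"
    globally_unique_bnodes: bool = False  # "my bnode ids are globally unique"


# Assumption: labels of the form "_:g<n>" are reserved for this loader.
_fresh_ids = count()


def is_bnode(term):
    # Assumes N-Triples-style blank node labels such as "_:b0".
    return term.startswith("_:")


def load(store, triples, hints):
    """Append triples to a list-based store, skipping work the hints cover."""
    relabel = {}  # document-scoped label -> store-scoped label

    def map_term(term):
        if hints.globally_unique_bnodes or not is_bnode(term):
            return term  # trusted as-is, no renaming needed
        if term not in relabel:
            relabel[term] = f"_:g{next(_fresh_ids)}"
        return relabel[term]

    # With no_duplicates set, the hint is trusted and no membership set is built
    # (assumes the input also does not repeat triples already in the store).
    seen = None if hints.no_duplicates else set(store)
    for s, p, o in triples:
        t = (map_term(s), p, map_term(o))
        if seen is not None:
            if t in seen:
                continue
            seen.add(t)
        store.append(t)
    return store


store = []
doc = [("_:b0", "ex:p", "ex:o"), ("_:b0", "ex:p", "ex:o")]
load(store, doc, LoadHints())  # renames _:b0 and drops the repeat
load(store, [("_:x", "ex:q", "ex:o")],
     LoadHints(no_duplicates=True, globally_unique_bnodes=True))
print(store)  # [('_:g0', 'ex:p', 'ex:o'), ('_:x', 'ex:q', 'ex:o')]
```

Only subjects and objects are mapped, since RDF predicates cannot be blank nodes. Indexes are deliberately left out, in line with the point that they are store-specific.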

Cheers,
Bijan "Binary XML 4 Ever!" Parsia.

Received on Wednesday, 23 July 2008 20:04:17 UTC