
Re: About computer-optimized RDF format.

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 23 Jul 2008 21:06:34 +0100
Message-Id: <48955A34-559B-45AC-920F-5DAEA946895A@cs.man.ac.uk>
Cc: Olivier Rossel <olivier.rossel@gmail.com>, Semantic Web <semantic-web@w3.org>
To: Damian Steer <pldms@mac.com>

Hi Damian.

On 23 Jul 2008, at 20:13, Damian Steer wrote:
[snip]
> For large numbers of triples, in my limited experience, the things  
> that affect RDF load speed

Ooo, I got a bit sidetracked by the parsing bit.

> are:
>
> The speed of your disk.
> The size of your memory.
> Building indexes.
> Duplicate suppression (triple, node, whatever).
> BNode handling.
> IRI and datatype checks (if you do them).
> Parsing.
>
> Now parsing is a factor, but it's fairly minor compared with the  
> basic business of storing the triples.

Indeed.

> Stores would probably get more benefit from simple processing  
> instructions like 'this contains no dupes' and 'my bnode ids are  
> globally unique'.

SWI-Prolog had, IIRC, a mode to dump its internal structures so you  
would avoid all that overhead (kinda like an image in Smalltalk or  
Lisp). Obviously databases do this as well.

Hard to see that a common format would make a *ton* of sense. I guess  
you could suppress dups, reconcile bnodes, and a few other things.  
Indexes? I don't think so. That seems entirely proprietary and  
appropriately so.
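
A rough sketch of what the two hints Damian mentions might buy a loader, assuming triples arrive as tuples of N-Triples-style strings and are loaded into an empty store. The function `load` and the flags `no_dupes` and `bnodes_unique` are hypothetical names for illustration; no RDF syntax or store defines them.

```python
def load(triples, doc_id=0, no_dupes=False, bnodes_unique=False):
    """Return the triples a store would keep after one load.

    no_dupes and bnodes_unique are hypothetical hints a file might
    carry; when set, the matching per-triple work is skipped.
    """
    def scope(term):
        # Blank node labels ("_:x") are local to one document, so
        # without the hint they are renamed per document to avoid
        # clashes with labels from other loads.
        if not bnodes_unique and term.startswith("_:"):
            return "_:d%d_%s" % (doc_id, term[2:])
        return term

    kept = []
    seen = set()  # only populated when duplicates must be checked
    for s, p, o in triples:
        triple = (scope(s), p, scope(o))
        if not no_dupes:
            if triple in seen:
                continue
            seen.add(triple)
        kept.append(triple)
    return kept
```

With both hints set, the loop reduces to an append per triple, with no set lookups and no renaming. Index building is deliberately left out, since that part stays store-specific.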

Cheers,
Bijan "Binary XML 4 Ever!" Parsia.
Received on Wednesday, 23 July 2008 20:04:17 GMT
