Re: About computer-optimized RDF format. from Damian Steer on 2008-07-23 (semantic-web@w3.org from July 2008)

From: Damian Steer <pldms@mac.com>
Date: Wed, 23 Jul 2008 20:13:54 +0100
To: Olivier Rossel <olivier.rossel@gmail.com>
Cc: Semantic Web <semantic-web@w3.org>
Message-id: <73DCF958-EBE1-4270-8FE4-EB9F6CFE8402@mac.com>

On 23 Jul 2008, at 10:07, Olivier Rossel wrote:

>
> I was wondering how to improve the loading time of RDF files in
> semantic web frameworks.
> And then came a question: is RDF efficient to load?
> The obvious answer is no.

I'm not sure that is obvious, but go on...

> Making it readable for humans makes it definitely slower to load in  
> programs.

And I'm not convinced about that, either.

> So I came to another question:
> Is there a computer-optimized format for RDF?
> Something that would make it load much faster.

For small numbers of triples you may be right, but (as Bijan says)
gzipped N-Triples are probably adequate. Let us never mention binary
XML on this list again :-)
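
As a rough illustration of the gzipped N-Triples route, here is a minimal
Python sketch that streams statement lines straight out of a gzip stream.
The function name iter_ntriples_lines and the example.org data are made up
for illustration. It only skips blank lines and comments; it does not parse
or validate terms.

```python
import gzip
import io


def iter_ntriples_lines(source):
    """Yield statement lines from gzipped N-Triples.

    `source` may be a filename or a binary file object (both are
    accepted by gzip.open). Blank lines and '#' comments are skipped.
    """
    with gzip.open(source, mode="rt", encoding="utf-8") as text:
        for line in text:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


# Tiny in-memory example so the sketch runs without a file on disk.
sample = (
    b'<http://example.org/a> <http://example.org/p> "x" .\n'
    b'# a comment\n'
    b'\n'
    b'<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n'
)
compressed = io.BytesIO(gzip.compress(sample))
lines = list(iter_ntriples_lines(compressed))
```

Because the format is one statement per line, the reader never needs more
than the current line in memory, whatever the size of the file.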

For large numbers of triples, in my limited experience, the things  
that affect RDF load speed are:

The speed of your disk.
The size of your memory.
Building indexes.
Duplicate suppression (triple, node, whatever).
BNode handling.
IRI and datatype checks (if you do them).
Parsing.

Now parsing is a factor, but it's fairly minor compared with the basic  
business of storing the triples. Stores would probably get more  
benefit from simple processing instructions like 'this contains no  
dupes' and 'my bnode ids are globally unique'.
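
To make the hints idea concrete, a toy sketch in Python follows. Everything
in it is hypothetical: TinyStore, the no_dupes and bnodes_global flags, and
the bnode relabelling scheme are invented for illustration, not taken from
any real store or RDF syntax. It only shows which per-triple work a loader
could skip if the producer made those two promises.

```python
from collections import defaultdict


class TinyStore:
    """Toy in-memory triple store with three indexes (illustration only)."""

    def __init__(self):
        self.seen = set()             # used for duplicate suppression
        self.spo = defaultdict(list)  # subject   -> [(predicate, object)]
        self.pos = defaultdict(list)  # predicate -> [(object, subject)]
        self.osp = defaultdict(list)  # object    -> [(subject, predicate)]
        self.count = 0

    @staticmethod
    def _scope(term, doc_id):
        # Relabel a bnode so labels from different documents cannot clash.
        if term.startswith("_:"):
            return "_:" + doc_id + "." + term[2:]
        return term

    def load(self, triples, doc_id, no_dupes=False, bnodes_global=False):
        """Add triples; the two flags stand in for producer hints.

        Assumption: no_dupes is only safe here when the data also has no
        overlap with what is already stored (e.g. loading into an empty
        store), since hinted triples are not recorded in `seen`.
        """
        for s, p, o in triples:
            if not bnodes_global:
                s = self._scope(s, doc_id)
                o = self._scope(o, doc_id)
            triple = (s, p, o)
            if not no_dupes:
                if triple in self.seen:
                    continue  # duplicate: skip all index work
                self.seen.add(triple)
            # Index building happens for every triple that is kept.
            self.spo[s].append((p, o))
            self.pos[p].append((o, s))
            self.osp[o].append((s, p))
            self.count += 1


P = "<http://example.org/p>"
A = "<http://example.org/a>"

# Without hints: duplicates are checked and bnodes are relabelled.
checked = TinyStore()
checked.load([("_:b0", P, '"x"'), ("_:b0", P, '"x"'), (A, P, "_:b0")],
             doc_id="doc1")

# With both hints: the loader goes straight to index building.
trusting = TinyStore()
trusting.load([("_:b0", P, '"x"'), (A, P, "_:b0")], doc_id="doc2",
              no_dupes=True, bnodes_global=True)
```

Even in this toy, the hinted path drops a set lookup, a set insert and two
string checks per triple, while the index building stays the same.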

Damian

Received on Wednesday, 23 July 2008 19:20:43 UTC