Re: Streaming OWL Parsers from Jeremy Carroll on 2003-02-16 (www-webont-wg@w3.org from February 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Sun, 16 Feb 2003 20:18:47 +0000
To: Sean Bechhofer <seanb@cs.man.ac.uk>
CC: WebOnt WG <www-webont-wg@w3.org>
Message-ID: <3E4FF227.90306@hpl.hp.com>

> 
> And if someone actually has implemented a streaming OWL parser, simply let
> me have it and then I'll shut up and go away :-)
> 

Well -
I have a streaming RDF/XML parser -
The standard mode of operation keeps the rdf:nodeIDs in parser memory 
(so memory use is not constant), but for very large files it is possible 
to have the client keep these instead (e.g. on disk).
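
To illustrate the idea of the client keeping the rdf:nodeID table on disk, 
here is a minimal sketch. It is not the parser's actual interface: the 
class name DiskNodeIdMap, the label_for method, the "_:bN" label scheme and 
the use of SQLite are all assumptions made for this example.

```python
import sqlite3


class DiskNodeIdMap:
    """Hypothetical client-side store: rdf:nodeID value -> blank node label."""

    def __init__(self, path):
        # A file path keeps the table on disk; ":memory:" is handy for tests.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS node_ids "
            "(node_id TEXT PRIMARY KEY, label TEXT NOT NULL)"
        )
        (count,) = self.db.execute("SELECT COUNT(*) FROM node_ids").fetchone()
        self.next_index = count

    def label_for(self, node_id):
        """Return the label for node_id, allocating a new one on first use."""
        row = self.db.execute(
            "SELECT label FROM node_ids WHERE node_id = ?", (node_id,)
        ).fetchone()
        if row is not None:
            return row[0]
        label = f"_:b{self.next_index}"
        self.next_index += 1
        self.db.execute(
            "INSERT INTO node_ids (node_id, label) VALUES (?, ?)",
            (node_id, label),
        )
        return label

    def close(self):
        self.db.commit()
        self.db.close()
```

The parser would then call something like label_for each time it meets an 
rdf:nodeID, so its own memory use stays roughly constant.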

I have done some experiments with the following design pattern:

RDF/XML   => N-Triples   (essentially O(n), not proven)
(store N-Triples file on disk)
N-Triples => sort        O(n log(n))

... further processing.
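
The sort step can be done without holding the whole file in memory. The 
following is only a sketch of one way to do it (an external merge sort over 
lines), not the code used in the experiments; the function name 
external_sort_lines and the chunk_size parameter are made up for this 
example. It assumes one triple per line with the subject first, so that a 
plain lexical sort brings triples with the same subject together.

```python
import heapq
import itertools
import os
import tempfile


def external_sort_lines(lines, chunk_size=100_000):
    """Yield lines in sorted order, keeping at most chunk_size lines in memory."""
    chunk_paths = []
    iterator = iter(lines)
    try:
        # Phase 1: sort fixed-size chunks in memory and spill each to disk.
        while True:
            chunk = sorted(
                line.rstrip("\n") for line in itertools.islice(iterator, chunk_size)
            )
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".nt")
            with os.fdopen(fd, "w", encoding="utf-8") as out:
                out.writelines(line + "\n" for line in chunk)
            chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        files = [open(path, encoding="utf-8") for path in chunk_paths]
        try:
            streams = [(line.rstrip("\n") for line in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            for f in files:
                f.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

Usage would be along the lines of passing an open N-Triples file to 
external_sort_lines and writing the result to a second file.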

It is not completely clear what your underlying requirement is - you are 
going to need to store the ontology somewhere - so this seems to be about 
how to turn RDF/XML into the abstract syntax form efficiently.

Sorted triples allow you to build most of the abstract constructs quite easily.
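
As a rough illustration of why sorting helps: once the triples are sorted, 
everything said about one node (for example a restriction's rdf:type, 
owl:onProperty and owl:someValuesFrom) sits on adjacent lines and can be 
picked up in a single pass. The sketch below assumes simple N-Triples lines 
with no comment lines; split_triple and group_by_subject are names invented 
for this example.

```python
from itertools import groupby


def split_triple(line):
    """Split a simple N-Triples line into (subject, predicate, object)."""
    subject, predicate, rest = line.strip().split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        # Drop the terminating " ." of the statement.
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def group_by_subject(sorted_lines):
    """Yield (subject, [(predicate, object), ...]) from subject-sorted lines."""
    triples = (split_triple(line) for line in sorted_lines if line.strip())
    for subject, group in groupby(triples, key=lambda t: t[0]):
        yield subject, [(p, o) for _, p, o in group]
```

Each group is then a candidate for one abstract syntax construct. References 
from one group to another (e.g. a class pointing at a blank node 
restriction) still need a lookup, which is where the long distance 
interactions come in.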

You say you don't want to use Jena RDB, but at some point an ontology will 
necessarily be too big to fit in memory and you have to use disk.
There are inevitably long distance interactions in understanding an 
ontology - and there are a few extra ones in making sense of RDF/XML or 
OWL as RDF/XML.
Also Jena is still a long way from being fully engineered and optimized - 
in fact, the amount of optimization work done on Jena can be no more than 
days of effort - this means that you run out of memory sooner than you 
might expect.

Jeremy

Received on Sunday, 16 February 2003 15:19:05 UTC