Streaming OWL Parsers from Sean Bechhofer on 2003-02-14 (www-webont-wg@w3.org from February 2003)

From: Sean Bechhofer <seanb@cs.man.ac.uk>
Date: Fri, 14 Feb 2003 16:49:21 +0000 (GMT Standard Time)
To: WebOnt WG <www-webont-wg@w3.org>
Message-ID: <Pine.WNT.4.44.0302141647340.1744-100000@potato>

In trying to build some "OWL Implementations", a number of issues are
coming to light, not least with how one deals with XML-RDF encodings of
OWL.

We've been trying to build a streaming parser, but this is proving
difficult with OWL represented as RDF triples. Given an OWL-RDF ontology,
the various triples that make up any expression or assertion could be
scattered liberally throughout the document, so even if I try and build
things in a streaming fashion, there's a load of stuff that I'm going to
have to cache or remember or make assumptions about and clean up later.

If we're talking about the kinds of example model that we've seen in
things like the test sets, this is, of course, not an issue. Just build
the data structure. Big deal. But what happens when I've got an ontology
with 10^8 concepts/individuals in it and I want to do some simple
processing on it, that doesn't necessarily warrant me building the whole
data structure? Perhaps I want to find all the assertions of a certain
form. With a more rigorously structured grammar (like an XML schema or
lisp like expressions), things would be easier, but when presented with a
collection of triples, if I'm very unlucky, I'll end up having to
construct some structure in order to get what I want out of it.

<flippant>
An analogy that springs to mind is that it's like building a model of the
eiffel tower out of matchsticks. Except that what's happened is that
someone's done it already, labelled each pair of touching ends of each
matchstick with a unique number, dismantled it, and then given it to you
in a box saying "what do you think that is then?".
</flippant>

Ok, perhaps an over-exaggeration, but it's what it feels like sometimes.

Of course, I'm not claiming that it's *impossible* to do this, it just
seems a lot of work, and requires some unpleasant and messy engineering,
when the presence of further constraints would make things a lot easier.
This is not intended as a simple "bash-RDF" polemic -- I'd be interested
to hear any positive experiences of trying to handle similar
languages/grammars expressed as RDF schema. Is there any? Is it possible
to guarantee that you can do this without having to load the whole thing
into memory at some point?

And if someone actually has implemented a streaming OWL parser, simply let
me have it and then I'll shut up and go away :-)

	Sean

-- 
Sean Bechhofer
seanb@cs.man.ac.uk
http://www.cs.man.ac.uk/~seanb

Received on Friday, 14 February 2003 11:49:59 UTC