- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Fri, 14 Feb 2003 12:25:56 -0500 (EST)
- To: seanb@cs.man.ac.uk
- Cc: www-webont-wg@w3.org
From: Sean Bechhofer <seanb@cs.man.ac.uk> Subject: Streaming OWL Parsers Date: Fri, 14 Feb 2003 16:49:21 +0000 (GMT Standard Time) > In trying to build some "OWL Implementations", a number of issues are > coming to light, not least with how one deals with XML-RDF encodings of > OWL. > > We've been trying to build a streaming parser, but this is proving > difficult with OWL represented as RDF triples. Given an OWL-RDF ontology, > the various triples that make up any expression or assertion could be > scattered liberally throughout the document, so even if I try and build > things in a streaming fashion, there's a load of stuff that I'm going to > have to cache or remember or make assumptions about and clean up later. > > If we're talking about the kinds of example model that we've seen in > things like the test sets, this is, of course, not an issue. Just build > the data structure. Big deal. But what happens when I've got an ontology > with 10^8 concepts/individuals in it and I want to do some simple > processing on it, that doesn't necessarily warrant me building the whole > data structure? Perhaps I want to find all the assertions of a certain > form. With a more rigorously structured grammar (like an XML schema or > lisp like expressions), things would be easier, but when presented with a > collection of triples, if I'm very unlucky, I'll end up having to > construct some structure in order to get what I want out of it. > > <flippant> > An analogy that springs to mind is that it's like building a model of the > eiffel tower out of matchsticks. Except that what's happened is that > someone's done it already, labelled each pair of touching ends of each > matchstick with a unique number, dismantled it, and then given it to you > in a box saying "what do you think that is then?". > </flippant> > > Ok, perhaps an over-exaggeration, but it's what it feels like sometimes. > > Of course, I'm not claiming that it's *impossible* to do this, it just > seems a lot of work, and requires some unpleasant and messy engineering, > when the presence of further constraints would make things a lot easier. > This is not intended as a simple "bash-RDF" polemic -- I'd be interested > to hear any positive experiences of trying to handle similar > languages/grammars expressed as RDF schema. Is there any? Is it possible > to guarantee that you can do this without having to load the whole thing > into memory at some point? I believe that it is in general impossible to write an n-pass streaming ``parser'' for many interesting RDF tasks. For example, suppose that you are given a sequence of RDF triples, and want to emit the RDFS-instances of a particular class. It might be the case that you have to determine the sub-classes of this class to an arbitrary depth, and these triples might be arranged in reverse order. Thus there is no n-pass constant-memory-size algorithm that can do this, for any fixed n. In OWL, you might simply want to extra an axiom from a sequence of RDF triples. This again cannot be done using an n-pass constant-memory-size algorithm because the nesting might be done in reverse order. > And if someone actually has implemented a streaming OWL parser, simply let > me have it and then I'll shut up and go away :-) I would be very surprised if one existed. > Sean peter
Received on Friday, 14 February 2003 13:07:47 UTC