Re: RDF as a syntax for OWL (was Re: same-syntax extensions to RDF)

Bravo! Good for you for taking on the challenge!

On Jan 6, 2005, at 12:07 PM, Geoff Chappell wrote:
[snip]
> Given the challenge, I had to give it a try in RDF Gateway's rule  
> language
> ;-) The results don't really rebut the ugliness claim, but do  
> demonstrate
> that it's doable in at least one of the available frameworks.

FWIW, I would never claim, and haven't claimed, that it *couldn't* be done. It has been done before, and it's pretty clear how to do it (as you demonstrate).

> BTW, I'm not
> denying this was a bit of a pain, nor in any way trying to be an  
> advocate
> for the forcing of fol into rdf syntax.

Granted.

> I ended up with rulebase below. With it I could convert a graph to nnf  
> form
> with a few lines - e.g:
>
>   var ds = new  
> datasource("inet?parsetype=auto&url=c:/kill/nnftest.rdf");
>
>   select ?p ?s ?o using #ds rulebase nnf where {[rdf:type] ?c  
> [owl:Class]}
> and nnf(?c ?p ?s ?o);
>
> The rules may not be 100% though I tested them with a decent number of  
> cases
> (but no pathological ones).

Sorry about the missing bits of the specification. It *was* 4 am or so  
:)

> rulebase nnf
> {
> 	infer nnf(?cin, ?p, ?s, ?o) from nnf_pos(?cin, ?cout, ?p, ?s, ?o);
>
> 	infer nnf_pos(?cin, ?cout, ?p, ?s, ?o) from isAnon(?cin)

Of course, and we might consider this an extension to the task: the expression could be named! Even if not named, consider the semantics... you can't smush bnodes that identify equivalent expressions, or expressions of exactly the same form. Every time you see one of these, there might be other triples floating around :(

Aliasing problems are much worse in RDF, I warrant.

Notice how the RDF representation imports your expressions into your  
*modeling* domain. It's not merely that you can *introspect* your  
expressions, they are actually there in your domain! Kinda scary.

(Sorry, that was a bit of an aside.)

[snip as things are evolving a bit faster than I'm writing :)]

Suppose we end up with an nnf function in this style that's complete and correct. Let's consider some of the effects of some of the choices.

First, for analysis of the program. The comparison class would be a program that uses a term-like syntax (as I've been calling it). For example,
	<not><not>http://foo.com/A</not></not>

Roughly the same as
	~~A where A is an atomic name.
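To make the comparison concrete, here's a minimal sketch of nnf over such a term representation (Python tuples standing in for the XML terms; the connective names and tuple encoding are my choice, not anybody's actual implementation):

```python
# A minimal sketch: NNF over a term representation, with tuples like
# ("not", ("and", "A", "B")) standing in for the XML term syntax,
# and plain strings as atomic names.
def nnf(f):
    if isinstance(f, str):                    # atomic name
        return f
    op, *args = f
    if op == "not":
        (g,) = args
        if isinstance(g, str):                # negated atom: already in NNF
            return ("not", g)
        gop, *gargs = g
        if gop == "not":                      # ~~X  =>  X
            return nnf(gargs[0])
        dual = {"and": "or", "or": "and"}[gop]
        return (dual, *(nnf(("not", a)) for a in gargs))   # De Morgan
    return (op, *(nnf(a) for a in args))      # recurse into subterms

# ~~A collapses to A; ~(A & B) becomes ~A | ~B
```

The whole thing is a structural recursion on the term; that's the point.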

So, in both cases we need an XML parser, say a SAX one. In the term case, we can write the nnf directly on SAX events (I'm pretty sure; don't see why not; maybe I'll do the exercise). So, our dependencies are done. We need nothing else. It'll also be pretty close to maximally efficient, depending on whether we use infix or pre/post fix, because we can avoid lookahead. It's also fairly trivial to add some simple syntactic validation along the way, by hand, if we'd like. A lot of the RDF-legal but malformed constructions are just impossible in this syntax.
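Here's a rough sketch of what nnf-directly-on-SAX-events might look like (Python's stdlib SAX; the element names <not>/<and>/<or> with whitespace-separated atoms are my own hypothetical term syntax, not a spec). It tracks polarity, swaps duals on the fly, and buffers nothing beyond the open-element stack:

```python
import xml.sax

# A streaming NNF sketch over SAX events for a hypothetical term syntax.
# <not> flips polarity; under negative polarity, "and"/"or" are emitted
# as their duals (De Morgan) and each atom gets a single <not> wrapper.
class NNFHandler(xml.sax.ContentHandler):
    DUAL = {"and": "or", "or": "and"}

    def __init__(self):
        super().__init__()
        self.out = []          # output token stream
        self.was_not = []      # stack: was this open element a <not>?
        self.positive = True   # current polarity

    def startElement(self, name, attrs):
        if name == "not":
            self.was_not.append(True)
            self.positive = not self.positive
        else:
            self.was_not.append(False)
            tag = name if self.positive else self.DUAL.get(name, name)
            self.out.append(f"<{tag}>")

    def endElement(self, name):
        if self.was_not.pop():
            self.positive = not self.positive
        else:
            tag = name if self.positive else self.DUAL.get(name, name)
            self.out.append(f"</{tag}>")

    def characters(self, content):
        for atom in content.split():   # whitespace-separated atomic names
            self.out.append(atom if self.positive else f"<not>{atom}</not>")

def nnf_stream(xml_text):
    handler = NNFHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)
```

One pass, no lookahead, no intermediate tree.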

The syntax is usefully compositional. I can embed such expressions in larger forms, and my corresponding code for dealing with those expressions could likely be called from other code. Imagine an XSLT stylesheet for nnf. If I extend the syntax to handle subClassOf (i.e., limited conditionals), I should be able to transform the left and right hand sides using pretty much my old stylesheet. Writing test cases is easy, as is checking them.

If I want to separate syntax checking from my transform, I could whip up a W3C XML Schema or RELAX NG schema. It's going to be fairly simple in this case. I can then use those schemas in a range of editors to assist me in generating correct documents.

In the Using A Big RDF Toolkit Oriented Case, my dependencies are much worse. I don't just need an RDF parser; according to this line of thinking, I need a Big Beefy RDF Toolkit with Query. These kits aren't small, and they aren't as ubiquitous. (I'd much rather write just a SAX parser than both a SAX and an RDF parser... though it's not *that* terrible to write an RDF parser, it's just a waste in this case!) Ok, so I *have* to load the entire document into memory before I can do anything else! (The last triple might be relevant to the first triple!) (Or into a disk-based database, but I trust the obvious worseness of that is sufficient to rebut it.) I either have to have indexed it a lot, or I must touch lots of triples with *each* query. If I want to be careful, I probably want to delete them as I consume them, and let's say I'll serialize or add to *another* triplestore as I go.

So, I've turned this into a two pass parser that uses queries over a  
store.
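A toy illustration of that shape, with a plain dict standing in for the triple store (the triples and the walk are my own sketch, not RDF Gateway's or any toolkit's API): nothing can be rewritten until every triple has been loaded, and each negation costs a store lookup:

```python
# A toy in-memory "triple store" sketch, standing in for a real toolkit,
# to show the triple-oriented shape of the nnf walk for ~~A.
COMPLEMENT_OF = "owl:complementOf"

triples = [                      # ~~A, as it arrives triple by triple
    ("_:c1", COMPLEMENT_OF, "_:c2"),
    ("_:c2", COMPLEMENT_OF, "http://foo.com/A"),
]

store = {}
for s, p, o in triples:          # must consume *all* triples before querying
    store[(s, p)] = o

def nnf_node(node):
    inner = store.get((node, COMPLEMENT_OF))    # one store query per negation
    if inner is None:
        return node                             # atomic name
    inner2 = store.get((inner, COMPLEMENT_OF))
    if inner2 is not None:                      # ~~X  =>  X
        return nnf_node(inner2)
    return ("not", nnf_node(inner))

# even finding where to *start* takes a pass over the whole triple set
roots = {s for s, p, o in triples} - {o for s, p, o in triples}
```

And this is the easy fragment: add intersections and unions and you're chasing rdf:List structures through the store too.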

And testing! Whoa, bnodes galore. To be safe, I might want to use a graph equivalence test. Does my toolkit have it? No? Off to read Jeremy's paper. Partitioning. Ugh. Bleah. Wow, this hurts.

Let's loosen the requirements of the problem so I can use *other* features of the Big Beefy Toolkit, say, the OWL API of that toolkit. Oh wait, it's not clear that Sesame (http://www.openrdf.org/) or RDF Gateway (http://www.intellidimension.com/default.rsp?topic=/pages/site/products/rdfgateway.rsp) *have* such APIs! So if I use them, I'm back to writing my own triples-to-OWL-Abstract-Syntax parser (or rather, a parser to an API loosely based on that syntax). I switch to Jena or the OWL API... at least they have that built in! But that parser is still there, and it has a couple of choices. It can take the Big Query approach (but then why'd we switch?!?!?), or it can parse a stream of triples.

But note that we have to wait for it to process the entire file before we can try to nnf *anything* (that last triple!). And it has to keep all the objects it constructs "pending" until that last triple. (Actually, I don't know if any existing system actually does that, because it's pretty painful. Sean punted, I believe. We punted. Peter?)

Ok, we pay that price (remember, even if the converter is implemented, we still have to run it!), and now we're ready to write nnf... which will, at this point, we hope, if we're lucky, be pretty similar to the term-like one. Except, if we've parsed to, oh, Java objects, we don't have a nice external representation. Sigh. We could generate the XML version and then use the SAX transformer... but wouldn't that have been easier to have done from the start?

Anyone care to do the double negation by hand in the RDF case? It's pretty horrible. In the term case it's perfectly reasonable. So if I'm trying to *explain* this to a developer or student, it's not hard at all in the one case, death on toast in the other.
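For a taste, here is (roughly) ~~A in each syntax; the RDF/XML uses the standard owl:complementOf vocabulary, abbreviated, with namespace declarations elided:

```xml
<!-- term syntax -->
<not><not>http://foo.com/A</not></not>

<!-- RDF/XML: two anonymous classes, one owl:complementOf triple each -->
<owl:Class>
  <owl:complementOf>
    <owl:Class>
      <owl:complementOf rdf:resource="http://foo.com/A"/>
    </owl:Class>
  </owl:complementOf>
</owl:Class>
```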

Let me point out that parsing from the term representation to the triple one is likely to be *waaaaaaaay* easier (certainly not worse than writing a general RDF parser). So, should there be a task for which the tripley representation is better, I can get that pretty easily! So why is the tripley one the *interchange syntax*?!

And I reiterate, this is before considering the various semantic  
difficulties. I just presume I'm treating RDF as plain data. I'm  
assuming that my query language isn't trying to be sound and complete  
with respect to the RDF or RDFS or (especially) the owl semantics.

For the kind of heroics you have to go through on the relentless triple approach, you should get correspondingly heroic benefits. What do I get back in trading off simplicity, modularity, reusability, time, space, code size, and probably a few other things?

Argh, it's 4am again :) Stupid midnight telecons!

This is not graceful degradation in a corner case. This is falling flat  
on your face in an easy and easily generalizable case.

So, I've harped on this case. I do not believe this case is an anomaly. Just take OWL. The pain you see here was much worse there. And it really is needless pain! Is the semantic web so successful that it can easily afford the waste of time, energy, and confusion we see here? Consider just the PR problem of telling XML people that in order to do *anything* they need an entirely new set of beefy tools.

Note that, in a sense, the RDF/XML syntax is only the start of the pain. Triples, binary assertions, are just not the right tool for many jobs. Indeed, often they are a very wrong tool, a non-tool, an anti-tool. The sooner the semantic web community (as a whole) grasps this, the better off we'll be.

Cheers,
Bijan Parsia.

P.S., Kudos, again, to Geoff for taking action.

Received on Thursday, 6 January 2005 19:06:59 UTC