- From: Bijan Parsia <bparsia@isr.umd.edu>
- Date: Fri, 7 Jan 2005 04:06:57 +0900
- To: "Geoff Chappell" <geoff@sover.net>
- Cc: <www-rdf-logic@w3.org>
Bravo! Good for you for taking on the challenge!

On Jan 6, 2005, at 12:07 PM, Geoff Chappell wrote:

[snip]
> Given the challenge, I had to give it a try in RDF Gateway's rule language
> ;-) The results don't really rebut the ugliness claim, but do demonstrate
> that it's doable in at least one of the available frameworks.

FWIW, I would never claim, and haven't claimed, that it *couldn't* be done. It has been done before, and it's pretty clear how to do it (as you demonstrate).

> BTW, I'm not denying this was a bit of a pain, nor in any way trying to be
> an advocate for the forcing of fol into rdf syntax.

Granted.

> I ended up with rulebase below. With it I could convert a graph to nnf form
> with a few lines - e.g:
>
> var ds = new datasource("inet?parsetype=auto&url=c:/kill/nnftest.rdf");
>
> select ?p ?s ?o using #ds rulebase nnf where {[rdf:type] ?c [owl:Class]}
> and nnf(?c ?p ?s ?o);
>
> The rules may not be 100% though I tested them with a decent number of cases
> (but no pathological ones).

Sorry about the missing bits of the specification. It *was* 4 am or so :)

> rulebase nnf
> {
> infer nnf(?cin, ?p, ?s, ?o) from nnf_pos(?cin, ?cout, ?p, ?s, ?o);
>
> infer nnf_pos(?cin, ?cout, ?p, ?s, ?o) from isAnon(?cin)

Of course, and we might consider this an extension to the task: the expression could be named! Even if not named, consider the semantics...you can't smush bnodes that identify equivalent expressions or expressions of exactly the same form. Every time you see one of these, there might be other triples floating around :( Aliasing problems are much worse in RDF, I warrant.

Notice how the RDF representation imports your expressions into your *modeling* domain. It's not merely that you can *introspect* your expressions, they are actually there in your domain! Kinda scary.

(Sorry, that was a bit of an aside.)

[snip as things are evolving a bit faster than I'm writing :)]

Suppose we end up with an nnf function in this style that's complete and correct.
Let's consider some of the effects of some of the choices.

First, for analysis of the program. The comparison class would be a program that uses a term-like syntax (as I've been calling it). For example,

<not><not>http://foo.com/A</not></not>

Roughly the same as ~~A where A is an atomic name.

So, in both cases we need an xml parser, say a sax one. In the term case, we can write the nnf directly on sax events (I'm pretty sure; don't see why not; maybe I'll do the exercise). So, our dependencies are done. We need nothing else. It'll also be pretty close to maximally efficient, depending on whether we use infix or prefix/postfix, because we can avoid lookahead. It's also fairly trivial to add some simple syntactic validation along the way, by hand, if we'd like. A lot of the RDF-legal but malformed constructions are just impossible in this syntax.

The syntax is usefully compositional. I can embed such expressions in larger forms, and my corresponding code for dealing with those expressions could (likely) be called from other code. Imagine an XSLT stylesheet for nnf. If I extend the syntax to handle subClassOf (i.e., limited conditionals), I should be able to transform the left and right hand sides using pretty much my old stylesheet.

Writing test cases is easy, as is checking them.

If I want to separate syntax checking from my transform, I could whip up a W3C XML Schema or relax-ng schema. It's going to be fairly simple in this case. I can then use those schemas in a range of editors to assist me in generating correct documents.

In the Using A Big RDF Toolkit Oriented Case, my dependencies are much worse. I don't just need an RDF parser, but, according to this line of thinking, I need a Big Beefy RDF Toolkit with Query. These kits aren't small, and they aren't as ubiquitous (I'd much rather write just a sax parser than both a sax and an RDF parser...though it's not *that* terrible to write an RDF parser, it's just a waste in this case!)
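To make the "nnf directly on sax events" claim for the term-like syntax concrete, here is a minimal sketch in Python using only the standard library's xml.sax. It is an illustration under assumptions, not a tested tool: the <and> and <or> elements, the whitespace-separated bare-text atoms, and the names NnfHandler and nnf are all hypothetical and not part of the thread. The only state kept is the parity of the enclosing <not> elements, so no lookahead and no in-memory tree is needed.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape

# Connectives swap with their duals under negation (De Morgan).
DUAL = {"and": "or", "or": "and"}


class NnfHandler(ContentHandler):
    """Emit the negation normal form of a term-like expression while parsing."""

    def __init__(self):
        super().__init__()
        self.out = []         # output tokens, in document order
        self.negated = False  # True under an odd number of <not> elements
        self.text = []        # character data seen since the last tag

    def _flush_atoms(self):
        # Atomic names are assumed to be whitespace-separated bare text.
        for name in "".join(self.text).split():
            atom = escape(name)
            self.out.append(f"<not>{atom}</not>" if self.negated else atom)
        self.text = []

    def startElement(self, name, attrs):
        self._flush_atoms()
        if name == "not":
            self.negated = not self.negated
        elif name in DUAL:
            self.out.append(f"<{DUAL[name] if self.negated else name}>")
        else:
            # Simple syntactic validation, done by hand along the way.
            raise ValueError(f"unexpected element: {name}")

    def endElement(self, name):
        self._flush_atoms()
        if name == "not":
            self.negated = not self.negated
        else:
            self.out.append(f"</{DUAL[name] if self.negated else name}>")

    def characters(self, content):
        self.text.append(content)


def nnf(xml_text):
    """Return the NNF of a term-like XML expression as a string."""
    handler = NnfHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return " ".join(handler.out)


print(nnf("<not><not>http://foo.com/A</not></not>"))  # http://foo.com/A
```

Pushing negations inward this way works because, at any point in the stream, the only thing that matters is whether the number of open <not> elements is odd or even.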
Ok, so I *have* to load the entire document into memory before I can do anything else! (The last triple might be relevant to the first triple!) (Or into a disk-based database, but I trust the obvious worseness of that is sufficient to rebut it.) I either have to have indexed it a lot, or I must touch lots of triples with *each* query. If I want to be careful, I probably want to delete them as I consume them, and let's say I'll serialize or add to *another* triplestore as I go. So, I've turned this into a two pass parser that uses queries over a store.

And testing! Whoa, bnodes galore. To be safe, I might want to use a graph equivalence test. Does my toolkit have it? No? Off to read Jeremy's paper. Partitioning. Ugh. Bleah.

Wow, this hurts. Let's loosen the requirements of the problem so I can use *other* features of the Big Beefy Toolkit, say, the OWL API of that toolkit. Oh wait, it's not clear that Sesame (http://www.openrdf.org:80/) or RDF Gateway (http://www.intellidimension.com/default.rsp?topic=/pages/site/products/rdfgateway.rsp) *have* such apis! So if I use them, I'm back to writing my own triples to OWL Abstract Syntax (or rather, an api loosely based on that) parser. I switch to Jena or the OWL API...at least they have that built in!

But that parser is still there and it has a couple of choices. It can take the Big Query approach (but then why'd we switch?!?!?), or it can parse a stream of triples. But note that we have to wait for it to process the entire file before we can try to nnf *anything* (that last triple!) And it has to keep all the objects it constructs "pending" until that last triple. (Actually, I don't know if any existing system actually does that, because it's pretty painful. Sean punted, I believe. We punted. Peter?)

Ok, we pay that price (remember, even if the converter is implemented, we still have to run it!), and now we're ready to write nnf...which will at this point, we hope, if we're lucky, be pretty similar to the term-like one.
Except, if we've parsed to, oh, java objects, we don't have a nice external representation. Sigh. We could generate the xml version and then use the sax transformer...but wouldn't that have been easier to have done from the start?

Anyone care to do the double negation by hand in the RDF case? It's pretty horrible. In the term case it's perfectly reasonable. So if I'm trying to *explain* this to a developer or student, it's not hard at all in the one case, death on toast in the other.

Let me point out that parsing from the term representation to the triple one is likely to be *waaaaaaaay* easier (certainly not worse than a general RDF parser). So, should there be a task for which the tripley representation is better, I can get that pretty easily! So why is the tripley one the *interchange syntax*?!

And I reiterate, this is before considering the various semantic difficulties. I just presume I'm treating RDF as plain data. I'm assuming that my query language isn't trying to be sound and complete with respect to the RDF or RDFS or (especially) the owl semantics.

For the kind of heroics you have to go through on the relentless triple approach, you should have correspondingly heroic benefits. What do I get back in trading off simplicity, modularity, reusability, time, space, code size, and probably a few other things?

Argh, it's 4am again :) Stupid midnight telecons!

This is not graceful degradation in a corner case. This is falling flat on your face in an easy and easily generalizable case.

So, I've harped on this case. I do not believe this case is an anomaly. Just take OWL. The pain you see here was much worse there. And it really is needless pain! Is the semantic web so successful that it can easily afford the waste of time, energy, and confusion we see here? Consider just the PR problem of telling XML people that in order to do *anything* they need an entirely new set of beefy tools.

Note that, in a sense, RDF/XML syntax is only the start of the pain.
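Two of the points above can be shown in one rough sketch: what double negation looks like as triples, and how little code the term-to-triple direction needs. This is Python under stated assumptions: terms are nested tuples such as ("not", ("not", "http://foo.com/A")), the function name term_to_triples and the bnode labels are hypothetical, and the output is plain subject/predicate/object tuples using owl:complementOf, owl:intersectionOf, and owl:unionOf with rdf:first/rdf:rest lists, with no claim that any toolkit mentioned in this thread works this way.

```python
import itertools

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def term_to_triples(term):
    """Convert a nested-tuple term into (root_node, list_of_triples).

    A term is an atomic name (a str) or a tuple:
    ("not", t), ("and", t1, t2, ...) or ("or", t1, t2, ...).
    """
    counter = itertools.count(1)
    triples = []

    def fresh():
        # Every non-atomic expression gets its own blank node.
        return f"_:b{next(counter)}"

    def rdf_list(nodes):
        # Build an rdf:first/rdf:rest chain ending in rdf:nil.
        head = RDF + "nil"
        for node in reversed(nodes):
            cell = fresh()
            triples.append((cell, RDF + "first", node))
            triples.append((cell, RDF + "rest", head))
            head = cell
        return head

    def walk(t):
        if isinstance(t, str):
            return t
        op, *args = t
        node = fresh()
        triples.append((node, RDF + "type", OWL + "Class"))
        if op == "not":
            (arg,) = args
            triples.append((node, OWL + "complementOf", walk(arg)))
        elif op in ("and", "or"):
            prop = "intersectionOf" if op == "and" else "unionOf"
            members = rdf_list([walk(a) for a in args])
            triples.append((node, OWL + prop, members))
        else:
            raise ValueError(f"unknown operator: {op}")
        return node

    return walk(term), triples


root, triples = term_to_triples(("not", ("not", "http://foo.com/A")))
for s, p, o in triples:
    print(s, p, o)
```

Even ~~A comes out as four triples over two blank nodes, and going back the other way means chasing those blank nodes through the whole graph.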
Triples, binary assertions, are just not the right tool for many jobs. Indeed, often they are a very wrong tool, a non tool, an anti tool. The sooner the semantic web community grasps this (as a whole) the better off we'll be.

Cheers,
Bijan Parsia.

P.S., Kudos, again, to Geoff for taking action.
Received on Thursday, 6 January 2005 19:06:59 UTC