Re: RDF as a syntax for OWL (was Re: same-syntax extensions to RDF) from Bijan Parsia on 2005-01-05 (www-rdf-logic@w3.org from January 2005)

From: Bijan Parsia <bparsia@isr.umd.edu>
Date: Thu, 6 Jan 2005 04:26:20 +0900
To: www-rdf-logic@w3.org
Message-Id: <AE043920-5F4F-11D9-97A5-000D93C1F7A6@isr.umd.edu>
First off, the simple response is for you to take the NNF challenge.

It's not contrived; it's a key step in tableau reasoners. Most reasoners
want some sort of conversion to a normal form. NNF is one of the easier
ones.

Use whatever tools you like *except* a non-triplestore, term-like
representation.

More replies in line.

On Jan 6, 2005, at 3:13 AM, Jeen Broekstra wrote:
> hi Bijan,
>
> Bijan Parsia wrote:
>
>> Jeen Broekstra <jeen@aduna.biz>
>>> Peter F. Patel-Schneider wrote:
[snip]
>> Clearly, I'm expecting the author in the second case to use a 
>> triplestore or to parse to triples first. Not too would be too 
>> horrible for screeds.
>
> Fair enough. Peter indicated in his answer that what I proposed was 
> more or less along the lines he actually implemented stuff, as well.
>
>> Also, if you read my posts, you'll see parsing is just the *simplest 
>> and easiest to discuss* example. It's by no means exhaustive. It's by no
>> means insurmountable. But I submit that if you make *parsing* and 
>> *syntax checking* hard, indeed, if you force people to give up ALL 
>> THE TOOLS THERE ARE and have to write a bunch of code or tool chains 
>> from scratch, all to accomplish what would be *trivial* otherwise, 
>> that you've shown that RDF data is *not* "nice to work with". 
>> Certainly not for these tasks.
>
> Then perhaps my problem with all this is that you seem

I humbly request that you stop with the seem claims. Either amass 
enough confidence from your reading of my text to make a defensible 
claim or ask me a question.

> to think that somehow users/ontology builders/application developers 
> should solve these problems

We have to solve them now. They have to be solved sometime. It's really 
painful to solve them with these constraints. And, if you avoid 
embedding, they are much easier to solve.

> (and I agree, for them it is "not nice"). My impression was very much 
> that for the kind of tasks you are talking about, users/ontology
> builders/application developers expect *ready made tools*.

I've had ready made tools and good tool support. I don't believe you 
understand the task.

> The problem in my opinion is therefore not that the RDF model sucks 
> for these tasks,
> but that there are no tools available to do it for you (actually, for 
> at least some of the things you mentioned, these tools _are_ actually 
> available of course).

So for Pellet's species validator, we had Jena with RDQL support. For
SWI-Prolog I had Prolog (which is easily as expressive). Bzzt. Try
again. Try specifying the problem, even. Implement it both ways. *Try,*
then report.

Look, if you already have a species validator, you don't need
(necessarily) to write one. If you use the OWL API structures, you 
don't have to think about triples. But the claim in question is whether 
triples are *nice* to work with. My counter claim is that my experience 
is that they are neither standalone nice, nor nice by comparison, for a 
wide range of common tasks.

Frankly, you have a completely unsupported opinion, from what I can 
tell. Why should I take your opinion seriously? Opine all you like, but 
your opining doesn't make it data.

>>> And I can't help but
>>> wonder if abandoning this approach for validation in favor of
>>> *querying* (or using rules, for all I care) for 'well-formedness'
>>> would make everything a bit easier.
>> No, actually, as Peter showed.
>
> I haven't actually seen that yet. But I'm willing to believe that it 
> is still quite difficult.
>
>> But put it another way, do you not refute yourself? You want to parse 
>> to a store then *query* that store (multiple times; with code 
>> inbetween) when you could have used a simple declarative grammar or 
>> transparent, easy to understand code?
>
> I do not see a contradiction here. Applying an RDF toolkit in this 
> fashion is not rocket science.

It's not a matter of rocket science. It's a matter of the tedium and
error-proneness of picking the absolutely wrong and boneheaded approach
due to a bad choice of representation. In fact, it's wasting time and 
money that should be spent on the rocket science!

Again, you have no experience that I can see and it's not clear to me 
that you understand the scope of the problem. Try reading the various 
reports and discussions about species validation. (And you know? There 
is more than just species validation. It's just an example! But 
expressivity checking is a pretty useful thing and there are more 
distinctions than OWL Lite/DL/Full.)

> Taking the Sesame library as an example: creating a repository 
> in-memory and uploading the triples to it is about 5 lines of code. 
> The actual query is 1 line. Processing the result is of course 
> task-dependent, but not insurmountable.

Since we've surmounted, we already knew that. But compare that to a 
schema declaration. Plus, think of *all* the stuff you have to do (for 
all the species validation.) You are generalizing from my one, rather 
throw away, case.

> And I feel you're comparing apples and oranges. In using a declarative 
> grammar you are applying a ready-made tool for that specific purpose: 
> XML Schema validation only works if you apply a (duh...) XML Schema 
> Validator toolkit.

Yes. And there are lots of them. And I can convert, likely, to and from 
Relax-ng. And they are widely deployed and actually pretty well 
designed (compared to ad hoc code) for their task. They are integrated 
with editors (lots of them). Etc. etc. etc.

> Validating an OWL ontology will require using an OWL validator.

Sigh. But it needn't. Really. Not for syntax. For some other things
(e.g., consistency) sure. But not for syntax. We could actually reuse
tools. And it would be *much* easier to extend.

> I'm not arguing that simply doing queries is enough; I'm arguing that 
> using such tools for implementing a validator that others can use to 
> actually validate is not unreasonable.

Then you argue irrelevantly. The claim from Sandro was that having 
triples is *nice* to work with. This suggests that code is pleasant and 
easy to write; that lightweight tools can be used; that specification 
of the tasks and their solutions flow naturally.

I deny all that. A species validator is a perfect *example of how it's
not nice* no matter how much of the canonical freaking toolkit you
throw at it. Take away schema validators and DTDs, and a sane XML
representation (a la DIG or any number of formats) will still be way
easier to code.

Of course species checking of an OWL ontology encoded as triples is
way easy when you have an extant tool that species checks OWL ontologies
encoded as triples! But that is, as I've said, a cheat. The proper focus
of this discussion is on the implementation of a species validator (or
an NNF normalizer). These are just examples, but real world ones. Just
show me how RDF data wins for these.

>>> Perhaps I don't understand the parameters of the task at hand too
>>> well, but I also have the feeling that perhaps you are not applying
>>> the right tools to the job. Your quoted figures for an OWL parser
>>> from RDF/XML seem to assume a DOM structure
>> ? I would love to see a quote justifying this seeming. I think you 
>> see straw where there is, in fact, iron.
>
> I'm not following what you mean by this. My remark was aimed at 
> Peter's email in which he presented an OWL parser.

Sorry, I corrected this just before receiving this email.

> The RDF/XML version of that used a DOM structure, if I understood 
> correctly.

Yes, but he parsed that to triples, *then* parsed the triples to OWL
abstract syntax.

Don't you find that intermediate step...unfortunate? I mean, you like 
having three passes?

>>> but no RDF toolkit, i.e. you
>>> are directly trying to construct OWL from the XML syntax (you mention
>>> a 'nice internal data structure' but a graph data structure is not
>>> the same as an RDF toolkit).
>> I was not able to find this quote.
>
> From Peter's message: "In summary, taking an RDF graph (totally parsed 
> and in a nice internal data structure) [...]". Last paragraph.

Ok, but you still misread him. But fine, feel free to write a species 
validator using queries (but check what Peter said about the task; 
there are good reasons not to use a query language like that!) (I
mean, the algorithmic complexity alone).

>> You're telling me that I should have an RDF toolkit, including a 
>> query engine, to *parse and validate* what is, in the end, a fairly 
>> trivial syntax?
>> I'm sorry, that just sounds *insane*. How is this making life 
>> *easier*.
>
> It makes it easier in that this gives you direct control over the 
> actual graph structure.

Dodge. What's at issue is whether we should *have* an actual graph 
structure for owl ontologies.

>  I do not see why this should be insane.

Then you should read again and ponder. (You beg the question, btw.)

If my goal is a term-like structure, why does it help to go through a
triple store instead of a syntax that's way easier to parse? Do you
*deny* that it's easier to parse?
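
To give a sense of scale, here is a minimal sketch of a reader for a term-like syntax. It assumes a KRSS-style parenthesized notation such as "(some P (and C (not D)))"; the function names `parse` and `_read` and the tuple output are made up for illustration and are not any standardized OWL syntax or existing parser.

```python
import re


def parse(text):
    """Parse one parenthesized class expression into nested tuples."""
    # Tokens are either a single parenthesis or a run of other non-space chars.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    expr, next_index = _read(tokens, 0)
    if next_index != len(tokens):
        raise ValueError("trailing input after expression")
    return expr


def _read(tokens, i):
    """Read one expression starting at tokens[i]; return (expr, next index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok != "(":
        return tok, i + 1  # a bare name such as C or P
    items = []
    i += 1
    while i < len(tokens) and tokens[i] != ")":
        item, i = _read(tokens, i)
        items.append(item)
    if i >= len(tokens):
        raise ValueError("missing ')'")
    return tuple(items), i + 1  # skip the closing parenthesis
```

The result is already the term structure a later transformation wants to walk; there is no intermediate graph to reassemble, and a malformed expression fails at the point where it is malformed.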

> Of course, the alternative is that you use an alternative 
> representation of OWL and manipulate that (with whichever tools are 
> good at manipulating that particular representation).

That would be a rebuttal of the RDF is nice point.

>  Fine with me. But if you do this at the triple level, it does not 
> seem unreasonable to me to use an RDF framework,

Which I've never denied. I'm saying that there is no good enough 
framework to make these tasks good or nice.

> which gives you all sorts of nice utilities and query languages to 
> manipulate the triple set with.

I've used such and written such. I don't know why you think I'm 
unaware. But please explain why we shouldn't write parsers using a 
relational db with all the nice facilities they supply?

>>> My hunch is that actually _using_ the triples through an RDF
>>> API/query language instead of trying to bypass it will make life
>>> easier (and no,
>>> I'm not claiming that it is trivial or very easy, I merely have the
>>> impression that it is not as fiendishly difficult as you make it out
>>> to be).
>> I'm sorry, you're wrong.
>
> That is always possible, of course. Though I believe that you 
> overestimate the number of use cases for which this holds.

I'm sure you do. But why should I give credence to your belief?

Plus, we were talking about, originally, the virtue of embedding yet 
more syntax and more semantics in the frail reed of RDF. I'm not saying 
RDF sucks for everything. I'm saying RDF for syntax for logics (a la 
OWL) sucks hard. So the relevant space is tools for manipulating, 
reasoning with, validation, etc. those logics.

>> It's not impossible, of course. It's just much nastier than the 
>> alternative.
>
> Fine. Then use the alternative. Noone is _forcing_ you to use triples; 
> there are other representations for OWL, and they are all 
> interchangable.

But surely they are. I want to play the semweb game. I want to help the 
semweb. I want to develop tools and applications. Why do you insist on 
making dealing with the normative bits of the semweb stack disgusting 
and silly wastes of time? (This is all *ASIDE FROM THE FACT* that the 
editors of the semantics of owl and rdf, the people with the most front 
line experience, think that the further extension of this style is
likely impossible.)

>> My first attempt was using SWI Prolog and DCGs. It cannot be sanely 
>> done in normal DCG style, as far as I can see. You have to maintain 
>> tons of state. You are tempted to plop queries in curly braces and 
>> then you realize you've completely subverted the formalism! 
>> COMPLETELY! And then you are afraid all your prolog friends will 
>> think you're touched in the head.

Wow, way to miss my explicit discussion of using a *fine* toolkit.

>>> To take Bijan's example of checking that a class expression such as:
>>>
>>>      <owl:Restriction>
>>>          <owl:onProperty rdf:resource="P"/>
>>>          <owl:someValuesFrom rdf:resource="C"/>
>>>      </owl:Restriction>
>>>
>>> is 'well-formed', i.e. is exactly formulated as such and has no extra
>>> or missing triples, is simply a matter of doing some queries.
>>>
>>> construct distinct *
>> Construct constructs.

This was otiose.

>> So this is a cheat. Peter also pointed this out.
>
> I don't understand why this would be a cheat. It is a valid RDF query 
> that can be used in an RDF toolkit. Why is this a cheat?

From http://lists.w3.org/Archives/Public/www-rdf-logic/2005Jan/0013.html:

"Umm, I don't believe that this would provide any benefit at all.  
First, this would not retrieve any OWL DL-illegal triples, and thus would
not be useful for species validation.  Second, you are still left with a
graph that would have to be processed in exactly the same way as the
entire RDF graph would have to be.  For example, how would this make
checking for missing bits any easier to program?  (Yes, because the graph
might be significantly smaller it might be faster to check, but I don't
think that it would be any easier to check.)"

Check the "for example".

Using queries can help focus, but you still have to do all the work. Now
you just have tons of queries scattered in there. (This is opposed to
walking a data structure, i.e., an abstract syntax tree!)
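
To make the "missing bits" point concrete, here is a rough sketch of what remains to be done once a query has handed back the candidate triples for one restriction node. It assumes triples are plain Python tuples of prefixed-name strings; the function `check_some_restriction` and the constants are hypothetical, and the check is far weaker than real species validation (it ignores, for instance, how the node is used elsewhere in the graph).

```python
RDF_TYPE = "rdf:type"
RESTRICTION = "owl:Restriction"
ON_PROPERTY = "owl:onProperty"
SOME_VALUES_FROM = "owl:someValuesFrom"


def check_some_restriction(triples, node):
    """Return a list of problems with `node` as a someValuesFrom restriction."""
    problems = []
    # Everything asserted with `node` as subject, in input order.
    outgoing = [(p, o) for (s, p, o) in triples if s == node]

    if (RDF_TYPE, RESTRICTION) not in outgoing:
        problems.append("missing rdf:type owl:Restriction")

    # Each of these properties must occur exactly once.
    for prop in (ON_PROPERTY, SOME_VALUES_FROM):
        count = sum(1 for (p, _) in outgoing if p == prop)
        if count == 0:
            problems.append(f"missing {prop}")
        elif count > 1:
            problems.append(f"duplicate {prop}")

    # Anything else hanging off the node is an extra triple.
    allowed = {RDF_TYPE, ON_PROPERTY, SOME_VALUES_FROM}
    for p, o in outgoing:
        if p not in allowed or (p == RDF_TYPE and o != RESTRICTION):
            problems.append(f"unexpected triple: {node} {p} {o}")
    return problems
```

All of this is bookkeeping that a grammar for a term syntax states once, as "a restriction has exactly one property and one filler", and it has to be repeated for every construct.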

I'd be very worried about this in a production system. Consider the 
memory costs of having to have a triple store (no streaming species 
checking parser!). Plus, we're missing the fact that it's much easier 
to miss your species when you have all those triples in that funky 
syntax. A term oriented syntax is *much* easier to fix, *much* easier 
to know (locally and immediately) that you got it wrong, etc. etc.

>> Plus, I've never seen construct distinct before. It's hardly
>> widespread. I seriously doubt that there is a production system
>> available using it that's been remotely narrowly, much less widely,
>> deployed.
>
> Oh I don't know.
>
> Or actually I do know. Sesame implements it. Has done so for about a 
> year now (the CONSTRUCT clause was originally introduced in the SeRQL 
> query language back then).

Construct distinct? Ok. One implementation.

>  I haven't counted the users who use this particular construction of 
> course, but I've done a fair number of projects myself in which this 
> is applied (mainly for graph transformations and such).
>
> That aside, the notion of CONSTRUCT DISTINCT in this context is hardly 
> the point, it was just an example.

Oh, that's a bit rich given your treatment of my just examples. Only my 
just examples connect to well specified problems (e.g., species 
validation) that you can go research to get a sense of what they 
entail.

>>> from {R} rdf:type {owl:Restriction};
>>>           owl:onProperty {Prop};
>>>           owl:someValuesFrom {Val}
>>>
>>> retrieves a subgraph that you can check (using any RDF toolkit's
>>> utility methods) quite easily for the existence/omission of triples.
>> This doesn't do the job. And if it did, it would still brutally suck 
>> next to a schema.
>
> You seem to forget that even a schema has to be implemented before it 
> works.

Clearly not. And I won't reply to another message from you about what I 
seem. You are, at best sir, uncharitable in your reading.

>> For example, what about error reporting? How about plugging into
>> an editor and enforcing correctness or autocompletion? You have to 
>> build, likely yourself, an entire infrastructure that doesn't work 
>> with anything else. Why?
>
> This is nonsense.

If you mean to apply this to what follows, I agree.

> If such tools become more freely available then _that_ is what you 
> use. For XML validation no-one implements his own XML Schema 
> interpreter right? Of course not, there are tools for that. Same for 
> OWL. How are triples an issue here?

Since the whole debate is about whether triples are a good data 
structure for, e.g., syntax manipulation, it is *THE* issue.

>> Remember, we're not talking about the possible, we're talking about 
>> the pleasant.
>
> If your point is that reinventing this tool wheel every time is a 
> pain, then yes, of course, I agree.

It's also about maintenance and update. Consider the effort that went
into specifying the RDF syntax for OWL. Consider that we could have
*normally specified it with a schema*, if we were allowed to use a sane
syntax. Wow, the *specification* could have eliminated a whole lotta
implementation.

How do we do that with an rdf toolkit?

>>> Granted, many query languages in the current set of RDF tools perhaps
>>> still miss the expressiveness to make this as painless as it might be
>>> (I'm thinking of explicit support for containers and collections,
>>> here, which many tools still miss, and aggregation functions such as
>>> count(), min(), etc.), but I still have the feeling this would be a
>>> good approach.
>> I respectfully submit that your feeling is totally wrong. Please, 
>> just examine some of the code. It's *all* open source. It's all 
>> *easy* to find. Why on earth are you speculating like this?
>
> I'm speculating because I do not have the time to look at this code.

So, you put your speculation against actual experience. Interesting.

>  I do know a little bit about OWL however, and I do know quite a bit 
> about RDF frameworks and query languages, and what they can (and 
> cannot) do.

Take the NNF challenge then. It should be twenty minutes.

>>> If you have experience to the contrary, it would be interesting to
>>> learn at what point you found the RDF toolkit/API/query language that
>>> you worked with lacking.
>> It's the wrong wrong wrong tool for the job. I wouldn't use a 
>> relational database to parse C. Would you? *Why*? Why would you *even 
>> consider it*?
>
> Apples and oranges.

Bah.

> I'm sorry but this metaphor just does not apply. Your complaint is 
> that you can't manipulate OWL
> ontologies through RDF triples because it is a pain.

No. I certainly can. It's just a pain.

> I submitted that perhaps if you made better use of the capabilities of
> RDF frameworks it would be less of a pain.

And you're wrong. And the burden of proof is on you.

> Regardless of whether that particular assertion turns out to be true 
> or not, the link between triples and RDF frameworks should be obvious.

It's the link between triples and syntax for a logic that is at issue.
You just don't grasp that.

> I see no obvious link between the relational data model and C parsing.

Sure there is. Let's encode C programs in tables. It shouldn't be hard.
It's certainly modelable. Now, let's write a parser (that provides
error messages) using Oracle or MySQL. They are the rich toolkit and
they don't get much richer. Would that we had as rich for RDF.

>> (There is the introspector project, but it has slightly different 
>> aims, and I still think it's misguided. See: 
>> http://introspector.sourceforge.net/)
>
> Thanks, I'll have a look at that.
>
>> And remember, parsing is only the start! What sandro wants is 
>> impossible! What we got with OWL was *super hard* (where there is a
>> much simpler alternative).
>
> Ah. I jumped into the middle of this discussion, and have not actually 
> considered Sandro's proposals (I was merely triggered by some of what 
> you and Peter were saying). Sorry if that has led to topic drift.

Well, great. So that's why you don't know what we're talking about. I 
don't appreciate being accused of all sorts of stupid things because 
you didn't bother to find out what the debate was about.

>> Plus, remember, your team doesn't *get to use* bigger structures! So
>> you *can't* parse to some nice internal, OWL-like structure (see OWL
>> API, KRSS, logic toolkits), and then do your manipulations!
>
> Why on earth not? That's what these tools are for! What exactly are 
> you trying to prove then with this exercise?

Read. The. Thread.

>> So my negation normal form challenge stands. You must read from a 
>> triplestore and write to a triplestore. You must handle arbitrary OWL
>> Class expressions. For the record, this is typically no more than a 
>> few dozen lines of code. But I predict that it will be *nasty*.
>> For those who don't know what negation normal form is, well, first, I 
>> believe you've demonstrated sufficient lack of experience building 
>> semantic web tools that you start off in a bit of a hole
>
> Umm... Right.

Yes, right. This is exactly about whether triples (actually, not just
triples qua data, but RDF *ASSERTIONS*) should be used as the canonical
syntax for expressive logics like OWL in a certain style ("same syntax",
i.e., "encoded"). Your sarcasm would have more weight if you knew what
we were talking about.

>> , and second, it's very simple. Remember that OWL (Lite too! Just we
>> decided to torment generations by eliminating the explicit
>> constructors!) has negation. So let's take a very simple
>> transformation, double negation:
>>     (not (not C)) <=> C
>> With negation normal form, you drive *ALL* the negations as "deep" 
>> into the formula (note how this metaphor loses its force with a 
>> triple approach :() as they can go, so that the only negations appear 
>> on class names. So, things like
>>     (not (and C D))
>> become
>>     (or (not C) (not D))
>> And so forth.
>> Hey, you don't have to write the function! Just explain as much as I 
>> explained using triples alone, whatever your favorite syntax.
>> (And remember! you don't get to consider expressions in isolation 
>> like that! After all, there could be a link!)
>> I hope this clarified things for you.
>
> Not quite. Your exercise seems contrived to me;

It's essential to any tableau reasoner. Tableau reasoners are one of 
the things you want to write for owl. RDF crap makes it harder. Much 
harder than it needs to be.

Plus, syntactic transformation of expressions is useful and general.
This is just one *very* easy sort.
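
For reference, here is a minimal sketch of the term-style version of the transformation. The nested-tuple encoding and the constructor names (`not`, `and`, `or`, `some`, `all`) are assumptions made for illustration; they are not the data structures of the OWL API, Pellet, or any other tool, and cardinality restrictions and the rest of OWL are left out.

```python
def nnf(expr):
    """Push negations inward until they sit only on class names."""
    if isinstance(expr, str):  # atomic class name
        return expr
    op = expr[0]
    if op in ("and", "or"):
        return (op,) + tuple(nnf(arg) for arg in expr[1:])
    if op in ("some", "all"):  # (some P C) or (all P C)
        return (op, expr[1], nnf(expr[2]))
    if op == "not":
        return _negate(expr[1])
    raise ValueError(f"unknown constructor: {op!r}")


def _negate(expr):
    """Return the negation normal form of (not expr)."""
    if isinstance(expr, str):
        return ("not", expr)  # negation has reached a class name
    op = expr[0]
    if op == "not":  # double negation: (not (not C)) <=> C
        return nnf(expr[1])
    if op == "and":  # De Morgan
        return ("or",) + tuple(_negate(arg) for arg in expr[1:])
    if op == "or":  # De Morgan
        return ("and",) + tuple(_negate(arg) for arg in expr[1:])
    if op == "some":  # (not (some P C)) <=> (all P (not C))
        return ("all", expr[1], _negate(expr[2]))
    if op == "all":  # (not (all P C)) <=> (some P (not C))
        return ("some", expr[1], _negate(expr[2]))
    raise ValueError(f"unknown constructor: {op!r}")
```

Each rule is a single line because the subexpressions are right there in the term. The challenge is to do the same when every one of those subexpressions is a node that has to be looked up in, and written back to, a triplestore.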

> I'm not quite sure what you are out to prove. NNF converters are 
> possible, but why would you want to do this at the triple level, and 
> without using any of the tools that are _designed_ to work on top of 
> that?

Use the tools. But the tools are triple manipulators, not, e.g.,
representations of the abstract syntax.

As to why I wouldn't want to use such an API, I *do* want to use one.
I want to drop the pointless middle layer.

Cheers,
Bijan Parsia.
Received on Wednesday, 5 January 2005 19:26:18 UTC