Re: RDF semantics: applications, formalism and education

>
>[** Thinking...]
>
>Maybe we need to review and clarify the architectural framework.
>I can imagine something like this:
>
>Logic:           DAML+OIL, etc  (Full-strength inference, typing)

(niggle: DAML+OIL isn't full-strength logic.)

>Schema:          RDFS           (Limited inference, typing)
>Abstract syntax: RDF            (Directed labeled graph)
>                 XML Infoset    (Annotated tree)
>                 XML Namespace  <<--------------------- URIs
>Syntax:          XML            ("Pointy brackets")
>                 Characters     (Unicode, UCS, others)
>                 Octets         (Pretty universal now, not always so)
>                 Bits

I have some trouble understanding what you mean here. The
relations between the various layers don't seem similar, so this
doesn't feel like a stack to me. Octets are a way to put characters
into a data stream; characters form lexical items (a missing layer,
by the way) for any language, not just XML. All languages have
syntax: it doesn't stop with XML. Also, the relation between XML and
RDF is much murkier, since RDF more or less declares itself to be
independent of XML: pointy brackets are just one possible option for
RDF 'linear' syntax, and the real RDF structure resides in the triples.
I think that makes sense, since just about any language can be XML-ized.
(One can write an XML rendering of KIF, which puts real full-strength
inference right on top of XML with no nonsense.)
So what I see here is a tree, which branches at each stage into many
possibilities. Among those, RDF doesn't exactly stand out as a winning
option.
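
To make the point about triples concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
example URIs, the simplified one-triple-per-line notation, and the
made-up <graph>/<triple> XML layout are not taken from any RDF
specification. It only shows that two different surface syntaxes can
carry the same set of triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical line-oriented syntax: "<subject> <predicate> <object> ."
LINE_FORM = """
<http://example.org/doc> <http://example.org/author> <http://example.org/pat> .
<http://example.org/doc> <http://example.org/topic> <http://example.org/rdf> .
"""

# Hypothetical XML layout carrying the same statements (not RDF/XML).
XML_FORM = """
<graph>
  <triple s="http://example.org/doc" p="http://example.org/topic" o="http://example.org/rdf"/>
  <triple s="http://example.org/doc" p="http://example.org/author" o="http://example.org/pat"/>
</graph>
"""

def parse_line_syntax(text):
    """Read the simplified line syntax into a set of (s, p, o) tuples."""
    triples = set()
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing '.', split on whitespace, strip the brackets.
        s, p, o = (part.strip("<>") for part in line.rstrip(".").split())
        triples.add((s, p, o))
    return triples

def parse_xml_syntax(text):
    """Read the made-up XML layout into a set of (s, p, o) tuples."""
    root = ET.fromstring(text)
    return {(t.get("s"), t.get("p"), t.get("o")) for t in root.iter("triple")}

# Different notations, different statement order, same graph.
same_graph = parse_line_syntax(LINE_FORM) == parse_xml_syntax(XML_FORM)
```

The comparison is on sets of triples, so neither the order of the
statements nor the notation they arrived in makes any difference.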

>Different kinds of application can sit on different levels of this 
>"Stack".  All computer applications ultimately sit on 'bits', and 
>most sit on 'Octets'.  Unicode/UCS is becoming the norm for 
>applications using character-coded data (text, XML, and more).  E.g. 
>it's standard in Java.

Yes, though UCS-2 is about as much as any sane person could ever need.

>Today, there are many applications that sit directly on the XML 
>layer (+URIs):  many of the current W3C recommendations specify XML 
>applications.

I agree XML is a generally useful device. Labelled directed graphs 
are about as universal a format as anyone is likely to invent.

>XML Infoset (if I understand correctly) is an abstraction layer that 
>might allow XML to be based on non-character representations. 
>Applications based on this (using DOM?) should be isolated from the 
>underlying character syntax.
>
>I see the RDF abstract syntax as a simplification and generalization 
>of the underlying XML on which it is based.

I think just about everything in that sentence is wrong.
1. RDF isn't based on XML.
2. RDF doesn't generalise XML.
3. XML doesn't underlie RDF.
Arguably, I guess you could say that RDF is simpler than XML, in some 
sense that isn't very useful.

>Hopefully one which lends itself tolerably well to the construction 
>of higher semantic layers.  I think this abstract syntax is a 
>significant step towards being able to truly exchange information 
>between different applications (TimBL gives an example somewhere of 
>an invoice containing information about airplane parts:  financial 
>management data meets engineering design data).  I see current RDF 
>applications (RSS, CC/PP, etc as mainly operating at this layer).

The issue is not whether RDF could be used to exchange data. It 
obviously could. So could just about any other notation. The issue is 
whether RDF has any particular things going for it compared to all 
the other alternatives; and if so, what.

>The next layer, and probably the most difficult to judge correctly, 
>I have called the RDF schema layer, encompassing the basic ideas 
>that lead to primitive inferencing (rule following) and 
>typing/classification of resources.  I don't know if this is 
>possible, but I think it would be a useful goal if evaluations 
>defined at this level were guaranteed to be computable;  i.e. to 
>terminate in finite time.

That is exactly what the description-logic community have been 
studying in depth for the last decade. There are some results, some 
usable systems have been designed and implemented, and a lot of tough 
theoretical problems remain. Maybe y'all should do some reading; it 
will save a lot of time in the long run.

> I understand that this excludes full FOL.

It depends on what you mean by 'evaluations'. Checking a FOL proof
for correctness is quite computable, even trivial. It can be done in
linear time. Generating proofs takes a little longer.
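
As a rough illustration of why checking is cheap, here is a minimal
sketch of a proof checker in Python. It is an assumption-laden toy, not
a FOL checker: it handles only a propositional fragment with modus
ponens, formulas are plain strings or ("->", a, b) tuples, and the
step format (a formula plus a justification citing earlier steps) is
invented for this example. The point it shows is that each step is
verified by looking only at the steps it cites, so the checker makes a
single pass over the proof.

```python
def check_proof(premises, steps):
    """Return True if every step is a premise or follows by modus ponens.

    Each step is (formula, justification), where justification is
    ("premise",) or ("mp", i, j): step i is A and step j is ("->", A, formula).
    """
    premises = set(premises)
    for k, (formula, why) in enumerate(steps):
        if why[0] == "premise":
            if formula not in premises:
                return False
        elif why[0] == "mp":
            _, i, j = why
            # Cited steps must come earlier in the proof.
            if not (0 <= i < k and 0 <= j < k):
                return False
            antecedent = steps[i][0]
            implication = steps[j][0]
            if implication != ("->", antecedent, formula):
                return False
        else:
            return False  # unknown justification
    return True

# Hypothetical example: from p, p -> q, q -> r, conclude r.
PREMISES = ["p", ("->", "p", "q"), ("->", "q", "r")]
GOOD_PROOF = [
    ("p", ("premise",)),
    (("->", "p", "q"), ("premise",)),
    ("q", ("mp", 0, 1)),
    (("->", "q", "r"), ("premise",)),
    ("r", ("mp", 2, 3)),
]
# Same conclusion, but the last step cites the wrong implication.
BAD_PROOF = GOOD_PROOF[:4] + [("r", ("mp", 2, 1))]
```

Here check_proof(PREMISES, GOOD_PROOF) accepts and
check_proof(PREMISES, BAD_PROOF) rejects, and in neither case does the
checker have to search for anything.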

>I think this could be a basis for a range of simple processing tasks 
>-- possibly a majority of those performed over the Web today.

I guess I was assuming we were trying to do better than that.

>Then there's the full logic layer, characterized by DAML+OIL.  This 
>would include the "universal web proof checking engine" that has 
>been proposed.

Checking proofs is all very well, but unless someone or something 
generates them, there won't be any proofs to check. And since 
checking is always much easier than generation, I would suggest we 
take complexity of proof generation to be the key computational issue 
in thinking about where we should go. Certainly that was central in 
designing DAML+OIL.
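
For contrast with checking, here is a minimal sketch of proof
generation as search, again in Python and again a toy under stated
assumptions: a propositional fragment with modus ponens only, formulas
as strings or ("->", a, b) tuples, and a naive forward-chaining
strategy chosen for brevity rather than efficiency. In this fragment
the search always terminates; for full first-order logic no such
guarantee holds in general.

```python
def generate_proof(premises, goal):
    """Naively forward-chain with modus ponens until the goal appears.

    Returns (found, attempts), where attempts counts every pair of
    known formulas the search had to try.
    """
    known = list(premises)   # formulas derived so far, in order
    seen = set(known)
    attempts = 0
    changed = True
    while changed and goal not in seen:
        changed = False
        for a in list(known):
            for imp in list(known):
                attempts += 1
                is_implication = isinstance(imp, tuple) and imp[0] == "->"
                if is_implication and imp[1] == a and imp[2] not in seen:
                    # Modus ponens: from a and a -> b, add b.
                    seen.add(imp[2])
                    known.append(imp[2])
                    changed = True
    return goal in seen, attempts

# Hypothetical example: from p, p -> q, q -> r, try to reach r.
PREMISES = ["p", ("->", "p", "q"), ("->", "q", "r")]
found, attempts = generate_proof(PREMISES, "r")
```

Even on three premises the generator tries many more combinations
than the finished proof has steps, and the gap widens quickly as the
set of known formulas grows.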

>This layer would support a range of what might be called knowledge 
>applications.

It can if it has a clear semantics. Without that, however, confusion 
will reign.

This is not just preaching, but the voice of experience, bitterly 
won. People have been through this territory before: database 
modellers, AI knowledge representers, computational logicians, 
natural language understanders, planners, and even psychologists and 
philosophers. One thing we all know a lot about is how NOT to do it.

>In setting out the above, I would hope to sketch a framework in 
>which we can usefully discuss what capabilities should be defined 
>where;  in particular, what are the appropriate levels of 
>functionality to be designed into the RDF and RDF schema layers?

Well, we certainly need some framework, to be sure.

Pat Hayes

---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Saturday, 7 April 2001 03:31:12 UTC