Re: same-syntax extensions to RDF

>"Peter F. Patel-Schneider" <pfps@research.bell-labs.com> writes:
>>  My understanding of the problem was to write extra semantic conditions on
>>  RDF(S) interpretations, and thus generate an embedding/encoding of binary
>>  first-order logic into a semantic extension of RDF(S).  You can't extend
>>  the syntax at the same time, however, which means that you have to encode
>>  formulae as triples somehow, and these triples retain their RDF(S) meaning.
>>
>>  However, it doesn't seem to me that Sandro has the same understanding.
>
>No, that's exactly what I want, too.
>
>I would also like the whole exercise to be, in the end, simple enough
>that the community interested in rule languages and RDF can be
>comfortably with the results.

If you want simplicity, surely a 'natural' syntax for FOL (or OWL, 
for that matter) is simpler than any kind of RDF embedding? If you 
don't like recursion, stick to clausal form and use something like 
TPTP. FOL syntax is not hard. Its way simpler than, say, Javacode. 
What makes everything complicated is this insistence on having it be 
simultaneously syntactically embedded in RDF and at the same time a 
semantic extension of RDF.

>I keep hearing about projects using RDF
>encodings of Horn or FOL, and they turn out to have either unclear or
>inconsistent semantics.  If we're going to move to a standard here, it
>would help to clear up whether it is possible to do this kind of
>encoding properly.

WHY? Why not get the standard done first, and then argue later about 
how to do an RDF encoding? Or, even more radically, why bother with 
an RDF encoding? What's wrong with XML? Seems to me that OWL has 
pushed this RDF idea about as far as it can be pushed; maybe a bit 
further, in fact.  None of the other standardization efforts seem to 
feel the need to be encoded into RDF. SPARQL has its own syntax, 
completely different in style from RDF (it doesnt even encode RDF in 
RDF) . The Rule proposals all use a rule-style syntax. The RDF-in- 
HTML embedding guys don't use RDF/XML syntax. Why do you have this 
RDF syntax bee in your bonnet? Dont get me wrong. Im all for RDF, and 
I like using it for all kinds of things that it can be naturally used 
for. But it does have its limitations.

>I can't imagine, for instance, that the approach
>taken by SWRL 0.6 would ever be approved in a W3C Recommendation, for
>the reasons I outlined in the team comment [1].

The semantic web can make its own standards without the W3C 
imprimateur, if the W3C insists on being insane. But I think you are 
wrong, in fact, since the W3C's own WGs are going to be producing 
draft recs which don't satisfy these requirements.

>If Peter can convince me that it's not possible to write the desired
>semantic condition, I can help him convince others that a different
>path needs to be taken for rule languages.  If I can convince Peter it
>*is* possible

You won't be able to convince him that it is, for the very good 
reason that it isn't.

There is a very general result due to Montague. Here's one way to 
state it. Suppose you have a formal system which is capable of 
expressing very simple arithmetic (technically, Robinson arithmetic). 
FOL would do, for example, but weaker logics would also do. Pretty 
much any language in which you can express or use recursion will also 
do the job. Now assume that you have some way to encode expressions 
of this language into terms (names) in the language itself, ie a 
downward reflection principle. Exactly how you do this doesn't 
matter: just as long as every expression e has an encoding [e]. 
Reification would do this job for RDF, for example, or using quoted 
strings, or using XML literals enclosing a chunk of RDF/XML.  Now add 
a predicate P to this language which satisfies the condition:

   for any e,   P([e]) implies e

(so that you can get back from the encoding to 'use' the logic, as 
you described in an earlier email) and suppose that you can always, 
somehow, perform the 'encoding' inference

   from e, infer P([e])

(that second one doesnt have to be an axiom; it only requires that 
you can somehow get from e to  P([e]) by some inference path or 
other. )  Given this, then the system is self-contradictory; you can 
derive a paradox from it.  Put another way, any language (good enough 
to express basic arithmetic) can't contain an useable meta-theory of 
its own notion of truth, or of provability, or even of self-knowledge.

One way to get the second rule to hold, by the way, is to have the axioms

P([e]) and P([e implies f]) implies P([f])

ie if you can 'mirror' the modus ponens rule in the metalanguage, 
then you are dead. This is the form that Montague and Kaplan 
originally identified. It was originally thought of as being to do 
with self-knowledge, and was called the paradox of the knower.

This is a very basic result, and it is kind of robust. People have 
tried all ways to wriggle past it, but there aren't any. The 
derivation of it is kind of elementary (doesnt use advanced set 
theory, etc.) . There really arent going to be any magic bullets 
which will slay this dragon: looking for them is like trying to 
invent a perpetual-motion machine.

You can get past the barrier by weakening the notion of reflection, 
which breaks the second Montague condition. This is how the KIF 
metatheory wriggled past the paradoxes, by putting a syntactic tweak 
into the truth predicate so that it doesnt apply to some expressions. 
But this is a tricky operation to get right. It works in KIF partly 
because KIF does reification very transparently, using direct 
quotation, so the syntax of the coded expression can be accessed 
straightforwardly when stating the necessary constraint. This would 
break if, for example, the language used an encoding mechanism that 
allowed identity statements between encoded names (this is Peter's 
observation). So if you were to do this for RDF somehow, the trick 
might fail when you moved to OWL (because of owl:sameAs).

>(and we figure out how onerous it is), then a standard
>can use it (or not, but because the difficulty has been actually
>weighed).
>
>Unfortunately, I can't just hand Peter the desired semantic
>conditions.  I don't really know how to write them in the formalism of
>RDF Semantics [2].  My understanding is that Peter doesn't either and
>suspects it's not possible to write them.

The problem has got nothing to do with the form in which they are written.

>Meanwhile, it is "obvious" how to write them informally and with
>subtle errors.

It SEEMS obvious, but appearances are misleading. It seemed obvious 
to Frege, until Russell pointed out the problem. Even Russell was 
kind of embarrassed by how almost childish the paradox seemed - 
nobody took the liar paradox seriously at the time - and was 
convinced that there would be a neat, simple-to-understand way to 
bypass it. The result of that particular naive enthusiasm was 
ramified set theory (abandoned as too complicated to use around 1925) 
and Principia Mathematica.  But we know now, almost a century later, 
that this apparently simplicity really is an illusion. Don't be 
misled by how obvious it seems to be.

It really would be a pity if the entire W3C SW standardization effort 
were to be warped by this false sense of how "obvious" all this stuff 
is. This is like saying that Newtonian physics is kind of obvious, so 
transistors shouldn't be based on quantum theory. We KNOW that the 
problems of consistency in self-referential languages are a black 
hole. We KNOW that none of the techniques that some very clever 
people thought might be able to handle it - reification, 
substitutional quantification, 3-valued logics, constructive logics, 
etc. - will get you past the basic, foundational, snag. We know this 
is something very fundamental in the foundations of formal semantics. 
So why doesn't the W3C just give up on this insane crusade to reduce 
everything to RDF? That idea was a mistake. Get used to it. It is not 
usually a good idea to work hard to to try to base a standard on a 
howler, particularly when the territory has been so thoroughly 
investigated.

And particularly when there is in fact, it seems to me, no real 
problem here to be solved. It is trivial to embed RDF into full FOL, 
corresponding to the fact that RDF is semantically a special case of 
FOL. See the RDF-> SCL embedding described in
http://www.ihmc.us/users/phayes/SW2SCL.html for example. which is 
about as obvious as it can get.  Why wouldn't that be good enough? It 
has many advantages: for example, it can allow future extensions to 
the language very neatly, it means you can use Xpath to access 
sub-expressions, you can use DTDs to check basic syntax, and so on.  
Why do you want to be able to push all SWeb languages through an RDF 
pipeline? Its not as though it is hard to transmit, parse and use 
simple recursive languages, or that there are any real obstacles to 
doing it. People have been doing it for years. HTML with embedded 
Javascript does it.  And if you start doing things like embedding FOL 
expressions in RDF literals, then you have to violate the RDF 
conventions to get at the logic anyway; the RDF has just become a 
kind of outer crust that is getting in the damn way. Since you will 
have to have engines that go 'outside' the RDF conventions in order 
to extract the encoded FOL, why not just send the strings to them 
directly? You could put the FOL in a file, give it a URI and use RDF 
to say that it is a file full of FOL, using an RDF ontology of SWeb 
languages, if you really want to have everything triggered by RDF. I 
don't think there would be much operational difference between this 
and having the FOL encoded syntactically inside the RDF, in fact - as 
Peter points out, you still have to parse the FOL expressions 
somehow, however they are encoded - but it would enable us to put 
this silly issue to rest.

An analogy.  Suppose that someone were to say that a construction 
like <img 
src="http://i.a.cnn.net/cnn/.element/img/1.1/ceiling/gradient.line.gif" 
/> in HTML was inherently faulty because the actual image was not 
'inside' the HTML document, and so the HTML does not properly reflect 
the part/whole structure of the final viewed document, in which the 
image really does appear inside the document page; and this is a 
violation of a Basic Theory of Web Documents.  On this view.,  HTML 
needs to be re-designed so that all parts of a web page are 
incorporated into the same file as the actual HTML text, not as 
external resources referred to by URIs, but actually part of the same 
syntax, parseable using HTML (or XHTML) conventions and also be an 
HTML semantic extension in the sense that the way that the embedded 
image-entity was processed should be the same standard method of 
processing as HTML uses for everything else, including text.  There 
might be various schools of thought on how to do this. They would 
both encode the image bytes as text in some way, lets suppose (which 
was decided by a WG which met for 4 years and produced a huge report 
explaining the technique of writing the image file as a bitstring 
which is split into 16-bit chunks, interpreted as UTF-16 Unicode and 
then mapped back into ASCII)  but one school thinks that the text 
itself should be simply thrown away, and the image displayed using 
image-display conventions; whereas the members of the W3C Team insist 
that this amounts to a non-standard semantic treatment, and suggest 
instead that images should always be represented using 'Unicode-art' 
where each pixel is a Unicode glyph, so as to preserve the standard 
HTML meaning of all embedded text uniformly: images in HTML are then, 
strictly speaking, just text, but rendered in a very small font.  You 
can imagine an entire novella here, all horribly plausible: there 
might be special proposals by consortia of commercial foundries, 
proposing standardized  minifonts suitable for rendering CMYK images.

Now, this little fantasy seems to me almost exactly analogous to the 
debates over using RDF syntax for everything. The insistence on 
preserving the RDF standard semantics while at the same time 
embedding everything into its restrictive syntax is going to produce 
exactly the same kind of damn silly time-wasting discussions, and run 
the entire process into the same kind of pointless rathole, that 
'true embedded HTML' would have taken HTML.; and for much the same 
reasons.  The real issue is not whether or not it can possibly be 
done, but whether it is a good idea to even try. Clearly if it can be 
done, it will only be by virtue of an enormous amount of work, and 
the result will be ugly, hard to use and fragile; and we already know 
much better ways to do this, ways that are already in use, known to 
be robust and are widely understood.

In case you are thinking about CWM as an proof-of-concept, it isn't. 
The way that CWM encodes FOL is in direct and stark violation of the 
W3C's own recommendations. If the idea is to move forward along those 
lines, then please say so now so that we can drop this entire topic, 
resign from the W3C WGs and find more productive ways to spend our 
time.

>  Something like "put the FOL expressions in a string
>literal and assert it"

Or in a reified triple and assert that, or coded as a Goedel number 
and assert that. It makes no difference to the Montague result. The 
form of the encoding is irrelevant: you could use a binary encoding 
of an XML character string, rewritten in musical notation and then 
transcribed back through base-7 arithmetic into an XSD integer. You 
might as well try to program an uncomputable function by picking a 
different programming language. The only way to avoid paradox is to 
weaken the connection between language and metalanguage, eg by partly 
breaking the reflection inferences. But then you have to be able to 
prove that there isn't some hidden way of sneaking past your 
restriction.

>, as I explored in my previous e-mail [3].
>
>So I've been trying to write them using a formalism in which I have a
>good sense of how to do it and with which I think I can detect most if
>not all subtle errors.  It may also show the way in which to do it in
>the desired formalism.
>
>I'll try to return to the substance of this thread shortly.  Peter's
>raised to questions I need to explore.

There are lots more waiting for you when you get past those.

Pat

>
>        -- sandro
>
>[1] http://www.w3.org/Submission/2004/03/Comment (in the "RDF" section)
>[2] http://www.w3.org/TR/rdf-mt/
>[3] http://lists.w3.org/Archives/Public/www-rdf-logic/2004Dec/0022.html


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes

Received on Thursday, 30 December 2004 11:37:04 UTC