Re: same-syntax extensions to RDF from Harry Halpin on 2005-01-01 (www-rdf-logic@w3.org from January 2005)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Sat, 1 Jan 2005 12:36:05 -0500 (EST)
To: Pat Hayes <phayes@ihmc.us>
Cc: Sandro Hawke <sandro@w3.org>, www-rdf-logic@w3.org
Message-ID: <Pine.LNX.4.61.0501010058440.28858@tribal.metalab.unc.edu>
 	Obviously since description logic is a fragment of first-order
logic the conversion of description logic to first-order logic is much
more solid than trying to embed first-order logic in a description logic
(or RDF). So from a logic standpoint, is there a good XML notation for 
first-order logic? Wasn't there a XML-CL "ISO Common Logic" proposal 
batted about? This is the latest link I can find: 
http://www.ihmc.us/users/phayes/CL/XCL_sentence_syntax.html.

I noted Peter says OWL Full is equivalent in power to FOL
(http://www.w3.org/2001/sw/meetings/tech-200303/all-pfps.htm), yet
given the flow of discussion on this list it's a bit unclear if that's
true (I can't tell from OWL Semantics), or if the OWL syntax 
is usable for FOL at all.

Anything newer or any alternatives? I'm somewhat shocked someone didn't
formulate XML-encoded FOL before SemWeb standards where even made given 
the prevalence of FOL in the world :)

Another question is there a principled translation between FOL "in the 
wild" to OWL DL regardless of syntax? This is of utmost importance to 
myself, as a  computational linguistics kinda guy I work on tools 
translating natural language to FOL (currently encoded in Discourse
Representation Theory, a notational variant of FOL), and would like
to make these tools available to the SemWeb world. Pat Hayes's work on 
translating from "domestic" OWL -> SCL seems the most extensive, although 
Bordiga's paper "On the Relative Expressiveness of Description Logics and 
Predicate Logics." seems to be another good source. I'd like to see at 
least a nice W3C note on FOL->OWL DL->FOL including the formal semantics 
- I'd even be willing to help and it would help my own work immensely! 
Now the translation to FOL->OWL DL is *really* tricky, since most of FOL
is not expressable in DL, so I've found it difficult to try to find
a reasonable algorithm to, given a set of FOL statements, find a subset
of that set that can be expressed in DL.

I've got a quick and dirty XSLT script for translating FOL(DRS notation) 
in the wild into OWL-DL using the small set of  rules but it uses my own 
idiosyncratic XML FOL syntax, so it has no use to the wider community, and
very ad-hoc rules.

This whole translation business would seem to have something to do with 
the Guarded Fragment, but I haven't found any good theoretical papers 
motivating the connections between DL (esp an OWL-flavored DL) and the 
Guarded Fragment. Any work out there?

Again, the motivation behind Sandro's comments is the *utterly real*: 
People use FOL in the real world for all sorts of thing. People associate 
the  Semantic Web mentally with logic, AI, "semantics", etc. - and lots of 
people use FOL to encode "semantics". Thus, it is actually only natural, 
if mistaken, to try to use whatever SemWeb standards are out there to 
encode FOL, given the preconceptions most people have about "semantics".
Pat Hayes's "Catching the Dreamcatcher" paper makes the case well - I
know this argument has happened on this list before, but maybe now that DL
is done with the SemWeb standards cadre can move onto FOL, esp. given
the good work of CL?

So, as someone interested in the SemWeb,  it *might* make the most sense 
if the W3C and the SemWeb community got together, rallied up a SemWeb 
standard for FOL ala RDF, and then showed how RDF/OWL DL was a proper subset of FOL. Then people could use
whatever standard they wanted and translate between the two. I'm pretty
sure FOL can't fit in the wonderful world of "triples", but vice versa 
should be true. Then people could use RDF/OWL when decidability/ 
tractability/"all the nice triple features" were needed, and  FOL when 
they use naturally use FOL or need it's expressive power. Ala Richard Stallman, 
the users should be given the  freedom to choose, and part of TimBL's 
vision has always been universal standards with differing levels of power
(although XML Schema FSA vs. RELAX NG tree automata debate obviously 
missed this point as well!).

Or this could just distract everyone from improving and using OWL/RDF, so 
I see the case against working on FOL directly. But Sandro's comments and 
my own problems demonstrate the need.

 				Cheers,
 					harry








On Thu, 30 Dec 2004, Pat Hayes wrote:

>> "Peter F. Patel-Schneider" <pfps@research.bell-labs.com> writes:
>>>  My understanding of the problem was to write extra semantic conditions on
>>>  RDF(S) interpretations, and thus generate an embedding/encoding of binary
>>>  first-order logic into a semantic extension of RDF(S).  You can't extend
>>>  the syntax at the same time, however, which means that you have to encode
>>>  formulae as triples somehow, and these triples retain their RDF(S) 
>>> meaning.
>>> 
>>>  However, it doesn't seem to me that Sandro has the same understanding.
>> 
>> No, that's exactly what I want, too.
>> 
>> I would also like the whole exercise to be, in the end, simple enough
>> that the community interested in rule languages and RDF can be
>> comfortably with the results.
>
> If you want simplicity, surely a 'natural' syntax for FOL (or OWL, for that 
> matter) is simpler than any kind of RDF embedding? If you don't like 
> recursion, stick to clausal form and use something like TPTP. FOL syntax is 
> not hard. Its way simpler than, say, Javacode. What makes everything 
> complicated is this insistence on having it be simultaneously syntactically 
> embedded in RDF and at the same time a semantic extension of RDF.
>
>> I keep hearing about projects using RDF
>> encodings of Horn or FOL, and they turn out to have either unclear or
>> inconsistent semantics.  If we're going to move to a standard here, it
>> would help to clear up whether it is possible to do this kind of
>> encoding properly.
>
> WHY? Why not get the standard done first, and then argue later about how to 
> do an RDF encoding? Or, even more radically, why bother with an RDF encoding? 
> What's wrong with XML? Seems to me that OWL has pushed this RDF idea about as 
> far as it can be pushed; maybe a bit further, in fact.  None of the other 
> standardization efforts seem to feel the need to be encoded into RDF. SPARQL 
> has its own syntax, completely different in style from RDF (it doesnt even 
> encode RDF in RDF) . The Rule proposals all use a rule-style syntax. The 
> RDF-in- HTML embedding guys don't use RDF/XML syntax. Why do you have this 
> RDF syntax bee in your bonnet? Dont get me wrong. Im all for RDF, and I like 
> using it for all kinds of things that it can be naturally used for. But it 
> does have its limitations.
>
>> I can't imagine, for instance, that the approach
>> taken by SWRL 0.6 would ever be approved in a W3C Recommendation, for
>> the reasons I outlined in the team comment [1].
>
> The semantic web can make its own standards without the W3C imprimateur, if 
> the W3C insists on being insane. But I think you are wrong, in fact, since 
> the W3C's own WGs are going to be producing draft recs which don't satisfy 
> these requirements.
>
>> If Peter can convince me that it's not possible to write the desired
>> semantic condition, I can help him convince others that a different
>> path needs to be taken for rule languages.  If I can convince Peter it
>> *is* possible
>
> You won't be able to convince him that it is, for the very good reason that 
> it isn't.
>
> There is a very general result due to Montague. Here's one way to state it. 
> Suppose you have a formal system which is capable of expressing very simple 
> arithmetic (technically, Robinson arithmetic). FOL would do, for example, but 
> weaker logics would also do. Pretty much any language in which you can 
> express or use recursion will also do the job. Now assume that you have some 
> way to encode expressions of this language into terms (names) in the language 
> itself, ie a downward reflection principle. Exactly how you do this doesn't 
> matter: just as long as every expression e has an encoding [e]. Reification 
> would do this job for RDF, for example, or using quoted strings, or using XML 
> literals enclosing a chunk of RDF/XML.  Now add a predicate P to this 
> language which satisfies the condition:
>
>  for any e,   P([e]) implies e
>
> (so that you can get back from the encoding to 'use' the logic, as you 
> described in an earlier email) and suppose that you can always, somehow, 
> perform the 'encoding' inference
>
>  from e, infer P([e])
>
> (that second one doesnt have to be an axiom; it only requires that you can 
> somehow get from e to  P([e]) by some inference path or other. )  Given this, 
> then the system is self-contradictory; you can derive a paradox from it.  Put 
> another way, any language (good enough to express basic arithmetic) can't 
> contain an useable meta-theory of its own notion of truth, or of provability, 
> or even of self-knowledge.
>
> One way to get the second rule to hold, by the way, is to have the axioms
>
> P([e]) and P([e implies f]) implies P([f])
>
> ie if you can 'mirror' the modus ponens rule in the metalanguage, then you 
> are dead. This is the form that Montague and Kaplan originally identified. It 
> was originally thought of as being to do with self-knowledge, and was called 
> the paradox of the knower.
>
> This is a very basic result, and it is kind of robust. People have tried all 
> ways to wriggle past it, but there aren't any. The derivation of it is kind 
> of elementary (doesnt use advanced set theory, etc.) . There really arent 
> going to be any magic bullets which will slay this dragon: looking for them 
> is like trying to invent a perpetual-motion machine.
>
> You can get past the barrier by weakening the notion of reflection, which 
> breaks the second Montague condition. This is how the KIF metatheory wriggled 
> past the paradoxes, by putting a syntactic tweak into the truth predicate so 
> that it doesnt apply to some expressions. But this is a tricky operation to 
> get right. It works in KIF partly because KIF does reification very 
> transparently, using direct quotation, so the syntax of the coded expression 
> can be accessed straightforwardly when stating the necessary constraint. This 
> would break if, for example, the language used an encoding mechanism that 
> allowed identity statements between encoded names (this is Peter's 
> observation). So if you were to do this for RDF somehow, the trick might fail 
> when you moved to OWL (because of owl:sameAs).
>
>> (and we figure out how onerous it is), then a standard
>> can use it (or not, but because the difficulty has been actually
>> weighed).
>> 
>> Unfortunately, I can't just hand Peter the desired semantic
>> conditions.  I don't really know how to write them in the formalism of
>> RDF Semantics [2].  My understanding is that Peter doesn't either and
>> suspects it's not possible to write them.
>
> The problem has got nothing to do with the form in which they are written.
>
>> Meanwhile, it is "obvious" how to write them informally and with
>> subtle errors.
>
> It SEEMS obvious, but appearances are misleading. It seemed obvious to Frege, 
> until Russell pointed out the problem. Even Russell was kind of embarrassed 
> by how almost childish the paradox seemed - nobody took the liar paradox 
> seriously at the time - and was convinced that there would be a neat, 
> simple-to-understand way to bypass it. The result of that particular naive 
> enthusiasm was ramified set theory (abandoned as too complicated to use 
> around 1925) and Principia Mathematica.  But we know now, almost a century 
> later, that this apparently simplicity really is an illusion. Don't be misled 
> by how obvious it seems to be.
>
> It really would be a pity if the entire W3C SW standardization effort were to 
> be warped by this false sense of how "obvious" all this stuff is. This is 
> like saying that Newtonian physics is kind of obvious, so transistors 
> shouldn't be based on quantum theory. We KNOW that the problems of 
> consistency in self-referential languages are a black hole. We KNOW that none 
> of the techniques that some very clever people thought might be able to 
> handle it - reification, substitutional quantification, 3-valued logics, 
> constructive logics, etc. - will get you past the basic, foundational, snag. 
> We know this is something very fundamental in the foundations of formal 
> semantics. So why doesn't the W3C just give up on this insane crusade to 
> reduce everything to RDF? That idea was a mistake. Get used to it. It is not 
> usually a good idea to work hard to to try to base a standard on a howler, 
> particularly when the territory has been so thoroughly investigated.
>
> And particularly when there is in fact, it seems to me, no real problem here 
> to be solved. It is trivial to embed RDF into full FOL, corresponding to the 
> fact that RDF is semantically a special case of FOL. See the RDF-> SCL 
> embedding described in
> http://www.ihmc.us/users/phayes/SW2SCL.html for example. which is about as 
> obvious as it can get.  Why wouldn't that be good enough? It has many 
> advantages: for example, it can allow future extensions to the language very 
> neatly, it means you can use Xpath to access sub-expressions, you can use 
> DTDs to check basic syntax, and so on.  Why do you want to be able to push 
> all SWeb languages through an RDF pipeline? Its not as though it is hard to 
> transmit, parse and use simple recursive languages, or that there are any 
> real obstacles to doing it. People have been doing it for years. HTML with 
> embedded Javascript does it.  And if you start doing things like embedding 
> FOL expressions in RDF literals, then you have to violate the RDF conventions 
> to get at the logic anyway; the RDF has just become a kind of outer crust 
> that is getting in the damn way. Since you will have to have engines that go 
> 'outside' the RDF conventions in order to extract the encoded FOL, why not 
> just send the strings to them directly? You could put the FOL in a file, give 
> it a URI and use RDF to say that it is a file full of FOL, using an RDF 
> ontology of SWeb languages, if you really want to have everything triggered 
> by RDF. I don't think there would be much operational difference between this 
> and having the FOL encoded syntactically inside the RDF, in fact - as Peter 
> points out, you still have to parse the FOL expressions somehow, however they 
> are encoded - but it would enable us to put this silly issue to rest.
>
> An analogy.  Suppose that someone were to say that a construction like <img 
> src="http://i.a.cnn.net/cnn/.element/img/1.1/ceiling/gradient.line.gif" /> in 
> HTML was inherently faulty because the actual image was not 'inside' the HTML 
> document, and so the HTML does not properly reflect the part/whole structure 
> of the final viewed document, in which the image really does appear inside 
> the document page; and this is a violation of a Basic Theory of Web 
> Documents.  On this view.,  HTML needs to be re-designed so that all parts of 
> a web page are incorporated into the same file as the actual HTML text, not 
> as external resources referred to by URIs, but actually part of the same 
> syntax, parseable using HTML (or XHTML) conventions and also be an HTML 
> semantic extension in the sense that the way that the embedded image-entity 
> was processed should be the same standard method of processing as HTML uses 
> for everything else, including text.  There might be various schools of 
> thought on how to do this. They would both encode the image bytes as text in 
> some way, lets suppose (which was decided by a WG which met for 4 years and 
> produced a huge report explaining the technique of writing the image file as 
> a bitstring which is split into 16-bit chunks, interpreted as UTF-16 Unicode 
> and then mapped back into ASCII)  but one school thinks that the text itself 
> should be simply thrown away, and the image displayed using image-display 
> conventions; whereas the members of the W3C Team insist that this amounts to 
> a non-standard semantic treatment, and suggest instead that images should 
> always be represented using 'Unicode-art' where each pixel is a Unicode 
> glyph, so as to preserve the standard HTML meaning of all embedded text 
> uniformly: images in HTML are then, strictly speaking, just text, but 
> rendered in a very small font.  You can imagine an entire novella here, all 
> horribly plausible: there might be special proposals by consortia of 
> commercial foundries, proposing standardized  minifonts suitable for 
> rendering CMYK images.
>
> Now, this little fantasy seems to me almost exactly analogous to the debates 
> over using RDF syntax for everything. The insistence on preserving the RDF 
> standard semantics while at the same time embedding everything into its 
> restrictive syntax is going to produce exactly the same kind of damn silly 
> time-wasting discussions, and run the entire process into the same kind of 
> pointless rathole, that 'true embedded HTML' would have taken HTML.; and for 
> much the same reasons.  The real issue is not whether or not it can possibly 
> be done, but whether it is a good idea to even try. Clearly if it can be 
> done, it will only be by virtue of an enormous amount of work, and the result 
> will be ugly, hard to use and fragile; and we already know much better ways 
> to do this, ways that are already in use, known to be robust and are widely 
> understood.
>
> In case you are thinking about CWM as an proof-of-concept, it isn't. The way 
> that CWM encodes FOL is in direct and stark violation of the W3C's own 
> recommendations. If the idea is to move forward along those lines, then 
> please say so now so that we can drop this entire topic, resign from the W3C 
> WGs and find more productive ways to spend our time.
>
>>  Something like "put the FOL expressions in a string
>> literal and assert it"
>
> Or in a reified triple and assert that, or coded as a Goedel number and 
> assert that. It makes no difference to the Montague result. The form of the 
> encoding is irrelevant: you could use a binary encoding of an XML character 
> string, rewritten in musical notation and then transcribed back through 
> base-7 arithmetic into an XSD integer. You might as well try to program an 
> uncomputable function by picking a different programming language. The only 
> way to avoid paradox is to weaken the connection between language and 
> metalanguage, eg by partly breaking the reflection inferences. But then you 
> have to be able to prove that there isn't some hidden way of sneaking past 
> your restriction.
>
>> , as I explored in my previous e-mail [3].
>> 
>> So I've been trying to write them using a formalism in which I have a
>> good sense of how to do it and with which I think I can detect most if
>> not all subtle errors.  It may also show the way in which to do it in
>> the desired formalism.
>> 
>> I'll try to return to the substance of this thread shortly.  Peter's
>> raised to questions I need to explore.
>
> There are lots more waiting for you when you get past those.
>
> Pat
>
>> 
>>        -- sandro
>> 
>> [1] http://www.w3.org/Submission/2004/03/Comment (in the "RDF" section)
>> [2] http://www.w3.org/TR/rdf-mt/
>> [3] http://lists.w3.org/Archives/Public/www-rdf-logic/2004Dec/0022.html
>
>
>


 	Harry Halpin
 	Informatics, University of Edinburgh
         http://www.ibiblio.org/hhalpin
Received on Saturday, 1 January 2005 17:36:49 UTC