Re: review of July 15 draft of RDF Semantics document from Peter F. Patel-Schneider on 2003-07-24 (www-rdf-comments@w3.org from July to September 2003)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Thu, 24 Jul 2003 07:00:36 -0400 (EDT)
To: phayes@ihmc.us
Cc: www-rdf-comments@w3.org, bwm@hplb.hpl.hp.com
Message-Id: <20030724.070036.98580650.pfps@research.bell-labs.com>
From: pat hayes <phayes@ihmc.us>
Subject: Re: review of July 15 draft of RDF Semantics document
Date: Thu, 24 Jul 2003 01:13:04 -0500

> Peter, greetings and thanks for the close reading.
> 
> All the changes mentioned below are now in the version dated 23 July at
> 
> http://www.ihmc.us/users/phayes/RDF_Semant_Edit_Weak.html
> 
> Pat
> ---------
> 
> >As I had received information from the RDF Core WG that the RDF Semantics
> >document was suitable for review, and I needed to see if my many concerns
> >with the RDF model theory have been resolved, I did a pass through the July
> >15 draft of RDF Semantics.
> >
> >Unfortunately, I found quite a number of problems with this draft.  Some of
> >these are problems remaining from previous versions of the document but
> >some of them appear to have been newly introduced.
> >
> >
> >Drastic Problem:
> >
> >The treatment of XML Literals is inconsistent within the document and with
> >respect to RDF Concepts (at least the version of RDF Concepts that is
> >accessible through the pointer in the RDF Semantics document, there are
> >also broken links related to XML Literals).  The change list in RDF
> >Semantics
> 
> The change list is not part of the document. Please review the document.

I strongly differ.  I was told by Brian McBride, in various messages to me
also sent to www-rdf-comments@w3.org, that 
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-mt-20030117/
is suitable for review, so I did review it.  This document includes a
section headed ``Change List since Last Call''.  How can this prominent
section not be part of the document?

> >says that XML literals ``are now required to be in canonical form
> >and therefore to denote their own literal string.''  This appears to mean
> >that XML literals are just a subset of character strings.  This is
> >completely counter to what is said in RDF Concepts.  Section 3 of RDF
> >Semantics has no mention of the fact that XML literals denote themselves.
> >It also says that it ``is deliberately agnostic as to whether or not XML
> >data is considered to be identical to a character string'', which is in
> >direct contradiction to the wording in the change list.
> >
> >XML Literals have been a source of very many problems.  As they are still
> >not correct, it would be much better to just dump them entirely.
> 
> That is not an option, so I will ignore this as a comment about the 
> document. XML literals are fully defined in the document.
> 
> >
> >Drastic Problem:
> >
> >There has been a significant conceptual change to simple interpretations.
> >IP is not required to be a subset of IR.  This does not appear to be in
> >response to any comment to the RDF Core Working Group nor to be in response
> >to any problem with the RDF model theory.  This change may have
> >consequences for other formalisms, including OWL, but no announcement about
> >it has been made.
> >
> 
> I would not describe this as a significant conceptual change, so much 
> a small technical improvement to the mathematical machinery. It was 
> mentioned in an informative email which you received and replied to. 

I take it that you are referring to this exchange:

** Subject: Re: possible semantic tweak
** From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
** To: phayes@ai.uwf.edu
** Cc: horrocks@cs.man.ac.uk
** Date: Wed, 04 Jun 2003 06:52:36 -0400 (EDT)
** X-Mailer: Mew version 2.2 on Emacs 21.1 / Mule 5.0 (SAKAKI)
** 
** From: pat hayes <phayes@ai.uwf.edu>
** Subject: possible semantic tweak
** Date: Mon, 2 Jun 2003 19:02:38 -0500
** 
** > Guys, some (very) recent work on the SCL project has suggested a 
** > possible semantic modification which may be relevant to the RDF/OWL 
** > interactions. I realize this is very late in the day, but I thought 
** > it might be worth bringing it up to see if it might be helpful.
** > 
** > The idea is due to Chris Menzel, and it makes SCL 'more like' 
** > conventional FOL, in fact restoring the relationship I once thought 
** > we had achieved but Ian's one-node-universe example (now christened 
** > in the SCL community a 'Horrocks sentence') showed we had not. The 
** > technical change is that an SCL interpretation allows (but does not 
** > require) that relations be in the universe of discourse; this means 
** > that the FOL subset of SCL is now exactly isomorphic to a 
** > conventional semantics for FOL, so all satisfiability properties and 
** > entailment relations between FOL sentences are identical in FOL and 
** > SCL: there are no Horrocks sentences. The cost to SCL is that some 
** > formerly inconsistent sets are now consistent (eg the Horrocks 
** > sentences), in particular
** > 
** > P(a) and not (exists (?r) ?r(a) )
** > 
** > is consistent, but if you add, say
** > R(P)
** > then the result is inconsistent, because this forces one to only 
** > consider interpretations in which P denotes something in the universe 
** > of discourse and therefore the universal applies to it.
** > 
** > We (the SCL group) are still debating whether or not to make this 
** > change, but I think we will.  But the point of this message is that I 
** > wondered what kinds of change to the RDF MT this would correspond to, 
** > and the answer is a very small one but which may be significant: to 
** > *not* require that IP be a subset of IR.  To be exact, the idea would 
** > be that any URI in a 'property' position must denote something in IP, 
** > and that any URI in a subject or object position must denote 
** > something in IR. (All the rest is unchanged.) So if any URI occurs in 
** > both positions, then of course whatever it denotes must be in IP and 
** > in IR, so IP and IR must overlap to that extent; but the extent to 
** > which they must is governed by the syntactic form of the graph being 
** > interpreted, rather than as now being fixed by fiat. This means that 
** > a satisfiable graph in which some URI occurs only in a property 
** > position can be satisfied by an interpretation in which the denoted 
** > property is excluded from the domain of discourse, in a kosher FOL 
** > style; and in fact a vocabulary-separated DL-style RDF graph can be 
** > interpreted in a thoroughly FO style in which individual names 
** > denote individuals, property names denote binary relations and class 
** > names denote sets, the three being disjoint, and this still would be 
** > a legal RDF interpretation, and it could be a proper part of a legal 
** > RDFS interpretation with a segregated vocabulary.
** > 
** > My question to you is, SUPPOSE that I were to tweak the RDF MT so as 
** > to allow this; would that possibly simplify the OWL-DL correspondence 
** > theorem? Because as far as I can see it would make no difference at 
** > all to the RDFS documents: the difference between this and the 
** > current MT is invisible in RDFS, as far as I can tell. So if you 
** > think this would be worth doing, let me know (off-list) and I will do 
** > it editorially.
** > 
** > Pat
** 
** I don't see that it would make the correspondence theorem significantly
** simpler.  There already is a similar feature in the RDFS model
** theory for OWL, and it was very easy to set up.  
** 
** Further, I am very skeptical that you would get any mileage out of this
** trick in RDF.  Remember that most elements of the RDF and RDFS vocabulary
** are subjects or objects of some triple in RDFS models (either rdfs:domain,
** rdfs:range, rdfs:subClassOf, or rdfs:subPropertyOf) and thus would have to
** belong to IR.  
** 
** peter

I don't view this exchange as an announcement or even an indication that
the change will be made nor do I view my response as either an endorsement
of the change or an indication that the change would not cause any problems
for OWL.

> It was not made capriciously; it reflects a recent observation that 
> this slight weakening of the basic (not RDF) graph model theory makes 
> 'layering' of the sort requested by Jeff Pan and others somewhat 
> easier to achieve, since the basic model theory now allows a 
> conventional first-order structure of an interpretation of a graph 
> which satisfies the conventional syntactic layering: that is, if a 
> URIref occurs in a graph only in predicate position, it is no longer 
> required to denote something in the universe of quantification.  This 
> allows the basic model theory to be more conventional, since it no 
> longer requires the use of non-well-founded structures in all cases. 
> The credit for this idea is due to Chris Menzel, and it arose as 
> a consequence of the SCL project working to eliminate the 'Horrocks 
> sentences' which had different satisfiability conditions in SCL and 
> FOL; this is of course closely related to the RDF/OWL layering 
> issues. Using a similar device, SCL has now achieved full FOL 
> compatibility.
> 
> This does not change any RDF or RDFS entailments or semantic 
> conditions, since these require that IP and IR overlap on the parts 
> of the RDF and RDFS vocabularies to which semantic conditions apply, 
> as the text notes; and since it weakens rather than strengthens the 
> conditions on simple interpretations, I do not believe that it will 
> have any significant effects on OWL.  Other members of the Webont 
> working group had reacted favorably to this change.  If you feel that 
> there are any problems arising from this change, please say what they 
> are.

The point is that I don't see that there are no problems resulting from
this change.  This determination could require considerable effort.
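For readers following the IP/IR discussion, the weakened condition can be illustrated with a minimal sketch. It assumes a toy encoding that is not taken from the draft: ground triples only (no literals or blank nodes), with an interpretation held as plain Python sets and dicts. The names SimpleInterpretation, satisfies, and example are hypothetical. The sketch only shows that a URIref used solely in property position may denote something outside IR; it says nothing about the consequences for OWL.

```python
from dataclasses import dataclass


@dataclass
class SimpleInterpretation:
    """Toy simple interpretation for ground triples (illustrative only)."""
    IR: set     # universe of resources
    IP: set     # properties; not required to be a subset of IR
    IEXT: dict  # property -> set of (subject, object) pairs over IR
    IS: dict    # URIref -> denotation, drawn from IR or IP

    def satisfies(self, triple):
        s, p, o = triple
        # A triple using a name outside the vocabulary is treated as false.
        if not all(name in self.IS for name in (s, p, o)):
            return False
        subj, prop, obj = self.IS[s], self.IS[p], self.IS[o]
        # Property position must denote something in IP; subject and
        # object positions must denote something in IR.
        return (prop in self.IP and subj in self.IR and obj in self.IR
                and (subj, obj) in self.IEXT.get(prop, set()))


# ex:p occurs only in property position, so its denotation stays out of IR.
example = SimpleInterpretation(
    IR={"thing1", "thing2"},
    IP={"relP"},
    IEXT={"relP": {("thing1", "thing2")}},
    IS={"ex:a": "thing1", "ex:b": "thing2", "ex:p": "relP"},
)
```

In this sketch, `ex:a ex:p ex:b` is satisfied even though IP and IR are disjoint, while a triple with ex:p as its subject is not, because the denotation of ex:p is not in IR.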

> >Problem:
> >
> >The definition of a proper instance admits a switch of blank nodes in the
> >graph, e.g., replacing _:a with _:b and vice versa, as a proper instance,
> >but this shouldn't be a proper instance.
> 
> It isn't a proper instance according to the definition given:
> 
> "A proper instance of a graph is an instance in which a blank node is 
> mapped to a name or to some other blank node in the graph, so that in 
> the instance a blank node has been replaced by a name or two blank 
> nodes in the graph have been identified. "
> 
> On re-reading this I see that the comma may be misleading, and have deleted it.

Removing the comma changes the meaning of a proper instance.

> >This invalidates the anonymity lemma, as
> >	_:a <ex:p> _:b .
> >is a proper instance of itself and lean, so should not entail itself.
> >
> >
> >Problem:
> >
> >The example of a lean graph is not lean, as the instance of this graph
> >obtained by replacing _:x with <ex:a> is a proper instance of the graph.
> 
> It is lean according to the definition given, which refers to 
> instances being proper subgraphs.

Argh.  You are correct.  I was thinking that 
	<ex:a> <ex:p> <ex:a> .
was a proper subgraph of
	<ex:a> <ex:p> _:x .
	_:x <ex:p> _:x .
which it is not.
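The leanness check can be sketched by brute force. This is a minimal sketch under stated assumptions, not the document's definition verbatim: graphs are Python sets of (subject, predicate, object) string triples, blank nodes are the terms starting with "_:", and only mappings of blank nodes onto terms already in the graph are enumerated, since mapping to any other term yields a triple that cannot lie in the graph. The function names are hypothetical.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def instances_within(graph):
    """Yield every instance that maps blank nodes to terms of the graph."""
    blanks = sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})
    terms = sorted({t for s, _, o in graph for t in (s, o)})
    for image in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, image))
        yield frozenset(
            (mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph
        )


def is_lean(graph):
    """A graph is lean if no instance of it is a proper subgraph of it."""
    graph = frozenset(graph)
    return not any(inst < graph for inst in instances_within(graph))


lean_example = {("ex:a", "ex:p", "_:x"), ("_:x", "ex:p", "_:x")}
non_lean_example = {("ex:a", "ex:p", "_:x"), ("_:y", "ex:p", "_:x")}
```

Here `is_lean(lean_example)` holds, because replacing _:x with ex:a gives the single triple `ex:a ex:p ex:a`, which is not in the original graph at all. The second graph is not lean, since mapping _:y to ex:a collapses it onto one of its own triples.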

[...]

> >Problem:
> >
> >The definition of the merge of a set of graphs is inadequate.  Just which
> >blank nodes of members of S are to be replaced?
> 
> given the convention described in 0.2, it doesn't matter.
> 
> >  From the definition, the
> >merge of
> >	_:a <ex:p> _:b .
> >and
> >	_:a <ex:p> _:c .
> >and
> >	_:b <ex:p> _:c .
> >could be
> >	_:a <ex:p> _:b .
> >	_:a <ex:p> _:c .
> >	_:e <ex:p> _:e .
> >as this ``replaces blank nodes in some members of S by distinct blank
> >nodes''.
> 
> The definition reads, in full:
> "a set obtained by replacing blank nodes in some members of S by 
> distinct blank nodes to obtain another set S' of graphs which are 
> equivalent to those in S in the above sense. ",

Which is satisfied in the example above.

> To be quite sure of the meaning, I have added a phrase:
> 
> ". a set obtained by replacing blank nodes in some members of S by 
> distinct blank nodes to obtain another set S' of graphs which share 
> no blank nodes and are equivalent to those in S in the above sense."

This should work.
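The amended wording, in which the graphs in S' share no blank nodes, can be sketched as follows. This is a minimal sketch assuming graphs are sets of string triples with blank nodes written as "_:label"; the function name merge and the fresh labels "_:m0", "_:m1", ... are hypothetical. Every blank node in every graph is renamed, so the fresh labels cannot clash with surviving originals.

```python
from itertools import count


def merge(graphs):
    """Union the graphs after standardizing their blank nodes apart."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # per-graph, so labels shared across graphs split apart
        for s, p, o in sorted(graph):
            for term in (s, o):
                if term.startswith("_:") and term not in renaming:
                    renaming[term] = f"_:m{next(fresh)}"
            merged.add((renaming.get(s, s), p, renaming.get(o, o)))
    return merged


g1 = {("_:a", "ex:p", "_:b")}
g2 = {("_:a", "ex:p", "_:c")}
g3 = {("_:b", "ex:p", "_:c")}
result = merge([g1, g2, g3])
```

For the three one-triple graphs from the example above, the result has three triples over six distinct blank nodes, and no triple has the same blank node as both subject and object, so a reading that yields `_:e <ex:p> _:e` is excluded.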

> >There are other problems in the definition of the merge as well.
> 
> I am unable to respond to that.

The definition needs to be rewritten.  I have had to read the defining
sentence numerous times to figure out just what is going on, often with
different results, and I'm still not sure that there isn't some problem in
the definition.

> >Problem:
> >
> >In Section 1.3 a vocabulary is defined as a ``set of URIrefs''.
> 
> It is not defined there; the text refers to such a set as being a 
> vocabulary, which is correct. However it could be better worded: I 
> have changed this to "set of names".

OK, Section 1.3 only defines the notion of a vocabulary of an
interpretation.  However, 
	All interpretations will be relative to a set of URIrefs, called
	the vocabulary of the interpretation, ...
defines the vocabulary of an interpretation to be a set of URIrefs, not a
set of names.

> >However, in the change log and in Section 0.3, a vocabulary is supposed to
> >be able to contain typed literals.
> 
> A set of URIs without typed literals is a vocabulary, however.

Agreed, but I don't see the point you are trying to make here.

> >Problem:
> >
> >There is no definition of a ``literal character string'' or a ``language
> >tag'', used in the definition of simple interpretations.
> 
>   "literal character string"  changed to  "character string".
> 
> Language tag is used in the sense of RFC3066. I have inserted a 
> reference link to the concepts document
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Graph-Literal
> which should clarify the intended meaning.

Is this consistent with RDF Concepts?  I should think that you should
instead defer to this other document.

[...]

> >Problem:
> >
> >The conditions for denotations should be augmented with more conditions
> >like ``if I(p) is in IP''.    I suggest adding as well ``if s, p, and o are
> >in V''.
> 
> Why do you feel this is necessary? This wording has not changed in 
> many versions of the document.

Well, you have the one condition; why include it if you don't include the others?

> But since you insist, I have added the condition explicitly.



> >Problem:
> >
> >The example in Section 1.4 is incomplete in that it does not define LV.
> 
> True; it is only an example. LV can be any suitable set.

The example should say this.

> >Also, IL is necessarily the empty map as there are no typed literals in the
> >vocabulary of the example.
> 
> Ah, point taken. I have added "plus all typed literals with one of 
> these as the type URI"
> 
> >  This makes the fourth triple false, not true.
> >
> >The ``oddity'' of having a typed literal denote a non literal is not ruled
> >out in datatyped interpretations.
> 
> That isn't what was meant by 'oddity', but I have deleted this comment.
> 
> >
> >The explanation of why triples involving plain literals are false is
> >incomplete, as plain literals do not have to denote character strings.
> 
> Changed to "containing a plain literal."

I don't think that this does the trick.  The point is that plain literals
include those with language tags, which do not denote strings.

> >Silliness:
> >
> >rdf-interpretations do not just ``impose extra semantic conditions on crdfV
> >and typed literals with the type rdf:XMLLiteral''.  Why not just say that
> >rdf-interpretations impose extra semantic conditions?
> 
> Because this draws attention to the fact that they do not impose any 
> extra conditions on the rest of the RDF vocabulary.

Well, sort of, but I consider the use of crdfV misleading.

It is true that there are rdf-interpretations that do not impose conditions
on (the denotation of) rdf:subject.  However, any rdf-interpretation that
includes rdf:subject in its vocabulary does impose conditions on (the
denotation of) rdf:subject.  Further, not all rdf-interpretations impose
conditions on every typed literal with the type rdf:XMLLiteral, as not all
such literals need be in the vocabulary of the interpretation.

So I suggest some different wording here.


> >Problem:
> >
> >The document states several times that it is agnostic as to whether XML
> >literals are strings.
> 
> The document  refers to XML values, ie whatever it is that XML literals denote.
> 
> >However, the claimed completeness of the RDF entailment
> >rules means that XML literals are not strings.
> 
> The strings in the actual XML literals themselves are strings, as 
> clearly stated several times in this and other RDF documents. 
> Whether or not an XML literal denotes a string is where the 
> agnosticism comes in.  I am not sure which of these you mean here.

[From a separate email exchange, with new comments.]

** Subject: Re: pfps-04
** From: pat hayes <phayes@ihmc.us>
** To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
** Date: Wed, 23 Jul 2003 21:55:41 -0500
** 
** >From: pat hayes <phayes@ihmc.us>
** >Subject: Re: pfps-04
** >Date: Wed, 23 Jul 2003 14:56:07 -0500
[...]
** >>  >There is still a mismatch between the RDF Entailment Rules, which, if
** >>  >complete, determine that XML Literals are not the same as strings
** >>
** >>  ?? How can they possibly determine that? Please explain.
** >>
** >>  Pat
** >
** >[The following assumes that certain aspects of the RDF model theory will be
** >fixed up appropriately.]
** >
** >Let x be any well-typed XML literal string.  Then "xx"^^rdf:XMLLiteral,
** >where xx is the appropriate encoding of x, has the same denotation in every
** >interpretation of a vocabulary that includes "xx"^^rdf:XMLLiteral, namely
** >the canonical XML object corresponding to x.
** 
** That assumes that 'canonical XML object' is uniquely well-defined. I 
** don't think it is, as I have observed differences of opinion about 
** what exactly it is. So I don't accept that it does have the same 
** denotation in every interpretation. It denotes whatever someone 
** thinks the specs mean. Opinions seem to differ.

If this is not well defined, then it would not be possible to create an RDF
datatype for XML Literals, as RDF datatypes require a well-defined mapping
from lexical to value spaces.  

** >Let this object be x'.
** >
** >If x' is a string
** 
** Nobody said it was a string.  x is a string in a XML literal, ie a 
** syntactic entity.  x' is an XML Literal value, something in an 
** interpretation.  Whether or not those are strings is left open: as 
** far as the MT is concerned, they might not be, ie there are 
** satisfying interpretations where they are not.
** 
** Pat

Note that the above says that ``*If* x' is a string''.  This does not mean
that x' has to be a string.  However if x' is a string (of Unicode
characters), then 

[Taken from my first response to Pat.]

* If x' is a string then  
* 	ex:a ex:p x^^rdf:XMLLiteral .
* rdf-entails 
* 	ex:a ex:p "xx'" .
* where xx' is the n-triple encoding of x'.
* 
* Therefore for the RDF entailment rules to be complete, no XML Literal can
* have a character string as its denotation.


> >Problem:
> >
> >The treatment of quoted strings in LBase is so bad that I can't even begin
> >to figure it out.  However, it is definitely the case that the translation
> >to LBase changes the denotation of character strings.
> 
> Indeed there was an error in the table at this point, left over from 
> an earlier edit, my apologies.  I also see, on checking, that the 
> character-escaping convention in the published Lbase note is not in 
> fact the version I was following when writing the appendix. No wonder 
> you were unable to follow it.
> 
> Let me suggest that I simply ignore all the character-escaping 
> complexities and insert a remark in the text as follows:
> 
> "Note, these translation rules ignore issues of character escaping in 
> encoding character strings in literals: an implementation based on 
> these rules might need to use more care with strings containing the 
> characters ' and \."
> 
> The mapping now simply puts single quote marks around the literal 
> string, with no attempts at character escaping.
> 
> I have made these changes.
> 
> Bear in mind that, as the text states, this translation is provided 
> only as an informative alternative for readers who prefer this style. 
> The Lbase document emphasizes that Lbase is not intended as an 
> implementation language or for direct use as a SWEL.

As you know, I am unhappy with the presence of this appendix in the
document.  This new issue only increases my unhappiness.
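Why strings containing ' and \ need care can be shown with a small sketch. It does not reproduce the Lbase escaping convention, which this thread does not spell out. The backslash-escaping scheme below is purely hypothetical, as are the function names, and serves only to contrast with the naive rule of wrapping the string in single quotes.

```python
def quote_naive(s):
    """Wrap the string in single quotes with no escaping."""
    return "'" + s + "'"


def quote_escaped(s):
    """Hypothetical convention: backslash-escape backslash and single quote."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


def read_quoted(text):
    """Read back one quoted string, honoring backslash escapes."""
    if not text.startswith("'"):
        raise ValueError("expected an opening quote")
    out = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # take the escaped character literally
            i += 2
        elif ch == "'":
            return "".join(out)      # first unescaped quote closes the string
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string")
```

Under this reader, a string written with quote_escaped round-trips, whereas quote_naive("it's") reads back as "it": the embedded quote is taken as the end of the string, so the translation would change which character string is denoted.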

> I have also weakened the claim in the 5th paragraph of section 0.1 to read :
> 
> "The translation technique offers some advantages and may be more 
> readable, so is described here as a convenience. The axiomatic 
> semantic description differs slightly from the normative model theory 
> in the body of the text, as noted in the appendix."

Does this mean that the claims of ``same semantic theory'' and ``exact
correspondence'' are gone?   If so, what then remains?

> >Whether this causes
> >problems I cannot determine.



> >Problem:
> >
> >The translation to LBase seems to assume in some places that LBase uses
> >URIrefs of some sort, e.g., the expansion of Lbase:String.  However, the
> >LBase document itself uses non-URIref names for these things, e.g., String.
> 
> Whoops. Sorry, indeed that is a mistake, arising from having too many 
> versions of the document lying around.  The 'Lbase:' prefixes should 
> not be there. Fixed.

There are Lbase: prefixes in
http://www.w3.org/TR/2003/NOTE-lbase-20030123/, which are at best a source
of confusion.

> >Problem:
> >
> >The translation to LBase ignores some of the aspects of URI references, I
> >believe.  In particular, I believe that RDF URI references can include
> >whitespace, which is not allowed in LBase names.
> 
> Really?? Well, I was unaware of that possibility, I confess. If true, 
> that would require us to change the Lbase syntax to allow for this 
> possibility. The intention was always that URIrefs could be used as 
> Lbase identifiers.

This needs to be investigated, I think.

> >  I note also that LBase
> >doesn't even bother to define character strings.
> 
> What would count as a definition? The Lbase document refers to 
> sequences of Unicode characters.

I just did a search for ``character'' in 
http://www.w3.org/TR/2003/NOTE-lbase-20030123/, which is the document
referred to by http://www.w3.org/sw/RDFCore/TR/WD-rdf-mt-20030117/, and did
not find any definition of what a character string is.

A definition would be something like
	A character string is a finite, possibly empty sequence of Unicode
	characters [ref].
The point is that character strings are only defined for some notion of
characters, and there are quite a few possibilities to choose from.

> >Problem:
> >
> >The translation to LBase can be broken by use of suitable URI references in
> >the RDF graph.
> >  For example the translation of
> >
> >	ex:a rdf:type LBase:String .
> >
> >would imply the translation of
> >
> >	ex:a rdf:type rdfs:Literal .
> >
> >which is not a valid rdfs-entailment.
> 
> The intention was that the Lbase special names cannot be generated 
> from URIrefs.
> This is fixed now, see above, since the corrected special names are 
> not legal URIs or Qnames.

The document http://www.w3.org/TR/rdf-concepts/, referred to by 
http://www.w3.org/sw/RDFCore/TR/WD-rdf-mt-20030117/ as the definition of
URIref, only requires that a URIref be a Unicode string that would produce
a valid URI under a certain encoding.  

This appears to allow for any sort of URI, including relative URIs, which
could clash with the Lbase special names.

> >Problem:
> >
> >The translation to LBase does not require the correct treatment of XML
> >literals.  XML literals are only handled in LBase translations of
> >D-interpretations.
> >
> 
> That is true, and was done so in the interests of simplicity. The 
> text notes this but only in passing. I have added a more explicit 
> note to that effect.
> 
> ".. add the axioms specified; except that the RDF translation does 
> not deal with XML typed literals, which are handled as a datatype in 
> this translation, for simplicity."
> 
> and
> 
> "The built-in datatype rdf:XMLLiteral is treated uniformly with the 
> other datatypes, later, so that the RDF translation given here is 
> strictly incomplete as it stands. "

> >Question:
> >
> >Does
> >	<ex:a> <ex:b> "a"^^xsd:string .
> >xsd-entail
> >	<ex:a> <ex:b> "a" .
> >or not?
> 
> My current understanding is that it does not. However, I agree that 
> we should get this decided clearly one way or the other.
> 
> >An answer to this question is needed for the RDF semantics to be
> >complete.  It should also be a test case.
> 
> That sounds like it should be discussed by the WG as a whole.  I have 
> asked Brian to put this on the next agenda.

[...]
Received on Thursday, 24 July 2003 07:02:07 UTC