Re: review of July 15 draft of RDF Semantics document from pat hayes on 2003-07-24 (www-rdf-comments@w3.org from July to September 2003)

From: pat hayes <phayes@ihmc.us>
Date: Thu, 24 Jul 2003 01:13:04 -0500
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
Cc: www-rdf-comments@w3.org, Brian_McBride <bwm@hplb.hpl.hp.com>
Message-Id: <p06001a00bb44af6545ea@[10.0.100.23]>
Peter, greetings and thanks for the close reading.

All the changes mentioned below are now in the version dated 23 July at

http://www.ihmc.us/users/phayes/RDF_Semant_Edit_Weak.html

Pat
---------

>As I had received information from the RDF Core WG that the RDF Semantics
>document was suitable for review, and I needed to see if my many concerns
>with the RDF model theory have been resolved, I did a pass through the July
>15 draft of RDF Semantics.
>
>Unfortunately, I found quite a number of problems with this draft.  Some of
>these are problems remaining from previous versions of the document but
>some of them appear to have been newly introduced.
>
>
>Drastic Problem:
>
>The treatment of XML Literals is inconsistent within the document and with
>respect to RDF Concepts (at least the version of RDF Concepts that is
>accessible through the pointer in the RDF Semantics document, there are
>also broken links related to XML Literals).  The change list in RDF
>Semantics

The change list is not part of the document. Please review the document.

>says that XML literals ``are now required to be in canonical form
>and therefore to denote their own literal string.''  This appears to mean
>that XML literals are just a subset of character strings.  This is
>completely counter to what is said in RDF Concepts.  Section 3 of RDF
>Semantics has no mention of the fact that XML literals denote themselves.
>It also says that it ``is deliberately agnostic as to whether or not XML
>data is considered to be identical to a character string'', which is in
>direct contradiction to the wording in the change list.
>
>XML Literals have been a source of very many problems.  As they are still
>not correct, it would be much better to just dump them entirely.

That is not an option, so I will ignore this as a comment about the 
document. XML literals are fully defined in the document.

>
>Drastic Problem:
>
>There has been a significant conceptual change to simple interpretations.
>IP is not required to be a subset of IR.  This does not appear to be in
>response to any comment to the RDF Core Working Group nor to be in response
>to any problem with the RDF model theory.  This change may have
>consequences for other formalisms, including OWL, but no announcement about
>it has been made.
>

I would not describe this as a significant conceptual change so much as 
a small technical improvement to the mathematical machinery. It was 
mentioned in an informative email which you received and replied to. 
It was not made capriciously; it reflects a recent observation that 
this slight weakening of the basic (not RDF) graph model theory makes 
'layering' of the sort requested by Jeff Pan and others somewhat 
easier to achieve, since the basic model theory now allows a 
conventional first-order structure of an interpretation of a graph 
which satisfies the conventional syntactic layering: that is, if a 
URIref occurs in a graph only in predicate position, it is no longer 
required to denote something in the universe of quantification.  This 
allows the basic model theory to be more conventional, since it no 
longer requires the use of non-well-founded structures in all cases. 
The credit for this idea is due to Chris Menzel, and it arose as a 
consequence of the SCL project working to eliminate the 'Horrocks 
sentences' which had different satisfiability conditions in SCL and 
FOL; this is of course closely related to the RDF/OWL layering 
issues. Using a similar device, SCL has now achieved full FOL 
compatibility.
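To make the weakening concrete, here is a minimal Python sketch of a simple interpretation restricted to URIref names (literals, IL and LV are left out). The class and the concrete sets are hypothetical illustrations, not taken from the document; the truth condition follows the one for ground triples: s, p and o in V, I(p) in IP, and <I(s),I(o)> in IEXT(I(p)). The interpretation has IP and IR disjoint, yet it satisfies a graph in which ex:p occurs only in predicate position.

```python
from dataclasses import dataclass


# Hypothetical, minimal model of a simple interpretation restricted to
# URIref names. Field names follow the document's symbols; the concrete
# sets used below are illustrative only.
@dataclass
class SimpleInterpretation:
    IR: set    # resources, the universe of quantification
    IP: set    # properties; not required to be a subset of IR
    IEXT: dict # maps each member of IP to a set of (IR, IR) pairs
    IS: dict   # maps URIrefs in the vocabulary V into IR | IP

    def triple_true(self, s, p, o):
        # s, p and o must all be in V (here: the keys of IS)
        if not all(name in self.IS for name in (s, p, o)):
            return False
        prop = self.IS[p]
        if prop not in self.IP:
            return False
        return (self.IS[s], self.IS[o]) in self.IEXT.get(prop, set())

    def satisfies(self, graph):
        return all(self.triple_true(s, p, o) for (s, p, o) in graph)


# ex:p occurs only in predicate position, so it may denote something
# outside IR: here IP and IR share no members.
interp = SimpleInterpretation(
    IR={"thing1", "thing2"},
    IP={"rel"},
    IEXT={"rel": {("thing1", "thing2")}},
    IS={"ex:a": "thing1", "ex:b": "thing2", "ex:p": "rel"},
)
print(interp.satisfies({("ex:a", "ex:p", "ex:b")}))  # True
print(interp.IP & interp.IR)                         # set()
```

A graph that also used ex:p as a subject, such as ex:p ex:p ex:b, comes out false in this interpretation, since IEXT only relates members of IR.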

This does not change any RDF or RDFS entailments or semantic 
conditions, since these require that IP and IR overlap on the parts 
of the RDF and RDFS vocabularies to which semantic conditions apply, 
as the text notes; and since it weakens rather than strengthens the 
conditions on simple interpretations, I do not believe that it will 
have any significant effects on OWL.  Other members of the Webont 
working group have reacted favorably to this change.  If you feel that 
there are any problems arising from this change, please say what they 
are.

>Problem:
>
>The definition of a proper instance admits a switch of blank nodes in the
>graph, e.g., replacing _:a with _:b and vice versa, as a proper instance,
>but this shouldn't be a proper instance.

It isn't a proper instance according to the definition given:

"A proper instance of a graph is an instance in which a blank node is 
mapped to a name or to some other blank node in the graph, so that in 
the instance a blank node has been replaced by a name or two blank 
nodes in the graph have been identified. "

On re-reading this I see that the comma may be misleading, and have deleted it.
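Read without the comma, the definition singles out two cases: some blank node is replaced by a name, or two blank nodes of the graph end up identified. The following Python sketch is an illustration only; it assumes graphs are modelled as sets of (subject, predicate, object) string triples with blank nodes written "_:x", and the helper names are hypothetical. It tests a blank-node mapping against that reading, and a mapping that merely permutes or renames blank nodes, such as swapping _:a and _:b, does not qualify.

```python
def is_blank(term):
    return term.startswith("_:")


def blank_nodes(graph):
    return {t for (s, _, o) in graph for t in (s, o) if is_blank(t)}


def instance(graph, mapping):
    """Apply a blank-node mapping to every triple of the graph."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in graph}


def is_proper(graph, mapping):
    # Blank nodes the mapping does not mention are left unchanged.
    full = {b: mapping.get(b, b) for b in blank_nodes(graph)}
    # Case 1: some blank node is replaced by a name.
    if any(not is_blank(v) for v in full.values()):
        return True
    # Case 2: two blank nodes of the graph are identified.
    return len(set(full.values())) < len(full)


g = {("_:a", "ex:p", "_:b")}
swap = {"_:a": "_:b", "_:b": "_:a"}
print(instance(g, swap))             # {('_:b', 'ex:p', '_:a')}
print(is_proper(g, swap))            # False
print(is_proper(g, {"_:a": "_:b"}))  # True
```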

>This invalidates the anonymity lemma, as
>	_:a <ex:p> _:b .
>is a proper instance of itself and lean, so should not entail itself.
>
>
>Problem:
>
>The example of a lean graph is not lean, as the instance of this graph
>obtained by replacing _:x with <ex:a> is a proper instance of the graph.

It is lean according to the definition given, which refers to 
instances being proper subgraphs.
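For small graphs, leanness in this sense can be checked by brute force: search for an instance that is a proper subgraph. The Python sketch below is illustrative only; graphs are assumed to be sets of string triples with blank nodes written "_:x", and the two sample graphs are hypothetical. The second has the shape described in the comment: replacing _:x by ex:a gives a proper instance, but that instance is not a subgraph, so the graph is still lean.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def is_lean(graph):
    """True if no instance of the graph is a proper subgraph of it."""
    blanks = sorted({t for (s, _, o) in graph for t in (s, o) if is_blank(t)})
    # An instance that is a subgraph can only contain terms already in the
    # graph, so it is enough to map blank nodes to those terms.
    candidates = sorted({t for triple in graph for t in triple})
    for images in product(candidates, repeat=len(blanks)):
        m = dict(zip(blanks, images))
        inst = {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in graph}
        if inst < graph:  # proper subset of the original triples
            return False
    return True


not_lean = {("ex:a", "ex:p", "_:x"), ("_:y", "ex:p", "_:x")}
lean = {("ex:a", "ex:p", "_:x"), ("_:x", "ex:p", "_:x")}
print(is_lean(not_lean))  # False: mapping _:y to ex:a gives a proper subgraph
print(is_lean(lean))      # True
```

The search is exponential in the number of blank nodes, which is acceptable for checking small examples like these.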

>This calls into question the entire notion of lean graphs.
>
>
>Problem:
>
>The definition of the merge of a set of graphs is inadequate.  Just which
>blank nodes of members of S are to be replaced?

Given the convention described in section 0.2, it doesn't matter.

>  From the definition, the
>merge of
>	_:a <ex:p> _:b .
>and
>	_:a <ex:p> _:c .
>and
>	_:b <ex:p> _:c .
>could be
>	_:a <ex:p> _:b .
>	_:a <ex:p> _:c .
>	_:e <ex:p> _:e .
>as this ``replaces blank nodes in some members of S by distinct blank
>nodes''.

The definition reads, in full:
"a set obtained by replacing blank nodes in some members of S by 
distinct blank nodes to obtain another set S' of graphs which are 
equivalent to those in S in the above sense. ",

To be quite sure of the meaning, I have added a phrase:

"a set obtained by replacing blank nodes in some members of S by 
distinct blank nodes to obtain another set S' of graphs which share 
no blank nodes and are equivalent to those in S in the above sense."
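In effect the definition asks for the blank nodes to be renamed apart before taking the union. A Python sketch, illustrative only: graphs are assumed to be sets of string triples with blank nodes written "_:x", and the "_:bN" naming scheme is an arbitrary choice. For simplicity it renames the blank nodes of every member of S, which "some members" permits. Applied to the three graphs in the example above it yields three triples with six distinct blank nodes, so a result such as _:e <ex:p> _:e, which identifies two blank nodes, cannot arise.

```python
from itertools import count


def is_blank(term):
    return term.startswith("_:")


def standardize_apart(graph, fresh_ids):
    """Replace each blank node of one graph by a new, globally unused one."""
    renaming = {}

    def rename(term):
        if is_blank(term) and term not in renaming:
            renaming[term] = f"_:b{next(fresh_ids)}"
        return renaming.get(term, term)

    return {(rename(s), p, rename(o)) for (s, p, o) in graph}


def merge(graphs):
    fresh_ids = count()
    merged = set()
    for graph in graphs:
        merged |= standardize_apart(graph, fresh_ids)
    return merged


S = [
    {("_:a", "ex:p", "_:b")},
    {("_:a", "ex:p", "_:c")},
    {("_:b", "ex:p", "_:c")},
]
m = merge(S)
print(len(m))  # 3
```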

>There are other problems in the definition of the merge as well.

I am unable to respond to that.

>
>Problem:
>
>In Section 1.3 a vocabulary is defined as a ``set of URIrefs''.

It is not defined there; the text refers to such a set as being a 
vocabulary, which is correct. However it could be better worded: I 
have changed this to "set of names".

>However, in the change log and in Section 0.3, a vocabulary is supposed to
>be able to contain typed literals.

A set of URIs without typed literals is a vocabulary, however.

>
>Problem:
>
>There is no definition of a ``literal character string'' or a ``language
>tag'', used in the definition of simple interpretations.

  "literal character string"  changed to  "character string".

Language tag is used in the sense of RFC3066. I have inserted a 
reference link to the concepts document
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Graph-Literal
which should clarify the intended meaning.

>
>Problem:
>
>It is not the case that ``any URIref which occurs both as a predicate and
>as a subject in any triple must denote something in the intersection of IR
>and IP.''

That is indeed carelessly worded. I have rephrased it more carefully:

"any URIref which occurs in a graph both as a predicate and as a 
subject or object must denote something in the intersection of IR and 
IP in any interpretation which satisfies the graph."

It also reads better if this paragraph is placed after the next 
paragraph, and I have made this editorial change.
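The reason is that a true triple needs I(p) to be in IP, while IEXT only relates members of IR, so a name that also appears as subject or object must denote a member of IR as well. A short Python sketch, illustrative only (graphs assumed to be sets of string triples; the function name is hypothetical), that picks out the names this applies to:

```python
def names_in_ir_and_ip(graph):
    """Names used in a graph both as a predicate and as a subject or object.

    In any interpretation satisfying the graph, each such name must denote
    something in the intersection of IR and IP.
    """
    predicates = {p for (_, p, _) in graph}
    nodes = {t for (s, _, o) in graph for t in (s, o)}
    return predicates & nodes


g = {
    ("ex:p", "rdf:type", "rdf:Property"),
    ("ex:a", "ex:p", "ex:b"),
}
print(names_in_ir_and_ip(g))  # {'ex:p'}
```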

>
>Problem:
>
>The conditions for denotations should be augmented with more conditions
>like ``if I(p) is in IP''.    I suggest adding as well ``if s, p, and o are
>in V''.

Why do you feel this is necessary? This wording has not changed in 
many versions of the document.

But since you insist, I have added the condition explicitly.

>
>Problem:
>
>The example in Section 1.4 is incomplete in that it does not define LV.

True; it is only an example. LV can be any suitable set.

>Also, IL is necessarily the empty map as there are no typed literals in the
>vocabulary of the example.

Ah, point taken. I have added "plus all typed literals with one of 
these as the type URI"

>  This makes the fourth triple false, not true.
>
>The ``oddity'' of having a typed literal denote a non literal is not ruled
>out in datatyped interpretations.

That isn't what was meant by 'oddity', but I have deleted this comment.

>
>The explanation of why triples involving plain literals are false is
>incomplete, as plain literals do not have to denote character strings.

Changed to "containing a plain literal."

>
>Silliness:
>
>rdf-interpretations do not just ``impose extra semantic conditions on crdfV
>and typed literals with the type rdf:XMLLiteral''.  Why not just say that
>rdf-interpretations impose extra semantic conditions?

Because this draws attention to the fact that they do not impose any 
extra conditions on the rest of the RDF vocabulary.

>
>Problem:
>
>The vocabulary of an interpretation contains no ``well-typed XML
>literal string''s,

The strings are inside the literals; the literals are in the vocabulary.

Ah, I see what you are referring to.  Right; the text in the second 
table box now reads:

"If "xxx"^^rdf:XMLLiteral is in V and xxx is a well-typed literal string then"

Thanks for noticing that.

>so the definition of rdf-interpretations is
>suspect, at best.  Also, there is no definition for ``well-typed XML
>literal''.

That is also a typo. The third box has been rephrased thus for 
complete clarity:

"If "xxx"^^rdf:XMLLiteral is in V and xxx is not a well-typed literal 
string then"


>
>Problem:
>
>The document states several times that it is agnostic as to whether XML
>literals are strings.

The document refers to XML values, i.e. whatever it is that XML literals denote.

>However, the claimed completeness of the RDF entailment
>rules means that XML literals are not strings.

The strings in the actual XML literals themselves are strings, as 
clearly stated several times in this and other RDF documents. 
Whether or not an XML literal denotes a string is where the 
agnosticism comes in.  I am not sure which of these you mean here.

>Problem:
>
>The treatment of quoted strings in LBase is so bad that I can't even begin
>to figure it out.  However, it is definitely the case that the translation
>to LBase changes the denotation of character strings.

Indeed there was an error in the table at this point, left over from 
an earlier edit, my apologies.  I also see, on checking, that the 
character-escaping convention in the published Lbase note is not in 
fact the version I was following when writing the appendix. No wonder 
you were unable to follow it.

Let me suggest that I simply ignore all the character-escaping 
complexities and insert a remark in the text as follows:

"Note, these translation rules ignore issues of character escaping in 
encoding character strings in literals: an implementation based on 
these rules might need to use more care with strings containing the 
characters ' and \."

The mapping now simply puts single quote marks around the literal 
string, with no attempts at character escaping.

I have made these changes.
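For illustration only, the simplified rule and the caveat in the note can be sketched in Python as follows. The function names are hypothetical, and the check merely flags the two characters the note mentions; it is not an implementation of any Lbase escaping convention.

```python
def lbase_string(s):
    """Simplified rule: put single quote marks around the literal string,
    with no character escaping."""
    return "'" + s + "'"


def needs_more_care(s):
    """True if s contains a character the note warns about (' or \\)."""
    return "'" in s or "\\" in s


print(lbase_string("abc"))      # 'abc'
print(lbase_string("it's"))     # 'it's'  (the inner quote is ambiguous)
print(needs_more_care("it's"))  # True
```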

Bear in mind that, as the text states, this translation is provided 
only as an informative alternative for readers who prefer this style. 
The Lbase document emphasizes that Lbase is not intended as an 
implementation language or for direct use as a SWEL.

I have also weakened the claim in the 5th paragraph of section 0.1 to read:

"The translation technique offers some advantages and may be more 
readable, so is described here as a convenience. The axiomatic 
semantic description differs slightly from the normative model theory 
in the body of the text, as noted in the appendix."


>Whether this causes
>problems I cannot determine.
>
>
>Problem:
>
>The translation to LBase seems to assume in some places that LBase uses
>URIrefs of some sort, e.g., the expansion of Lbase:String.  However, the
>LBase document itself uses non-URIref names for these things, e.g., String.

Whoops. Sorry, indeed that is a mistake, arising from having too many 
versions of the document lying around.  The 'Lbase:' prefixes should 
not be there. Fixed.

>Problem:
>
>The translation to LBase ignores some of the aspects of URI references, I
>believe.  In particular, I believe that RDF URI references can include
>whitespace, which is not allowed in LBase names.

Really?? Well, I was unaware of that possibility, I confess. If true, 
that would require us to change the Lbase syntax to allow for this 
possibility. The intention was always that URIrefs could be used as 
Lbase identifiers.

>  I note also that LBase
>doesn't even bother to define character strings.

What would count as a definition? The Lbase document refers to 
sequences of Unicode characters.

>
>Problem:
>
>The translation to LBase can be broken by use of suitable URI references in
>the RDF graph.
>  For example the translation of
>
>	ex:a rdf:type LBase:String .
>
>would imply the translation of
>
>	ex:a rdf:type rdfs:Literal .
>
>which is not a valid rdfs-entailment.

The intention was that the Lbase special names cannot be generated 
from URIrefs.
This is fixed now, see above, since the corrected special names are 
not legal URIs or Qnames.
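The reasoning behind the last two points can be sketched as a check on whether a URIref may be copied unchanged into Lbase as a name. This is a hypothetical Python helper, not part of either document. It assumes, following the review, that Lbase names may not contain whitespace, and it recognises absolute URIrefs by the RFC 2396 scheme syntax. Since every such URIref contains a colon, none of them can coincide with a bare special name such as String.

```python
import re

# RFC 2396 scheme: alpha *( alpha | digit | "+" | "-" | "." ), then ":"
ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

# Bare Lbase special names (only String is mentioned in this thread).
LBASE_SPECIAL_NAMES = {"String"}


def usable_as_lbase_name(uriref):
    """Hypothetical check: can this URIref be copied into Lbase unchanged?"""
    if not ABSOLUTE_URI.match(uriref):
        return False  # not an absolute URIref
    if any(ch.isspace() for ch in uriref):
        return False  # assumed: whitespace is not allowed in Lbase names
    return True


print(usable_as_lbase_name("http://example.org/a"))    # True
print(usable_as_lbase_name("http://example.org/a b"))  # False
print(usable_as_lbase_name("String"))                  # False
```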

>
>
>Problem:
>
>The translation to LBase does not require the correct treatment of XML
>literals.  XML literals are only handled in LBase translations of
>D-interpretations.
>

That is true, and was done so in the interests of simplicity. The 
text notes this but only in passing. I have added a more explicit 
note to that effect.

".. add the axioms specified; except that the RDF translation does 
not deal with XML typed literals, which are handled as a datatype in 
this translation, for simplicity."

and

"The built-in datatype rdf:XMLLiteral is treated uniformly with the 
other datatypes, later, so that the RDF translation given here is 
strictly incomplete as it stands. "

>
>
>Question:
>
>Does
>	<ex:a> <ex:b> "a"^^xsd:string .
>xsd-entail
>	<ex:a> <ex:b> "a" .
>or not.

My current understanding is that it does not. However, I agree that 
we should get this decided clearly one way or the other.

>An answer to this question is needed for the RDF semantics to be
>complete.  It should also be a test case.

That sounds like it should be discussed by the WG as a whole.  I have 
asked Brian to put this on the next agenda.

>
>Typos:
>
>Section 1.3	etc..
>Section 2	a set S of [graphs] (simply) entails a graph E
>

Thanks, corrected.

Pat


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 24 July 2003 02:13:09 UTC