Re: Comments on Last-Call Working Draft of RDF 1.1 Semantics from Pat Hayes on 2013-12-12 (public-rdf-wg@w3.org from December 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 12 Dec 2013 00:09:20 -0800
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: Michael Schneider <schneid@fzi.de>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <03F6CB0F-87E9-49B3-B821-96DDB9E28500@ihmc.us>
(I believe that the appropriate forum for this discussion should be public-rdf-wg@w3.org rather than public-rdf-comments@w3.org, so I have changed the CC line to this "internal" list.)

I am not willing to make these extensive changes to the documents. In a separate email I will outline the changes I am willing to make in response to Michael's comments. 

Further, more detailed responses are in-line below.

Pat


On Dec 11, 2013, at 9:23 AM, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:

> Here is a concrete proposal for changes to be made in RDF 1.1 Concepts and RDF 1.1 Semantics.
> 
> tl;dr: concepts defines datatype maps as a mapping from some IRIs to datatypes, and introduces the terms "recognized datatype IRIs" (the domain of the datatype map) and "recognized datatypes" (the range of the datatype map).

I do not believe there is any need to change the text of Concepts. The 'datatype map' device is purely mathematical and its role is restricted to Semantics. It plays no useful explanatory or intuitive role.

> 
> Using this terminology, the modifications to semantics are surprisingly minimal. Almost all text relating to D-entailment stays the same as in RDF 1.1 Semantics CR.
> 
> However, I request that datatype maps be used in the semantic condition for D-entailment.

I am afraid that that is not acceptable to me. The simplification achieved by stating the semantics in the current style is completely vitiated if we re-introduce this completely unnecessary construct into the equations, and nothing is gained by putting datatype maps into the equations themselves.

> Once this is set, the other semantic conditions in other entailment regimes can stay almost identical.
> 
> I also ask Michael to review my proposal. The phrasing, if retaining the idea, can be improved.
> 
> 
> 
> Changes to concepts:
> ====================
> 
> 5.4  Datatype Maps
> 
> Datatypes are identified by IRIs.  In order to know the value of a literal, implementations should be able to associate its datatype IRI to the datatype it identifies.
> This association between IRIs and datatypes is called a <def>datatype map</def> and it is formally defined as a mapping from a set of IRIs to datatypes.
> 
> The set of IRIs in a datatype map is known as the <def>recognized datatype IRIs</def>, the datatypes associated with recognized IRIs by the datatype map are called the <def>recognized datatypes</def>, and datatype maps MUST satisfy the following conditions:
> 
> - If the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral is recognized, then it must be paired with the datatype rdf:XMLLiteral defined in this specification.
> - If the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML is recognized, then it must be paired with the datatype rdf:HTML defined in this specification.
> - If the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral is recognized, then it must be paired with the datatype rdf:PlainLiteral defined in [RDF:PLAINLITERAL].
> - If the datatype IRI http://www.w3.org/2002/07/owl#rational is recognized, then it must be paired with the datatype owl:rational defined in [OWL 2 Structural Specification].
> - If the datatype IRI http://www.w3.org/2002/07/owl#real is recognized, then it must be paired with the datatype owl:real defined in [OWL 2 Structural Specification].
> - If a datatype IRI of the form http://www.w3.org/2001/XMLSchema#xxx is recognized, then it must be paired with the RDF-compatible XSD type named xsd:xxx.
> 
> Other specifications may impose additional constraints on the datatype map, for example, require support for certain datatypes.
> 
> Implementations are free to recognize datatype IRIs that are not part of a W3C specification, in which case they SHOULD provide in their documentation the definition of the datatype to which the IRI maps, making the datatype map explicit. However, if an implementation solely recognizes datatypes contained in XSD union {rdf:HTML, rdf:XMLLiteral, rdf:PlainLiteral, owl:rational, owl:real}, then it MAY provide only a set of recognized IRIs because the associated datatypes are constrained by this specification.
> 
> <Note:> RDF Test cases never use datatypes outside the list above, so entailment tests refer to sets of recognized IRIs without making the datatype map explicit.</Note>
> <Note:> If an implementation recognizes a datatype outside the list above, it SHOULD rely on a datatype IRI that can dereference to a specification of the associated datatype.  If it does so, then it is possible to define the datatype map of the implementation simply as a set of recognized IRIs.</Note>
> 
> 
> 
> Changes to semantics:
> =====================
> 
> List of places where the notion of "identifying" and "recognizing" is used:
> 
> In Section 4: "For example, the fact that the IRI http://www.w3.org/2001/XMLSchema#decimal is widely used as the name of a datatype described in the XML Schema document [XMLSCHEMA11-2] might be described by saying that the IRI identifies that datatype. If an IRI identifies something it may or may not refer to it in a given interpretation, depending on how the semantics is specified. For example, an IRI used as a graph name identifying a named graph in an RDF dataset may refer to something different from the graph it identifies."
> 
> --> requires no change, in my opinion
> 
> Section 5: "Semantic extensions may impose further constraints upon interpretation mappings by requiring some IRIs to refer in particular ways. For example, D-interpretations, described below, require some IRIs, understood as identifying and referring to datatypes, to have a fixed denotation."
> 
> --> requires no change because even with datatype maps, certain datatype IRIs are required to have a fixed denotation.
> 
> Section 7: "Datatypes are identified by IRIs. Interpretations will vary according to which IRIs they recognize as denoting datatypes. We describe this using a parameter D on simple interpretations, where D is the set of recognized datatype IRIs. We assume that a recognized IRI identifies a unique datatype wherever it occurs, and the semantics requires that it refers to this identified datatype. The exact mechanism by which an IRI identifies a datatype IRI is considered to be external to the semantics. RDF processors which are not able to determine which datatype is identified by an IRI cannot recognize that IRI, and should treat any literals with that IRI as their datatype IRI as unknown names."
> 
> --> change this to: "Datatypes are identified by IRIs. Interpretations will vary according to which IRIs they recognize as denoting datatypes. We describe this using a parameter D on simple interpretations, where D is a datatype map, with recognized datatype IRIs S.

The problem with this wording, which was also a problem in 2004, is that this permits two (or more) distinct D-interpretations which recognize exactly the same IRIs but interpret them with different maps. By making the datatype map (i.e. the interpretation of datatype IRIs) local rather than global, you have re-created this monster.  It follows then that, as RDF contains only IRIs, not anything encoding a "datatype map", there is no way for two RDF processors to agree on a common entailment regime. The newer wording tries to make it clear that the semantics *presumes* that this identification is done globally and externally to RDF itself. It does this in part by relying on the Web meaning of the term "identify". An RDF engine or system can now advertize its datatyping abilities simply by listing the datatype IRIs it recognizes. (How else could it do it, in any case? How would one publish a datatype map?)
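
To make the concern concrete, here is a minimal Python sketch, offered only as an illustration and not as anything the specifications define. It models a "datatype map" as a dictionary from IRIs to lexical-to-value functions; the names map_a, map_b, l2v_decimal, l2v_reversed and literal_value are hypothetical. Two maps with exactly the same recognized IRIs assign different values to the same literal, and nothing in the list of recognized IRIs reveals the difference.

```python
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Assumption: a datatype is modelled only by its lexical-to-value mapping (L2V).
def l2v_decimal(lexical: str) -> Decimal:
    """The usual reading of xsd:decimal lexical forms."""
    return Decimal(lexical)

def l2v_reversed(lexical: str) -> str:
    """A deliberately non-standard reading, used only to show the problem."""
    return lexical[::-1]

# Two hypothetical local datatype maps with the same domain.
map_a = {XSD_DECIMAL: l2v_decimal}
map_b = {XSD_DECIMAL: l2v_reversed}

def literal_value(datatype_map, datatype_iri, lexical):
    """Value of the literal "lexical"^^datatype_iri under a given map."""
    return datatype_map[datatype_iri](lexical)

# Both processors advertise the same set of recognized IRIs ...
same_recognized = set(map_a) == set(map_b)

# ... yet they disagree about what the same literal denotes.
value_a = literal_value(map_a, XSD_DECIMAL, "1.50")
value_b = literal_value(map_b, XSD_DECIMAL, "1.50")
```

If identification is instead taken to be global and external, there is only one table of this kind, and listing the recognized IRIs is enough.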

I would add that the entire document up to this point has only mentioned one kind of mapping on IRIs, viz. interpretation mappings. Your text follows this for the first sentence, then suddenly, for no visible reason and without explanation, introduces a new (and presumably different?) kind of IRI mapping. The reader is obliged to keep a distinction in mind at this point. Later, the careful reader will find that the only technical role of this new mapping is to be (part of) an interpretation mapping, so this distinction was not in fact necessary. Why, then, did we not simply refer to it using the interpretation mapping language all along? If X is a set of IRIs denoting, say, kinds of wine, we do not call an interpretation mapping restricted to X a "wine map". As well as being a needless tripping point for a new reader, inventing the special terminology falsely suggests that datatypes have some special, unique semantics; whereas in fact, they are simply one more kind of entity in the RDF universe, and get denoted by IRIs in the same way as everything else.

> A recognized IRI identifies the unique datatype it maps to according to the datatype map wherever it occurs

But what if there is more than one such datatype map involved? You cannot just ignore this possibility, because this version of the semantics has been set up, apparently deliberately, to allow it.

But also, you are now actually mis-using the term "identify". The whole point of introducing and using this term was so that the RDF semantic conditions can  refer to, and be connected to, naming conventions on the Web that are *not* given by the RDF semantic constructions themselves. Your choice of phrasing, above, completely destroys this central point. 

> , and the semantics requires that it refers to this identified datatype.

Which one? You have yet to show why there is only one, since there can be many (infinitely many?) datatype maps.

> RDF processors that do not recognize a given IRI cannot determine which datatype is identified by this IRI, and should treat any literals with that IRI as well as their datatype IRI as unknown names."

But what if they both recognize it, but recognize it differently?

> 
> The second change note in Section 7 can be changed to:
> 
> <Change note:> RDF 1.1 introduces the notion of recognized datatype IRIs and recognized datatypes which corresponds to the set of IRIs in a datatype map and to the datatypes associated with these IRIs, respectively. The use of this notion simplifies the exposition of D-entailment.
> 
> Next paragraph: "A literal with datatype d denotes the value obtained by applying this mapping to the character string sss: L2V(d)(sss)."
> 
> --> change to "If a datatype IRI uuu is recognized in a datatype map D, then a literal with datatype IRI uuu denotes the value obtained by applying this mapping to the character string sss: L2V(D(uuu))(sss)."

No, it is silly to take an interpretation mapping, re-name it as a "datatype map", then use it where one would expect to see an interpretation mapping, to define an interpretation mapping. The semantic equations will all be stated in terms of identification and interpretation, as they are now.

> 
> Next paragraph: "RDF processors are not required to recognize any datatype IRIs other than rdf:langString and xsd:string, but when IRIs listed in Section 5 of [RDF11-CONCEPTS] are recognized, they MUST be interpreted as described there, and when the IRI rdf:PlainLiteral is recognized, it MUST be interpreted to refer to the datatype defined in [RDF-PLAIN-LITERAL]. RDF processors MAY recognize other datatype IRIs, but when other datatype IRIs are recognized, the mapping between a recognized IRI and the datatype it refers to MUST be specified unambiguously, and MUST be fixed during all RDF transformations or manipulations."
> 
> --> change to "RDF processors are not required to recognize any datatype IRIs other than rdf:langString and xsd:string, but when IRIs listed in Section 5 of [RDF11-CONCEPTS] are recognized, they MUST be interpreted as described there, and when the IRI rdf:PlainLiteral is recognized, it MUST be interpreted to refer to the datatype defined in [RDF-PLAIN-LITERAL]. RDF processors MAY recognize other datatype IRIs, but when other datatype IRIs are recognized, the datatype map associating the recognized IRI to the datatype it refers to MUST be specified unambiguously, and MUST be fixed during all RDF transformations or manipulations."

I see no point in making this change.

> 
> 2 paragraphs later: "RDF processors which fail to recognize a datatype IRI will not be able to detect some entailments which are visible to one which does."
> 
> --> (optional) rather than "which fail to recognize" simply say "which do not recognize", because it does not need to be a failure, it could be on purpose

Good point. I will make that change.

> 
> Section 7.1: "Let D be a set of IRIs identifying datatypes."
> 
> --> change to "Let D be a datatype map."

No, this misses the (important) force of the use of the term "identify", that it means some Web-enforced external naming relationship between an IRI and an entity (to which, in this case, the RDF semantics is required to conform.)

> 
> Semantic conditions: "If rdf:langString is in D, then for every language-tagged string E with lexical form sss and language tag ttt, IL(E)= < sss, ttt' >, where ttt' is ttt converted to lower case using US-ASCII rules"
> 
> --> (optional) change to "If rdf:langString is recognized in D" (but I can live with the idea that an element is included in a map if it belongs to the domain)
> 
> "For every other IRI aaa in D, I(aaa) is the datatype identified by aaa, and for every literal "sss"^^aaa, IL("sss"^^aaa) = L2V(I(aaa))(sss)"
> 
> --> change to "For every other IRI aaa recognized in D, I(aaa) = D(aaa), and for every literal "sss"^^aaa, IL("sss"^^aaa) = L2V(I(aaa))(sss)"
> 
> The right hand side of the last equality could read "L2V(D(aaa))(sss)" of course.

But with no gain in clarity or usefulness, of course :-)
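
For readers who find the two semantic conditions easier to follow as code, here is a rough Python sketch of them in the current style, where D is just a set of recognized IRIs. It is an illustration under stated assumptions, not part of either document: the IDENTIFIED table stands in for the external, global identification of datatypes by IRIs, the function names are hypothetical, and each datatype is reduced to its lexical-to-value mapping. The sketch also assumes every IRI in D appears in IDENTIFIED.

```python
from decimal import Decimal

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Assumption: one shared table models identification that is fixed
# outside the semantics; each entry is the datatype's L2V mapping.
IDENTIFIED = {XSD_STRING: str, XSD_DECIMAL: Decimal}

def ascii_lower(tag: str) -> str:
    """Lower-case only the US-ASCII letters of a language tag."""
    return "".join(c.lower() if c.isascii() else c for c in tag)

def il_lang_string(D, sss, ttt):
    """IL(E) = <sss, ttt'> when rdf:langString is in D, else unknown (None)."""
    if RDF_LANGSTRING not in D:
        return None
    return (sss, ascii_lower(ttt))

def il_typed(D, sss, aaa):
    """IL("sss"^^aaa) = L2V(I(aaa))(sss) when aaa is in D, else unknown (None)."""
    if aaa not in D:
        return None  # unrecognized IRI: the literal is treated as an unknown name
    return IDENTIFIED[aaa](sss)

lang_value = il_lang_string({RDF_LANGSTRING}, "chat", "FR-ca")
decimal_value = il_typed({XSD_DECIMAL}, "2.0", XSD_DECIMAL)
unknown_value = il_typed(set(), "2.0", XSD_DECIMAL)
```

Note that D never needs to carry the datatypes themselves; it only selects which of the externally identified IRIs a processor recognizes.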

> 
> Section 7.2: ""
> 
> --> add a sentence: "When a datatype map D can be unambiguously characterized with a set of IRIs (for instance, when the recognized datatype IRIs are contained in XSD IRIs) then D is reduced to a set and this specification uses this as an abbreviation, such as RDF entailment recognizing {rdf:langString, xsd:string}"
> 
> After technical note: "In all of this language, 'D' is being used as a parameter to represent some set of datatype IRIs, and different D sets will yield different notions of satisfiability and entailment."
> 
> --> change to "In all of this language, 'D' is being used as a parameter to represent some mapping from datatype IRIs to datatypes, and different D maps will yield different notions of satisfiability and entailment."
> 
> The following sentence is compatible with datatype maps because a mapping is formally a set of pairs, so the empty set is a mapping, mappings can be subsets of others, etc.: "The more datatypes are recognized, the stronger is the entailment, so that if D ⊂ E and S E-entails G then S must D-entail G. Simple entailment is { }-entailment, i.e. D-entailment when D is the empty set, so if S D-entails G then S simply entails G."
> 
> --> keep as is.
> 
> Section 7.2.1: "For example, if D contains xsd:decimal then ..."
> 
> --> Again here, a datatype map does not contain datatype IRIs or datatypes strictly speaking (they only contain pairs) but I think this is acceptable abuse of notation

I agree, and propose to use this abuse of notation myself, but perhaps to a slightly different purpose. 

> , so I request no change. The following paragraphs in the section have similar harmless abuse of notation.
> 
> Section 8: "RDF-D interpretations MAY fail to recognize these datatypes."
> 
> --> (optional) replace "fail to" by "not"

The problem here is that "may not" is English idiom for "must not", which makes "MAY not" very confusing. 

> 
> Section 8.1: "When D is {rdf:langString, xsd:string} then we simply say S RDF entails E."
> 
> --> Here, given the text to be put in RDF concepts, it is ok to define the datatype map as a set when it only contains standard datatype IRIs. So no change is needed.
> 
> RDFS semantic conditions: "for every other IRI aaa in D, ICEXT(I(aaa)) is the value space of I(aaa)"
> 
> --> change to "for every other recognized IRI aaa in D, ICEXT(I(aaa)) is the value space of I(aaa)"

? What is the point of this? Do you mean to imply that D might contain IRIs that are not recognized?? I think the current wording is more correct. Similarly for the subsequent change suggestions. 

>  (Note that it could be equivalently ICEXT(D(aaa)), but due to the RDF semantic conditions, it's the same thing.)
> 
> "for every IRI aaa in D, I(aaa) is in ICEXT(I(rdfs:Datatype))"
> 
> --> change to "for every recognized IRI aaa in D, I(aaa) is in ICEXT(I(rdfs:Datatype))"
> 
> Section 9.2.1, RDFS entailment patterns: "rdfs1 	any IRI aaa in D"
> 
> --> "any recognized IRI aaa in D"
> 
> 
> 
> I haven't looked at the appendices for the moment.
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 12 December 2013 08:10:00 UTC