- From: Pat Hayes <phayes@ihmc.us>
- Date: Sat, 7 Dec 2013 04:01:19 -0600
- To: Michael Schneider <schneid@fzi.de>
- Cc: Guus Schreiber <guus.schreiber@vu.nl>, "public-rdf-comments@w3.org Comments" <public-rdf-comments@w3.org>
Michael, let me try again to explain the relationship between the 2004 and 2013 treatments of how datatypes are identified. (I will ignore the fact that in 2004 this was done on top of RDFS rather than RDF.) This is not an official response from the RDF WG, but it might help.

In 2004, the D parameter was a function from a set of IRIs to datatypes, and given the generality of the way the semantics were stated, this was an arbitrary function. Any such function defined a D-entailment regime. However, there was an additional condition on mappings applied to the datatype IRIs in common use, in particular the XSD datatype IRIs: that they be mapped to the datatypes named by them according to the XML Schema (part 2) specification. Thus, the notion of D-interpretation allowed mathematically for a case where, for example,

D('http://www.w3.org/2001/XMLSchema#int') = http://www.w3.org/2001/XMLSchema#anyURI

but such pathological cases are ruled out by text in the semantics document which requires these IRIs to refer to what the XSD specs say they should refer to.

In addition, the semantic equations given there require that, in a D-interpretation I,

(1) I(x)=D(x) for all IRIs x in the domain of D

that is, in a D-interpretation, the D mapping is identical to the interpretation mapping on the set of datatype IRIs, ie those comprising the domain of the D mapping. This is the only actual role for the D mapping in the semantic equations, to constrain the interpretation to conform to it.

Now, let us examine this carefully. How exactly are we referring to datatypes here? In fact, the only way we have available is either to use a phrase like "the datatype referred to as "int" in the XML Schema (part 2) specification document", or, less ambiguously and more compactly, to use the URI-reference naming convention specified in that very document. I used this convention myself when writing the equation displayed above, to refer to the datatype called "anyURI".
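As an illustration only, condition (1) can be sketched in Python. Everything here is an assumption made for the sketch: the `Datatype` class, the use of Python's `int` and `str` as stand-ins for lexical-to-value mappings, and the example IRI for a non-datatype resource. It models a 2004-style datatype map as a dictionary and checks whether an interpretation agrees with it on the domain of D.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """Toy stand-in for a datatype: a label plus a lexical-to-value mapping."""
    label: str
    l2v: Callable[[str], Any]


# Hypothetical stand-ins for two XSD datatypes (not full XSD semantics).
XSD_INT = Datatype("int", int)
XSD_ANYURI = Datatype("anyURI", str)

# A 2004-style datatype map D: a function from IRIs to datatypes.
D: Dict[str, Datatype] = {
    XSD + "int": XSD_INT,
    XSD + "anyURI": XSD_ANYURI,
}


def satisfies_condition_1(I: Dict[str, Any], D: Dict[str, Datatype]) -> bool:
    """Check condition (1): I(x) = D(x) for every IRI x in the domain of D."""
    return all(iri in I and I[iri] == dt for iri, dt in D.items())


# An interpretation that agrees with D on the datatype IRIs.
conforming = {
    XSD + "int": XSD_INT,
    XSD + "anyURI": XSD_ANYURI,
    "http://example.org/alice": "some resource",
}

# An interpretation that swaps the two datatypes; it does not conform to D.
swapped = {XSD + "int": XSD_ANYURI, XSD + "anyURI": XSD_INT}

print(satisfies_condition_1(conforming, D))
print(satisfies_condition_1(swapped, D))
```

The check only looks at the IRIs in the domain of D; what the interpretation does with any other IRI is left unconstrained, which mirrors the limited role the D mapping plays in the semantic equations.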
And this is inevitable: the only way we have available to refer to a datatype is to use Web conventions, defined by documents and specifications external to RDF, which define what datatype a certain IRI is the name of. There are no "mappings" available to an RDF processor, other than those defined by such external specifications or conventions. Such a processor is simply presented with an IRI used in a typed literal, and it - the processor - either 'knows' what datatype this IRI is being conventionally used to denote, or it does not. If it does, then this externally defined meaning of the IRI is what it should be interpreting the IRI to mean. In other words, it should implicitly *use the Web and the external-to-RDF world* to determine what datatype is being referred to by the type IRI. And if it cannot do this - if it does not recognize the IRI as one that is mapped to a datatype by any known specification - then there are essentially no useful inferences it can make about the literal.

(In this, of course, datatype IRIs are not unique. Actual RDF data is rife with IRIs which have meanings specified by means external to RDF itself, ie which are not described by the RDF specs. Some of these are defined by other specifications. Datatype IRIs are just one more case among many others.)

So, take these two observations together: the identity expressed by (1) above, and the fact that any D mapping is simply the externally defined way to interpret a set of IRIs being used to name datatypes. It seems clear that we can take 'name' in the sense of 'refer to in an interpretation' and in the sense of 'identify according to an external specification' and, well, identify them. That is exactly what (1) above asserts: in a D-interpretation, the IRIs mapped by D should be interpreted to refer in the way described by the external conventions or specifications which determine the D mapping, ie their intended interpretation as datatype IRIs.
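The "either it knows the IRI or it does not" behaviour of a processor might be sketched as follows. This is a toy, not a description of any real RDF library: the `RECOGNIZED` table, the function names, and the status strings are all invented for illustration, and the lexical-to-value functions are deliberate simplifications of the XSD definitions (Python's `int` accepts some strings that are not in the xsd:integer lexical space).

```python
from typing import Any, Callable, Dict, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_boolean(lexical: str) -> bool:
    """Simplified lexical-to-value mapping for xsd:boolean."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"not a boolean lexical form: {lexical!r}")


# Hypothetical table of the datatype IRIs this toy processor recognizes,
# each paired with a simplified lexical-to-value function.
RECOGNIZED: Dict[str, Callable[[str], Any]] = {
    XSD + "integer": int,
    XSD + "boolean": parse_boolean,
}


def interpret_literal(lexical: str, datatype_iri: str) -> Tuple[str, Any]:
    """Return a (status, value) pair for the typed literal "lexical"^^datatype_iri."""
    l2v = RECOGNIZED.get(datatype_iri)
    if l2v is None:
        # The IRI is not recognized: nothing useful can be inferred.
        return ("unrecognized", None)
    try:
        return ("value", l2v(lexical))
    except ValueError:
        # Recognized datatype, but the string is outside its lexical space.
        return ("ill-typed", None)


print(interpret_literal("42", XSD + "integer"))
print(interpret_literal("x", "http://example.org/unknownType"))
```

Note that the table is keyed on the IRI alone: the processor has no other handle on the datatype than the externally established meaning of that IRI.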
And in particular, the XSD IRIs should be interpreted as specified by the XSD specifications (which are quite clear and unambiguous.) Once that is clarified, there is no need to mention this D mapping again, since it is simply part of the interpretation mapping.

And that is how we get to the way this situation is described in the 2013 specification documents. D is now simply a set of IRIs which (we presume) are assigned interpretations as datatypes by some external specification or convention. (The document uses the terminology of "identifies": they are presumed to *identify* datatypes.) D-interpretations are required to so interpret them; and then that is all that needs to be said. If you insist upon referring to this part of the interpretation mapping, then you are free to do so: as the 'change note' explains, the 2004 datatype map is simply the restriction of a D-interpretation map to the set D.

You object: but what defines this mapping? It is not specified by the parameter D. And indeed, it is not specified by that alone: it is specified by that plus external Web specifications and conventions which determine what datatypes those IRIs identify. But this was always the case, in fact, even though the 2004 mode of presentation did not say it explicitly. Not all 2004 datatype maps were sensible, or even considered by RDF reasoners. Has anyone, to your knowledge, implemented any RDF engine which could have worked with a datatype map which arbitrarily permuted the IRI <-> datatype correspondence defined by any external specification? (To do this with the XSD IRIs was of course explicitly ruled out by the 2004 specs, as it is by the 2013 specs, but how about some other datatypes?)
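The remark that the 2004 datatype map is the restriction of a D-interpretation to the set D can be made concrete with a few lines of Python. This is a sketch under simplifying assumptions: interpretations are modelled as plain dictionaries, datatypes as placeholder strings, and the function name `datatype_map_2004` is made up for the example.

```python
from typing import Any, Dict, Set

XSD = "http://www.w3.org/2001/XMLSchema#"


def datatype_map_2004(I: Dict[str, Any], D: Set[str]) -> Dict[str, Any]:
    """Restrict the interpretation mapping I to the recognized IRIs in D."""
    return {iri: I[iri] for iri in D}


# A toy D-interpretation; the string values are placeholders for datatypes
# and other resources, not real semantic objects.
I = {
    XSD + "integer": "<the datatype xsd:integer>",
    XSD + "string": "<the datatype xsd:string>",
    "http://example.org/alice": "<some resource>",
}

# 2013 style: D is just a set of recognized datatype IRIs.
D = {XSD + "integer", XSD + "string"}

recovered = datatype_map_2004(I, D)
print(sorted(recovered))
```

Nothing beyond I and D is needed to recover the set of pairs, which is the sense in which the separate map carries no extra information once the interpretation is fixed.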
As you say, when describing the 2004 model:

"In our case, these datatypes d would be
> somehow represented as references to the corresponding
> sections in the XSD Datatypes spec, telling the
> characteristic aspects of these datatypes, including
> their lexical spaces, value spaces, and the mapping
> from literals to values."

Exactly: you presume that the mapping is defined - fixed - by a specification external to RDF.

One way to describe the point being made here is to observe that the way that datatype-naming IRIs are interpreted is not, in actual fact or practice, arbitrary, but is determined by some conventions external to RDF, and which therefore appear as fixed in the actual RDF semantics. Datatype IRIs, if they can be recognized as referring to datatypes at all, must be treated as rigid identifiers in the RDF semantics. The apparent flexibility that the "datatype map" way of talking seems to afford is an illusion. RDF processors *must* rely on Web IRI naming conventions to identify datatypes, because there is, quite literally, no other option available. Either these conventions work - the case we refer to by "recognize" - or they do not; and if they fail, then there is no other recourse. The only datatype maps that anyone can possibly have access to, are those that are defined by specifications of IRI meanings.

In every way specified by the actual specification documents, the 2004 and 2013 ways of describing the relationship between IRIs and datatypes are completely equivalent. Both specify that a D-interpretation give a fixed interpretation to the IRIs mentioned in D. Both require that the XSD datatype IRIs be interpreted according to the XML Schema specifications, and the required RDF datatypes be given their specified meanings.
For any other datatypes not mentioned in the specification explicitly, the 2004 description requires a mapping from IRIs to those datatypes; the 2013 description requires that this mapping be fixed by some external conventions or specifications which determine how the IRIs identify datatypes. Apart from the 2013 emphasis on the need for this externally supplied identification mapping from IRIs to datatypes, the two are exactly equivalent.

--------

Your "destructive reading" requires you to ignore a key sentence in the text:

"We assume that a recognized IRI identifies a unique datatype wherever it occurs, and the semantics requires that it refers to this identified datatype."

Note that the word 'identifies' hyperlinks to this:

"IRI meanings may also be determined by other constraints external to the RDF semantics; when we wish to refer to such an externally defined naming relationship, we will use the word identify and its cognates. For example, the fact that the IRI http://www.w3.org/2001/XMLSchema#decimal is widely used as the name of a datatype described in the XML Schema document [XMLSCHEMA11-2] might be described by saying that the IRI identifies that datatype."

You describe the first sentence as 'vague and confusing'. (Did you follow the link from "identifies"?) I am not sure why you find it to be vague. IRIs identify things: surely this is uncontroversial, and fairly clear? IRIs used inside typed literals are intended to identify datatypes. Again, surely this is fairly straightforward? You yourself describe this when referring to xsd:integer early in your message. Calling an IRI 'recognized' is intended to mean that it is understood as identifying a datatype.

You ask: "With regard to what is uniqueness meant here?" I am not sure I understand this question. Uniqueness means there is only one of something. The assumption is that the recognized IRI identifies one datatype. Perhaps it would be clearer if we had written "unambiguously identifies a datatype"?
To my ear, however, "unambiguously identifies" sounds redundant, since it is impossible to ambiguously identify.

You ask: "What is meant by "the semantics requires" something?" Perhaps "the semantics assumes" would have been a better choice. Would that be clearer? We could make that edit if it would help.

You say:

"It is clear that from a simple set of IRIs alone,
> there is no way to know what the IRI denotes, and
> thus what the expected semantics of an RDF graph
> with literals is meant to be under the D-X semantics.
> Consequently, the documentation of D-X would have
> to come up with some custom means of saying which
> the IRIs in D denote.

No, it assumes (or presumes, or requires) that this is fixed by some external specification. If it is not, then the IRIs in question cannot be recognized. The RDF specs do not, of course, themselves actually specify this external specification. That is what we mean by "external".

> But then, there /would/ be
> the pairs of IRIs and datatypes again, essentially
> at least, just in a way unsupported by the spec."

Yes, indeed, as I believe I had already noted in an email response to you, the datatype map is in effect still present: it is simply a part of the interpretation mapping. But there is no need to call it out as a separate named structure, since its only role was to be what the interpretation mappings were restricted to.

You write:

"In the RDF 2004 spec, both the datatype IRIs and their
> associated datatypes would be fixed for D-X. So for
> any D-X interpretation I, the denotation of u, I(u),
> equals d. In contrast, in RDF 1.1, D would contain
> the IRI u instead of the pair (u,d), and, as D is a
> set of recognizing IRIs, I would know that for any D-X
> interpretation I, there exists /some/ datatatype d
> with I(d). However, I would /not/ know what the
> datatype d is, except perhaps from additional information
> given in the handbook for D-X, but by means that are outside the RDF specification."
The point is that your rather dismissive "except perhaps" case is in fact the only one that is possible. There is no other way to determine a datatype map. And when this is known, the *interpretation* of the datatype IRI is fixed by that specification; so we can speak of I(u) directly, and have no further need to refer to a special mapping from IRIs to datatypes. It is simply (a part of) an interpretation mapping.

You also say:

" SPARQL 1.1, OWL 2, and RIF,
> > are reusing the original definition of RDF
> datatype maps, and thus interoperability with these
> standards will thus be directly affected."

and

"such a change
> > will break formal compatibility with other existing
> Semantic Web standards"

I disagree. As the current document now gives a definition of the 2004 notion of datatype map (in the second Change Note in section 7), interoperability and formal compatibility will not be affected at all. In any case, no entailments are changed by the change in the way the semantics are described.

Regarding the issue of simplification, let me explicitly contrast the two approaches.

2004

A new concept is defined, of a datatype map. No motivation is given for why this construct is necessary nor for why IRIs used to identify datatypes are treated differently from other IRIs. The semantic equations read as follows:

if <aaa,x> is in D then I(aaa) = x

if <aaa,x> is in D then ICEXT(x) is the value space of x and is a subset of LV

if <aaa,x> is in D then for any typed literal "sss"^^ddd in V with I(ddd) = x , if sss is in the lexical space of x then IL("sss"^^ddd) = L2V(x)(sss), otherwise IL("sss"^^ddd) is not in LV

if <aaa,x> is in D then I(aaa) is in ICEXT(I(rdfs:Datatype))

2013

We observe (a commonplace) that IRIs may identify datatypes. D is a set of IRIs assumed to identify datatypes.
The semantic conditions read as follows, omitting the special case for langtagged literals:

For every IRI aaa in D, I(aaa) is the datatype identified by aaa, and for every literal "sss"^^aaa, IL("sss"^^aaa) = L2V(I(aaa))(sss)

One straightforward equation; no new concepts are defined or necessary; no special ways of connecting to datatypes; denotation of type IRIs is exactly like that of all other IRIs. If aaa does not identify a datatype, then this condition is vacuous; which is correct.

The simplification seems to me (and several other members of the WG) to be self-evident. Not only is it shorter and clearer, it is also more closely aligned to the realities of how datatypes are specified and referred to on the Web.

Why did we make any changes to the document at all? Because we wanted to make it leaner, shorter and simpler to read. In fact, almost the entire text has been re-written (with the exception of some of the material now in appendices).

Pat

On Dec 6, 2013, at 6:07 PM, Michael Schneider <schneid@fzi.de> wrote:

> Dear Pat, Dear Working Group, > > we had settled on treating ISSUE-165 during the CfI phase, > and I wanted to first create my implementation report > and find an opportunity to get more into the details of > the draft of the semantics before giving an answer > to the WG answer. Here is is my answer now. > > Before I come to replying to the particular WG answers, > I want to bring up another issue that I have found > only during the CfI phase. In my original LCWD comment, > I had only swiftly checked the precise changes concerning > datatypes; my main argument was more against the change > of the nomenclature and formal representation from a > datatype map to a set of recognizing IRIs. Now, after a > more in-depth check, I have to say that I have now also > technical problems with this change. > > Let's assume we have a semantic extension of D-RDFS, > called "D-X", with several datatype IRIs in D: > > D := { xsd:string, xsd:integer, ... 
} > > In the RDF 2004 spec, the analog entailment regime > would have been defined w.r.t. a datatype map D, which > would be a set of /pairs/ (u,d), where u a IRI and d > a datatype. In our case, these datatypes d would be > somehow represented as references to the corresponding > sections in the XSD Datatypes spec, telling the > characteristic aspects of these datatypes, including > their lexical spaces, value spaces, and the mapping > from literals to values. > > In the RDF 2004 spec, both the datatype IRIs and their > associated datatypes would be fixed for D-X. So for > any D-X interpretation I, the denotation of u, I(u), > equals d. In contrast, in RDF 1.1, D would contain > the IRI u instead of the pair (u,d), and, as D is a > set of recognizing IRIs, I would know that for any D-X > interpretation I, there exists /some/ datatatype d > with I(d). However, I would /not/ know what the > datatype d is, except perhaps from additional information > given in the handbook for D-X, but by means that are > outside the RDF specification. > > Concerning entailments, the way I have originally read > the new draft, was that for a given semantic extension D-X, > it is possible for a datatype IRI d in D to have different > denotations (i.e. datatypes) under different > D-X-interpretations I1 and I2, and, in fact, the actual > datatype would be completely unspecified in this reading. > This would then cancel out most datatype-related entailments > compared to RDF 2004, in which for any pair (u,d) in a > datatype map D of D-X, the denotation of u under any > D-X-interpretation I would always be defined to be the same > datatype, namely I(u) = d. > > I am sure that such a reading is not what the WG intends, > but the only sentence I could find about what > might have been intended is in Chapter 7: > > """ > We assume that a recognized IRI identifies > a unique datatype wherever it occurs, and the semantics > requires that it refers to this identified datatype. 
> """ > > Now, this is an extremely vage and confusing sentence, > and I have still no idea if I understand it. With regard > to what is uniqueness meant here? What is meant by > "the semantics requires" something? The sentence should > probably simply be dropped. But then, nothing else is > being said about the datatypes associated to the > "recognizing" IRIs, and this would then, of course, > bring back my destructive reading above. > > So, in my original reading, by replacing datatype maps > with sets of recognized IRIs, half of the required > information has been lost, or at least, the explicit > support by the specification has been removed. > It is clear that from a simple set of IRIs alone, > there is no way to know what the IRI denotes, and > thus what the expected semantics of an RDF graph > with literals is meant to be under the D-X semantics. > Consequently, the documentation of D-X would have > to come up with some custom means of saying which > the IRIs in D denote. But then, there /would/ be > the pairs of IRIs and datatypes again, essentially > at least, just in a way unsupported by the spec. > I don't believe that it was really the intention > of the WG to support such a source of confusion. > > > So far for the new point. Now to the particular > WG answers (quoted by >), where I will come back to > this and my original argument again. > > > > Regarding ISSUE-165, this matter was debated > > extensively within the WG, and most of your > > points were made during this discussion. > > (see http://lists.w3.org/Archives/Public/public-rdf-wg/2013Jun/0085.html > > and subsequent threads.) > > First to say, I do not see in the cited mail > exchange any discussion about my original argument > that at least three other core Semantic Web > standards, namely SPARQL 1.1, OWL 2, and RIF, > are reusing the original definition of RDF > datatype maps, and thus interoperability with these > standards will thus be directly affected. 
If you > make the change in the RDF spec, then the current > versions of these other specs will be bound to the > old version of the RDF standard and will be formally > incompatible with the current one. > > Even if the revised definition of datatype maps > is intended to "mean basically the same thing", > the other specifications will still be incompatible > with the new definition in a strictly technical sense: > They use a different formal representation and a > different nomenclature for the associations > of IRIs and their denoted datatypes, and so > one will always have to explain the translation > between the two formalism. And when the time > comes for new revisions of these other specs, > it has to be decided by these other WGs to either > follow the new approach, or to stick with the old > one. From a pov of the whole Semantic Web, the > first option is of course what should be done, > so, in essence, by applying this change in the > RDF spec, the RDF WG essentially forces the other > specifications into the same change as well. > Hence, the RDF WG is in high responsibility > here and should do a change only when there is > clear motivation for it, and when it can be > foreseen that the change will be easily > accepted by future WGs of the other specs. > Neither do I see any clear motivation for > the change, nor would I expect that such > future WGs will easily accept this change. > > However, I can see that my new technical point > given above had, in its essence, already been > brought up by Antoine Zimmermann in the first > point of his review cited above. As far as I > was able to follow the heated discussion there, > it goes pretty much in circles, and is more of > a series of attempts to convince the other party > of their preferences, including BIG LETTERS, > after which Antoine eventually gave in. > So this is not so much what I would normally > think of being an "extensive WG discussion". 
> > Anyways, what I can see as the essence of this > discussion is that you consider the change to be > semantically compatible with the old version, > and that it is meant to only b a small change. > Even if I accept this (which would require me > to have a different reading of the draft than > the one I give above), it is still the case that > you change the formal representation underlying > datatype semantics from a set of pairs of IRIs > and datatypes into a set of IRIs and some > additional text indicating the understanding of > the association between these IRIs and their > denoted datatypes. > > I do not consider this to be a small thing at all! > To me, this is comparable to changing the syntax of, > say, the assignment construct of a programming > language, from the widely used "reference=value" > style into something where you just declare the > reference, and require that these references get > their value somehow, by a means which is outside > the language spec. You may argue that you can > still write exactly the same kinds of programs > with the revised language, which may really be > the case, but to the price that any existing > software written in this language will not > compile anymore under the revised version, > any existing compiler needs to be rewritten, > same for any textbook on that language, > and all professional programmers have to > learn the new construct, wasting some of their > precious productive time. And after all, > the change would be widely considered > completely unesseary, because the old construct > worked perfectly well and was in wide use, > while the new one may even lead to confusion. > > Back to the change in RDF, if you really think > that the semantic consequences are the same and > that it is a minor change, then why the change > at all? In particular, given that such a change > will break formal compatibility with other existing > Semantic Web standards for no added value? 
> > > > The primary reason for the change was to simplify > > the presentation of the RDF semantics, which was > > an overarching goal of the WG. > > The primary goal of any W3C WG should be to comply > with the WG charta, which, in the case of the RDF WG, > explicitly requires that "changing the fundamentals > of the RDF Semantics" are out of scope for the WG > (Chapter 3). The scope of the RDF WG, according to > the charta, was "to extend RDF to include some of > the features that the community has identified as > both desirable and important for interoperability > based on experience with the 2004 version of the > standard, but without having a negative effect on > existing deployment efforts." Now see what you are > about to do here: You want to change a basic formal > aspect of the original RDF standard, which will > break interoperability with several other core > Semantic Web standards! > > But let's talk about your argument of simplification. > I do not agree that this change counts as a > considerable simplification at all, rather the > opposite. I originally expected that the semantic > conditions of datatype semantics, which really have > always been particularly easy to understand, would > have changed as well. But, as I found, they are > still essentially the same (modulo adjustments > to the new notion of recognizing IRIs). So what > you really only change here is to make the > original datatype map, which was a set of pairs > consisting of an IRI and a datatype, into a > set of IRIs with some additional text telling > that the IRIs have to denote their corresponding > datatypes somehow. So you have changed something > that is represented in a very standard way and > perfectly clear to understand into something that > is certainly not clearer, and to me, as I stated > above, even confusing. > > In any case, such a kind of change certainly does > not justify a deviation from what has been used > by several other Semantic Web standards. 
> > > > The actual mathematics has not altered, as the > > 2004 semantics required D-interpretation mappings > > to conform to the datatype map, so the datatype map > > is simply a part of (a restriction of) > > the interpretation mapping itself. > > Even if I would agree that the current draft can be > read this way, it is still the case that the formal > representation has changed, which breaks interoperability > with existing Semantic Web standards. > And again, if there is really hardly a change, > why do we need the change at all? > > > > Once this is recognized, it is clearly simpler to > > treat it in this way rather than as a separate mapping. > > It should be clear by now that I disagree with this view. > The original way was perfectly clear to me, > while the new one is at least confusing to me. > But, apart from personal preferences, even if it > really is a simplification, then the simplification > would be much too small to justify breaking > interoperability with existing standards. > > > > In addition, it had been noted by several commentors > > that the 2004 definitions allowed for 'pathological' > > D mappings, such as one which permutes the meanings > > of the XSD datatype IRIs. It was felt that > > disallowing such maps was a laudable by-product > > of the change. > > Now, this argument surprises me, and there are two answers > to this. > > Firstly, the problem cannot be that big, given the fact > that in the ca 10 years since the original RDF standard > at least three other core SW standards have been written > which reuse the original notion of datatype maps without > problems, each taking years of specing work and building > up considerable experience with these things. This provides > strong evidence to me that things are sufficiently fine > with datatype maps. 
> > As far as I am concerned myself, I have been responsible > for editing one of these specifications (the OWL 2 > RDF-Based Semantics), which makes heavy use of the > original definitions for datatype and datatype maps. > I have provided technical advise to the editors > of SPARQL Entailment Regimes and RIF RDF&OWL > Compatibility among other things with regard to datatype > related semantics. I have created several large test > suites, which are partially about datatype semantics. I > have created many formal proofs based on the datatype > semantics of RDF. I have spend some time thinking about > implementation of datatype semantics in the past, although > not yet implemented into my RDF Semantics reasoner. > And overall I have been working in the RDF field fulltime > continuously for the last 8 years up to the day. > But in all these years with all this gained experience > concerning RDF Semantics in general and RDF datatype > semantics in particular, I have never encountered any > serious problems with the original notion of datatype maps. > Rather, I have always found the original datatype > semantics well designed and it allowed me to do my work > decently. I would never have come to the conclusion that > anything would require a change, in particular not a > change of the kind proposed in RDF 1.1. For me, the old > saying holds that "If it ain't broke, don't fix it!" > > Secondly, whatever these unknown commenters were about, > let me say that no change of the semantics whatsoever > will save us from people doing strange or silly things > with datatypes, if they only want to. I can easily, for > example by applying owl:sameAs to two > value-space-incompatible datatype IRIs, do all kinds > of crazy things in the 2004 spec as well as in the > new draft. So the "pathological" argument is most > probably moot. > > > > We also note that this change does not alter any > > entailments. > > Again, this depends on the reading of the current > draft. 
In my reading, most datatype-related entailments > would be removed. In the reading according to the > discussion cited above, nothing would change > semantically. Either way, no change should be made then. > > > To summarize, even if I give in to the reading of > the current draft as stated in the cited discussion > thread, there is still the problem that a fundamental > aspect of the old RDF model-theoretic semantics > has changed concerning its nomenclature and formal > representation, which is used in the original form > by at least three other core standards of the > Semantic Web. Further, even if I agree with the > reading of the WG, I do not agree that there was > any need for such a change, as the old spec was > perfectly clear and this is clearly confirmed by > its use in several other standards that have been > produced over the years, and by my own long-year > experience in the matter. I further do not agree > that the given change is a simplyfication > but, rather, I consider it to be pretty confusing. > In any case, I see no justification for this change > to break interoperability with three other > Semantic Web standards, which is, of course, > to me the most important reason to reject this > change. > > But if the WG still thinks that the change > is appropriate, there is, by no means, any urge > to apply it now, but it can still be postponed > to a later WG, which would also allow to have > more discussion, in particular with regard to > the other standards that use the original datatype > semantics. > > I therefore kindly ask the WG to revert the > change and bring back the old notion of a > datatype map consisting of pairs of IRIs and > datatypes, with the necessary adjustments > to the corresponding semantic conditions. > > > Best Regards, > Michael Schneider > ------------------------------------------------------------ IHMC (850)434 8903 home 40 South Alcaniz St. 
(850)202 4416 office Pensacola (850)202 4440 fax FL 32502 (850)291 0667 mobile (preferred) phayes@ihmc.us http://www.ihmc.us/users/phayes
Received on Saturday, 7 December 2013 10:01:54 UTC