- From: Michael Schneider <schneid@fzi.de>
- Date: Sun, 9 Feb 2014 23:49:30 +0100
- To: "public-rdf-comments@w3.org Comments" <public-rdf-comments@w3.org>, "W3C Chairs of RDF WG" <team-rdf-chairs@w3.org>, Tim Berners-Lee <timbl@w3.org>
To the director of the W3C, to the chairs and W3C team members of the RDF Working Group, to the members of the RDF Working Group, and to anyone else to whom it may concern.

This is a formal objection to a change made to the semantics of datatypes in the Proposed Recommendation of the RDF 1.1 Semantics. The change concerns the replacement of the original concept of a "datatype map" by the concept of a "set of recognized datatype IRIs". I will argue that this change is largely unmotivated and unnecessary, technically incompatible with the original concept, questionable and even flawed, and may lead to diverse problems for dependent Semantic Web standards and other dependent work. My proposal will be to revert the change to the original definition as of 2004 and to postpone further discussion of the change to a future RDF Working Group. This formal objection follows my reviews of earlier versions of the RDF 1.1 Semantics and my discussions with the RDF Working Group about the same topic, which did not lead to a satisfactory conclusion for me.

Michael Schneider, Frankfurt am Main (Germany), 9 February 2014

== Introduction ==

This is a formal objection to a change made by the RDF Working Group to the semantics of datatypes in the RDF 1.1 Semantics compared to the original RDF Semantics specification as of 2004 (from now on called "RDF 2004") [01]. The formal objection targets the Proposed Recommendation (PR) of the RDF 1.1 Semantics [02], which still underwent some changes compared to the previous versions of the document, and which is now intended by the Working Group to become the final recommendation. The formal objection follows my reviews of earlier versions of the RDF 1.1 Semantics and my discussions with the RDF Working Group about the same topic [03][04], which did not lead to a satisfactory conclusion for me [05].

I have to point out that this formal objection is not made by an official W3C member organisation, and none of the organisations I am affiliated with or in some current relationship with is involved. Rather, the formal objection is made by me as a private person, and as a member of the informal Semantic Web community, who has contributed considerably to the Semantic Web initiative in the past and has a strong background and a stake particularly in the RDF Semantics; see the section "About the Author" for information about me.

The change to which I formally object concerns the replacement of the original concept of a "datatype map" in Chap. 5 of [01] by the concept of a "set of recognized datatype IRIs" in Chap. 7 of [02]. In the original RDF 2004 Semantics, a datatype map was a set of associations between datatype IRIs (originally URI references) and datatypes. In the RDF 1.1 Semantics PR, there is now a "set of recognized datatype IRIs", that is, only the datatype IRIs, together with the additional requirement of the existence of a globally unique mapping between datatype IRIs and datatypes (where this unique mapping is not intended to be fully defined by the RDF 1.1 spec). I will describe the change in more detail in Section "Description of the Change".

I will first argue, in Section "A Non-Editorial Change", that the change is not simply an editorial change, and will give arguments, in Section "Missing Motivation and Necessity for the Change", why I consider the change unmotivated and unnecessary. In Section "Technical Consequences of the Change", I will list what I consider the most relevant technical consequences of the change, and will also give examples of possible practical consequences.
I will then, in Section "Consequences for dependent Semantic Web Standards and other Work", argue that the change may have unfortunate consequences for other existing Semantic Web standards that are based on RDF, such as OWL 2, SPARQL 1.1, and RIF, and may possibly lead to a split situation, where some future versions of these standards will adopt the change made in RDF while others may not. Finally, in Section "Conclusions and Proposal", I will summarize my arguments and argue that the consequences to be expected from the change are strong and undesirable, and would not exist if the original notion of datatype maps had been retained. Consequently, I will propose to revert the change to the original situation as of RDF 2004, and to postpone further discussion of the change to a future RDF Working Group.

== Description of the Change ==

RDF 2004 introduced the concept of a "datatype map", "being a set of pairs of an IRI and a datatype such that no IRI appears twice in D" (Chap. 5 of [01]; note: in order to ease the discussion, I use the term "IRI" everywhere, although the RDF 2004 spec used the term "URI reference" instead).

In the current PR of the RDF 1.1 Semantics, D is not a set of IRI-datatype pairs anymore, but a set of datatype IRIs only (Chap. 7 of [02]). It is also not called a "datatype map" anymore, but is now called a "set of recognized datatype IRIs". The RDF 1.1 Semantics further states that (a) "the semantics presumes that a recognized IRI identifies a unique datatype wherever it occurs", and (b) that "the exact mechanism by which an IRI identifies a datatype is considered to be external to the semantics" (beginning of Chap. 7). The second Change Note in Chap. 7 informally elaborates on this statement by saying that "the current semantics presumes that a recognized IRI identifies a unique datatype, this IRI-to-datatype mapping is globally unique and externally specified".
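
The two formalizations quoted above can be made concrete with a minimal Python sketch. It is not taken from either specification: the names `Datatype`, `datatype_map_2004`, `GLOBAL_IRI_TO_DATATYPE`, `recognized_iris_11`, `datatype_2004`, and `datatype_11` are hypothetical, and the lexical-to-value mappings are simplified stand-ins for the real XSD datatypes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """Simplified stand-in for a datatype: a label plus a lexical-to-value mapping."""
    label: str
    lexical_to_value: Callable[[str], object]


xsd_integer = Datatype("integer", int)
xsd_string = Datatype("string", str)

# RDF 2004 style: D itself is a set of (IRI, datatype) pairs in which no IRI
# appears twice. A dict enforces that uniqueness within one particular D.
datatype_map_2004: Dict[str, Datatype] = {
    XSD + "integer": xsd_integer,
    XSD + "string": xsd_string,
}

# RDF 1.1 PR style: D is only a set of IRIs. The datatype behind each IRI comes
# from a single mapping that the semantics treats as externally specified.
# GLOBAL_IRI_TO_DATATYPE is a hypothetical stand-in for that external mapping.
GLOBAL_IRI_TO_DATATYPE: Dict[str, Datatype] = {
    XSD + "integer": xsd_integer,
    XSD + "string": xsd_string,
}
recognized_iris_11: FrozenSet[str] = frozenset({XSD + "integer"})


def datatype_2004(d: Dict[str, Datatype], iri: str) -> Optional[Datatype]:
    """Look up the datatype that this particular datatype map assigns to iri."""
    return d.get(iri)


def datatype_11(d: FrozenSet[str], iri: str) -> Optional[Datatype]:
    """Look up the datatype for iri if it is recognized; D carries no datatypes."""
    return GLOBAL_IRI_TO_DATATYPE.get(iri) if iri in d else None
```

In this sketch, `datatype_2004` depends only on the argument `d`, whereas `datatype_11` additionally depends on state that lives outside `d`. That dependence on an external mapping is the structural difference the objection is concerned with.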

In contrast, RDF 2004 did not require a globally unique association between datatype IRIs and datatypes. Rather, the definition of datatype maps made it possible to have IRI-datatype associations being unique only locally with regard to a particular datatype map D, or, likewise, locally unique to an entailment regime that uses a particular datatype map D.

To illustrate the difference, consider the case of a custom definition of D-RDFS with D including a new custom datatype. In RDF 2004, it was possible to associate the same IRI with one datatype in one datatype map D1 and with a different datatype in another datatype map D2. For example, the IRI "ex:complex" may have been associated with a datatype representing the mathematical field of complex numbers in one extension of RDFS, and with a datatype representing four-dimensional composites of real numbers for the representation of space-time events in another extension of RDFS. Under the RDF 1.1 Semantics, which requires the existence of a globally unique IRI-datatype association, this will not be possible anymore (regardless of what the globally unique IRI-datatype association will look like, which is, as cited above, not fully determined by the RDF 1.1 standard).

In addition, some of the semantic conditions related to the semantics of datatypes have been adjusted in order to reflect the change mentioned above on a technical level. In general, the semantic conditions now refer to applications of a given interpretation I to a datatype IRI aaa, "I(aaa)", instead of referring to the associated datatype by its reference given in the datatype map, as was done in RDF 2004. For example, compare the second of the Semantic conditions for datatype literals in Chap. 7 of [02] with the third of the General semantic conditions for datatypes in Chap. 5 of [01].

To summarize, the whole change here includes:

* a change in nomenclature ("datatype map" vs.
"set of recognized datatype IRIs");
* a change in the formal representation of the objects under consideration: a set of IRI-datatype pairs vs. a set of IRIs only plus an additional globally unique IRI-datatype association, together with adjustments to the semantic conditions for datatype semantics;
* a change to the scope of uniqueness of IRI-datatype associations: this scope was local to every particular datatype map in RDF 2004, while it is global, and by intention mostly undetermined, in RDF 1.1.

== A Non-Editorial Change ==

It has been argued by the Working Group that the change is of a purely editorial nature. I would certainly not formally object to an editorial change, but I consider this a non-editorial change. An editorial change would not change basic nomenclature or formal or technical aspects of a specification, yet all of these have changed here.

Firstly, as stated above, the change introduced a change in nomenclature from the notion of a "datatype map" to the notion of a "set of recognized IRIs". Secondly, there have been some changes to the underlying formal representation, as listed above. Thirdly, and most notably in my opinion, the scope of uniqueness of IRI-datatype associations has changed. This change does have measurable effects, as I have already pointed out by my example above where the same IRI "ex:complex" is used for different datatypes: this is clearly possible in RDF 2004, but will not be possible anymore in RDF 1.1.

Another way of looking at the question whether a change to a specification is editorial or not is to check whether existing dependent work, such as other specifications, scientific papers, or text books, would need to be updated in non-trivial ways in order to be in line again with the changed specification. For the change here, it becomes clear that dependent work needs to be updated concerning the same things that have changed in the RDF specification.
For example, a text book of a more formal nature would probably need to change its nomenclature from "datatype maps" to "sets of recognized IRIs" and its basic definitions from sets of IRI-datatype pairs to sets of IRIs, and would need to reflect the change in the scope of uniqueness of the IRI-datatype associations.

Based on these arguments, I conclude that the change is clearly non-editorial.

== Missing Motivation and Necessity for the Change ==

A non-editorial change in a specification requires good motivation, and this is particularly true in the case of RDF and its semantics, for which the charter of the RDF 1.1 Working Group [06] explicitly states that "changing the fundamentals of the RDF Semantics" is out of scope for the WG (Chapter 3). Based on my arguments given below in the text concerning technical consequences of the change, I consider the change to be indeed a change of the fundamentals of the RDF semantics, and thus in conflict with the charter.

In general, the scope of the RDF WG was held deliberately conservative. According to its charter, the scope was "to extend RDF to include some of the features that the community has identified as both desirable and important for interoperability based on experience with the 2004 version of the standard, but without having a negative effect on existing deployment efforts." However, I am not aware of any input from outside the working group during the past 10 years since RDF 2004 became a recommendation that would have asked for a change of the semantics concerning the concept of datatype maps, or would have indicated any problems with this concept. Rather, within the previous years, at least three other core Semantic Web standards have been written (OWL 2, SPARQL 1.1, and RIF), which reuse the original notion of datatype maps without any known problems, each taking years of specification work and building up considerable experience with these things.
I am also not aware of any discussion concerning problems with datatype maps from either the workshop or the questionnaire that had preceded the initiation of the Working Group.

As far as I am concerned myself, I have been responsible for editing one of the mentioned dependent standards (the OWL 2 RDF-Based Semantics), which makes heavy use of the original definitions for datatypes and datatype maps. I have also provided some technical support (both in private and public conversation) to the editors of SPARQL 1.1 Entailment Regimes and RIF RDF&OWL Compatibility with regard to the RDF semantics in general and to datatype related semantics in particular. I have further created several large test suites, which are to a large extent about datatype semantics. I have created many formal proofs based on the datatype semantics of RDF. I have spent some time thinking about the implementation of datatype semantics, although it is not yet implemented in my RDF Semantics reasoner, called Swertia. And overall I have been working in the RDF field full-time continuously for the last 8 years up to the day. But in all these years, with all this gained experience concerning the RDF Semantics in general and RDF datatype semantics in particular, I have never encountered any serious problems with the original notion of datatype maps. Rather, I have always found the original datatype semantics well designed, and it allowed me to do my work decently. I would never have come to the conclusion that anything would require a change, in particular not a change of the kind proposed in RDF 1.1.

In fact, from my earlier discussion with the Working Group it became apparent to me that the change was not based on input from the outside, as was requested by the charter, but only from within the Working Group. In the context of the charter, this would only have been acceptable if there had been a strong reason, such as a so far unnoticed bug.
The actual rationale of the Working Group was then to simplify the current presentation of the RDF semantics [07]. Having given my arguments above about the complete lack of request for a change and the large amount of work that has been carried out without problems based on the original definitions, it should be clear that I do not see any reason here for any form of simplification with regard to the original situation. But of even more relevance is that the changes have not really "simplified" the situation, but have rather changed the situation and introduced significant technical problems, as I will point out in the following section.

== Technical Consequences of the Change ==

Probably the most notable technical aspect of the change is that it is now assumed by the RDF 1.1 Semantics that there exists a globally unique IRI-datatype association, which is to be applied for each set D of recognized IRIs (as an integral part of an interpretation I). In comparison, no such unique IRI-datatype association was assumed in RDF 2004; rather, the concept of datatype maps allowed different datatype maps to share the same IRI while associating it with different datatypes. Further, the RDF 1.1 Semantics PR does not define this globally unique IRI-datatype association, but considers its definition to be external to the semantics, except for a small number of datatype IRIs from the XSD namespace. This difference has a number of considerable technical consequences.

The first technical problem is that the change strongly reduces the number of possible constellations of IRI-datatype associations: In RDF 2004, for any set of IRIs i1,...,in there were, in principle, infinitely many possible datatype maps D = { (i1,d1), ..., (in,dn) }. In RDF 1.1, however, the associated datatypes d1,...,dn are uniquely determined to be those from the globally unique IRI-datatype association, which means that there is only a single such IRI-datatype association for the given set of IRIs.
An example of a possible practical consequence, which I have already mentioned earlier, is that of two entailment regimes sharing the same datatype IRI "ex:complex", but associated with different datatypes, namely the mathematical field of complex numbers on the one hand, and a set of compounds of four real numbers to represent space-time events on the other. In general, it should be expected that in certain fields custom datatypes will be developed and used, without the need to wait for an international standardisation of an IRI. The problem here is that if such a situation of concurrent IRI-datatype associations occurs, at least one of the entailment regimes will not be compliant with the RDF 1.1 standard anymore, due to the fact that the RDF 1.1 standard demands that there is a globally unique datatype associated with any given datatype IRI. While this will hardly stop organisations from still developing and using their custom datatypes, the situation is annoying and undesirable, and it could trivially be avoided by sticking with the original concept of datatype maps from RDF 2004.

The second technical problem is that, as the RDF 1.1 Semantics PR neither provides nor asks for an explicit set of the globally unique IRI-datatype associations, the task of proving certain semantic properties, such as the soundness and completeness of reasoning algorithms or reasoning tools, may become problematic or even impossible. For example, if we have some reasoner R that accepts pairs of RDF graphs and outputs boolean values, and we ask whether R is sound and complete with regard to D-RDFS, for D including the datatype IRI "ex:complex", how can we prove or disprove whether this semantic property holds for R or not? As mentioned earlier, there may be more than one obvious datatype associated with "ex:complex", and unless we know the "right" one, we simply cannot start the proof work.
This was not a problem in RDF 2004, where the proof work would have been done with regard to D-RDFS having an explicitly defined datatype map D, which would have included a reference to the datatype associated with "ex:complex". In fact, it would have been possible to have D1-RDFS and D2-RDFS, both including the IRI "ex:complex" but with different associated datatypes. R would then, perhaps, have been sound and complete w.r.t. D1-RDFS but not w.r.t. D2-RDFS, but, in any case, the proof work would have been technically possible and its result would have been perfectly determined.

The third technical problem is that the assumption of the existence of a globally unique set of IRI-datatype associations, which is at the same time completely open to an externally provided definition, strictly speaking breaks, or at least "confuses", the RDF Semantics. As there are no further limitations on the set of IRIs for which there can be associated datatypes, there may be a datatype for /every possible/ IRI, including every IRI defined for other purposes by the RDF Semantics itself or elsewhere in the Semantic Web. Hence, for any given D-interpretation I and any given IRI aaa, there exists some datatype d such that I(aaa) = d. This horrible semantic consequence was certainly not intended by the Working Group, but it is a consequence of missing restrictions on the set of IRIs allowed to act as datatype IRIs. However, I cannot imagine any meaningful constraint on the names of datatype IRIs, so this problem will hardly be eliminated by adding whatever constraint. Again, this problem did not exist in RDF 2004, since there was no such assumption about a globally unique but undetermined IRI-datatype association.
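
The "ex:complex" scenario used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expanded IRI `http://example.org/complex`, the comma-separated lexical forms, and the names `parse_complex`, `parse_event`, `is_well_typed`, and `GlobalRegistry` are all hypothetical and appear in neither RDF specification.

```python
from typing import Callable, Dict, Tuple

# Hypothetical expansion of the prefixed name ex:complex.
EX_COMPLEX = "http://example.org/complex"


def parse_complex(lexical: str) -> complex:
    """Assumed lexical form 're,im', read as a complex number."""
    re, im = (float(part) for part in lexical.split(","))
    return complex(re, im)


def parse_event(lexical: str) -> Tuple[float, float, float, float]:
    """Assumed lexical form 't,x,y,z', read as a space-time event."""
    t, x, y, z = (float(part) for part in lexical.split(","))
    return (t, x, y, z)


# RDF 2004 style: two datatype maps may give the same IRI different datatypes.
D1: Dict[str, Callable[[str], object]] = {EX_COMPLEX: parse_complex}
D2: Dict[str, Callable[[str], object]] = {EX_COMPLEX: parse_event}


def is_well_typed(d: Dict[str, Callable[[str], object]], iri: str, lexical: str) -> bool:
    """True if the datatype that d assigns to iri accepts the lexical form."""
    try:
        d[iri](lexical)
        return True
    except (KeyError, ValueError):
        return False


class GlobalRegistry:
    """Toy model of a single, globally unique IRI-to-datatype association."""

    def __init__(self) -> None:
        self._mapping: Dict[str, Callable[[str], object]] = {}

    def register(self, iri: str, lexical_to_value: Callable[[str], object]) -> None:
        # A second, different datatype for the same IRI cannot be accommodated.
        if iri in self._mapping and self._mapping[iri] is not lexical_to_value:
            raise ValueError(f"{iri} already identifies a different datatype")
        self._mapping[iri] = lexical_to_value
```

With explicit datatype maps, the question whether the literal "1,2" typed with ex:complex is well-typed has a determined answer relative to each map: `is_well_typed(D1, EX_COMPLEX, "1,2")` holds while `is_well_typed(D2, EX_COMPLEX, "1,2")` does not. In the registry model, only one of the two communities can register its datatype; the second `register` call raises an error.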

== Consequences for dependent Semantic Web Standards and other Work ==

For existing Semantic Web standards that depend on the RDF semantics and specifically on the original notion of datatype maps, the change will mean that these standards are not fully aligned anymore with the new version of RDF. The most important standards that are directly affected in this way are:

* OWL 2, specifically the OWL 2 RDF-Based Semantics, which is a conservative semantic extension of RDF 2004 D-entailment and makes strong use of the original datatype semantics;
* SPARQL 1.1, specifically the SPARQL 1.1 Entailment Regimes, which defines query results for querying on top of the different RDF 2004 entailment regimes, including D-entailment and the also affected OWL 2 RDF-Based Semantics;
* RIF, specifically the RIF RDF and OWL Compatibility spec, which defines RIF-X combinations, for X being any of the entailment regimes defined by the RDF 2004 Semantics and also the affected OWL 2 RDF-Based Semantics.

Notwithstanding the question whether the change leads to relevant technical consequences, there will at least be a mismatch in nomenclature, concepts, and formal representation. In fact, all standards listed above explicitly refer to the definition of datatype maps and use them for their own purposes. For example, the OWL 2 RDF-Based Semantics, following the definitions of OWL 2 in general, introduces a specific minimal datatype map consisting of a required set of IRI-datatype associations, which even include several new datatypes that have been introduced specifically for OWL 2 (and in part for RIF). The OWL 2 RDF-Based Semantics considers any reasoner that fully supports /at least/ these IRI-datatype associations as a compliant OWL 2 RDF-Based reasoner, and allows such a reasoner to support /arbitrary/ additional IRI-datatype associations, which is, strictly speaking, in conflict with the idea of a globally unique set of IRI-datatype associations.
In general, I do not consider the change here to be of a sort that would easily and naturally be implemented in future versions of these dependent standards. It is by far not an obvious change, or even only a "simplification" of the original situation. Rather, it affects several aspects such as basic nomenclature, formal representation, and even semantic assumptions about the form of the interpretation functions. I am even unsure whether all future working groups for these dependent standards will be willing to adopt the change made to RDF 1.1, as this would probably bring little value for these other standards beyond formal compliance with RDF 1.1, but at the expense of possibly breaking backwards compatibility with the original version of the respective standard, as in the case of the OWL 2 RDF-Based Semantics. So we may eventually find ourselves in a situation where some of the Semantic Web standards will follow the change taken in RDF, while others won't. This would, of course, be a highly unfortunate and embarrassing situation, in particular as the situation could easily be avoided by simply not applying the change to RDF in the first place.

Similar consequences as for dependent standards are to be expected for other existing work depending on or building on top of RDF, such as text books on RDF or other semantic technologies, university courses, research papers, software, etc.

== Conclusions and Proposal ==

I have argued that the current change is a non-editorial change that leads to certain incompatibilities with RDF 2004 and generally to undesirable consequences, such as restricted flexibility in defining custom entailment regimes, a potential lack of well-definedness in questions such as soundness and completeness of reasoning algorithms and tools, and even a technically flawed semantics that implicitly requires any existing IRI to be interpreted as some datatype.
This may have practical consequences for the application of the RDF standard, and may lead to issues for other existing Semantic Web standards, up to the danger of breaking compatibility with earlier versions of these standards, if adopted, or alternatively to a split situation, where some future versions of these standards will not adopt the change made to RDF. I have further noted that none of these problems existed for the original definition of datatype maps, and that no other technical problems of datatype maps have ever been brought up from the outside to the RDF WG, as originally required by the WG charter, although the RDF specification, and particularly the notion of datatype maps, has been in heavy use for a decade. In fact, the rationale for the change was essentially to only simplify the original situation without any technical change. As I have argued, the change /is/ technical, and has considerable problematic consequences, while there was no known request in the past even for simplification - a point that I can well confirm as someone who has worked a lot with the definition of datatype maps in the past, including specification work, the creation of test suites, and formal proof work.

I therefore propose to fully revert the change and return the notion of datatype maps to the form in which it appears in the original RDF specification as of 2004. This will be a valid operation since, as I have argued, there was nothing really wrong with the original definitions. It will also be a preferable operation, since existing Semantic Web standards and other published documents will continue to be compatible with RDF 1.1, and their future authors will not be forced into a decision whether to follow the change in the RDF semantics, or to stick with the old definitions, where either choice may lead to certain compatibility issues.
I expect that such a revert will be technically and editorially easy, as the change is, fortunately, not very strongly entangled with other parts of the specification, and the changes to the semantics of datatypes are pretty straightforward.

However, I do not suggest completely abandoning the idea of the change. As there has been much discussion on the topic within the Working Group but essentially none outside of it, neither before the WG started nor during its active time, I consider it purposeful to put the change on the list of postponed issues to be treated by a future RDF working group. By this, the proposed change gets the chance to become known and discussed outside the Working Group, and in particular by future working groups of other standards that are based on the RDF Semantics. I believe that, given the lack of request from outside the Working Group, there is certainly no urgency in applying this change to RDF now.

== About the Author ==

I have been the editor of the W3C OWL 2 RDF-Based Semantics specification, and have been a contributor to several of the other core OWL 2 specification documents, including the OWL 2 Mapping to RDF and the OWL 2 RL/RDF Rules profile. I have contributed part of the W3C OWL 2 test suite with a focus on RDF-based reasoning, and have also created a much larger version of this and several other test suites concerning RDF semantics-based reasoning (some of them yet to be published). I have provided, in both private and public conversation, support to the editors of the SPARQL 1.1 Entailment Regimes and the RIF RDF and OWL Compatibility specification on topics concerned with the RDF Semantics. I have worked in several international projects with a strong focus on semantic technologies, specifically RDF. I am also working on an RDF reasoning system, called Swertia, and have provided input to the RDF 1.1 Semantics CfI based on this system.
I am currently employed by Derivo GmbH, Germany, which is a small company specialized in products and services based on semantic technologies. Since May 2013, I have been permanently working for our business partner SAP, doing work entirely dedicated to semantic technologies, particularly RDF, SPARQL, and OWL. I am also currently a guest scientist at FZI Research Center for Technologies, Germany, where I worked in the past for more than five years, and a doctoral candidate at the Karlsruhe Institute of Technology (KIT), working specifically on reasoning in expressive extensions of the RDF Semantics.

== References ==

[01] RDF 2004 Semantics <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/>
[02] RDF 1.1 Semantics PR <http://www.w3.org/TR/2014/PR-rdf11-mt-20140109/>
[03] LCWD comment on ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-wg/2013Oct/0221.html>
[04] CR comment on ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Dec/0027.html>
[05] Resolution of ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Dec/0107.html>
[06] RDF WG Charter <http://www.w3.org/2011/01/rdf-wg-charter>
[07] <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Oct/0083.html>
Received on Sunday, 9 February 2014 22:49:57 UTC