Formal Objection to RDF Working Group resolution of issue 165 from Michael Schneider on 2014-02-09 (public-rdf-comments@w3.org from February 2014)

From: Michael Schneider <schneid@fzi.de>
Date: Sun, 9 Feb 2014 23:49:30 +0100
To: "public-rdf-comments@w3.org Comments" <public-rdf-comments@w3.org>, "W3C Chairs of RDF WG" <team-rdf-chairs@w3.org>, Tim Berners-Lee <timbl@w3.org>
Message-ID: <52F805FA.9020906@fzi.de>
To the director of the W3C,
to the chairs and W3C team members of the RDF Working Group,
to the members of the RDF Working Group,
and to anyone else to whom it may concern.

This is a formal objection to a change made to the semantics of
datatypes in the Proposed Recommendation of the RDF 1.1 Semantics.
The change concerns the replacement of the original concept of
a "datatype map" by the concept of a "set of recognized datatype
IRIs". I will argue that this change is largely unmotivated and
unnecessary, technically incompatible with the original concept,
questionable and even flawed, and may lead to diverse problems
for dependent Semantic Web standards and other dependent work.
My proposal will be to revert the change to the original
definition as of 2004 and to postpone further discussion of
the change to a future RDF Working Group. This formal objection
follows my reviews of earlier versions of the RDF 1.1 Semantics
and my discussions with the RDF Working Group about the same
topic, which did not lead to a satisfactory conclusion for me.

Michael Schneider,
Frankfurt am Main (Germany), 9 February 2014


== Introduction ==

This is a formal objection to a change made by the RDF Working
Group to the semantics of datatypes in the RDF 1.1 Semantics
compared to the original RDF Semantics specification as
of 2004 (from now on called "RDF 2004") [01]. The formal
objection targets the Proposed Recommendation (PR) of the
RDF 1.1 Semantics [02], which still underwent some changes
compared to the previous versions of the document, and which
is now intended by the Working Group to become the final
recommendation. The formal objection follows my reviews
of earlier versions of the RDF 1.1 Semantics and my
discussions with the RDF Working Group about the same
topic [03][04], which did not lead to a satisfactory
conclusion for me [05]. I should point out that this formal
objection is not made by an official W3C member organisation,
and that none of the organisations I am affiliated with,
or currently have some relationship with, is involved.
Rather, the formal objection is made by me as a private person,
and as a member of the informal Semantic Web community,
who has contributed considerably to the Semantic Web initiative
in the past and has a strong background and a stake
particularly in the RDF Semantics; see the section
"About the Author" for information about me.

The change to which I formally object concerns the replacement
of the original concept of a "datatype map" in Chap. 5 of [01]
by the concept of a "set of recognized datatype IRIs"
in Chap. 7 of [02]. In the original RDF 2004 Semantics,
a datatype map was a set of associations between datatype
IRIs (originally URI references) and datatypes. In the RDF 1.1
Semantics PR, there is now a "set of recognized datatype IRIs",
that is, only the datatype IRIs, together with the additional
requirement of the existence of a globally unique mapping
between datatype IRIs and datatypes (where this unique mapping
is not intended to be fully defined by the RDF 1.1 spec).
I will describe the change in more detail in Section
"Description of the Change".

I will first argue, in Section "A Non-Editorial Change", that the
change is not simply an editorial change, and will give arguments,
in Section "Missing Motivation and Necessity for the Change", why
I consider the change unmotivated and unnecessary. In Section
"Technical Consequences of the Change", I will list what I
consider the most relevant technical consequences of the change,
and will also give examples of possible practical consequences.
I will then, in Section "Consequences for dependent Semantic Web
Standards and other Work", argue that the change may have
unfortunate consequences for other existing Semantic Web Standards,
which are based on RDF, such as OWL 2, SPARQL 1.1, and RIF,
and may possibly lead to a split situation, where some future
versions of these standards will adopt the change made in RDF
while others may not.

Finally, in Section "Conclusions and Proposal", I will summarize
my arguments and argue that the consequences to be expected from
the change are strong and undesirable, and would not exist if
the original notion of datatype maps had been retained.
Consequently, I will propose to revert the change to the original
situation as of RDF 2004, and to postpone further discussion of
the change to a future RDF Working Group.


== Description of the Change ==

RDF 2004 introduced the concept of a "datatype map", "being a set
of pairs of an IRI and a datatype such that no IRI appears twice
in D" (Chap. 5 of [01]; note: in order to ease the discussion,
I use the term "IRI" everywhere, although the RDF 2004 spec used
the term "URI reference" instead.) In the current PR of the
RDF 1.1 Semantics, D is not a set of IRI-datatype pairs anymore,
but a set of datatype IRIs only (Chap. 7 of [02]). It is also
not called a "datatype map" anymore, but is now called a
"set of recognized datatype IRIs".
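The difference in formal representation can be sketched in Python. This is only an illustration under simplifying assumptions: datatypes are stood in for by plain string labels, and the IRI "ex:dt" and the labels "datatype A" and "datatype B" are hypothetical placeholders.

```python
# Hypothetical sketch: datatypes are represented by string labels only;
# "ex:dt", "datatype A" and "datatype B" are illustrative placeholders.

def is_datatype_map(pairs: frozenset) -> bool:
    """RDF 2004 condition: no IRI appears twice in the set of pairs."""
    iris = [iri for iri, _datatype in pairs]
    return len(iris) == len(set(iris))

def recognized_iris(pairs: frozenset) -> frozenset:
    """The RDF 1.1 view keeps only the IRI component of each pair."""
    return frozenset(iri for iri, _datatype in pairs)

D1 = frozenset({("xsd:integer", "xsd:integer"), ("ex:dt", "datatype A")})
D2 = frozenset({("xsd:integer", "xsd:integer"), ("ex:dt", "datatype B")})

print(is_datatype_map(D1), is_datatype_map(D2))    # True True
print(is_datatype_map(D1 | D2))                    # False: ex:dt appears twice
print(recognized_iris(D1) == recognized_iris(D2))  # True: datatypes dropped
```

Two distinct datatype maps collapse into the same set of recognized IRIs. For this reason, the RDF 1.1 representation needs the additional, externally specified IRI-datatype mapping.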

The RDF 1.1 Semantics further states that (a) "the semantics
presumes that a recognized IRI identifies a unique datatype
wherever it occurs", and (b) that "the exact mechanism by
which an IRI identifies a datatype is considered to be
external to the semantics" (beginning of Chap. 7).
The second Change Note in Chap. 7 informally elaborates
on this statement by saying that "the current semantics
presumes that a recognized IRI identifies a unique datatype,
this IRI-to-datatype mapping is globally unique and externally
specified". In contrast, RDF 2004 did not require a globally
unique association between datatype IRIs and datatypes.
Rather, the definition of datatype maps made it possible to
have IRI-datatype associations that were unique only locally with
regard to a particular datatype map D, or, likewise, locally
unique to an entailment regime that uses a particular datatype
map D.

To illustrate the difference, consider the case of a custom
definition of D-RDFS with D including a new custom datatype.
In RDF 2004, it was possible to associate the same IRI
with one datatype in one datatype map D1 and with a different
datatype in another datatype map D2. For example, the IRI
"ex:complex" may have been associated with a datatype
representing the mathematical field of complex numbers
in one extension of RDFS, and to a datatype representing
four-dimensional composites of real numbers for the
representation of space-time events in another extension
of RDFS. Under the RDF 1.1 Semantics, which requires the
existence of a globally unique IRI-datatype association,
this will not be possible anymore (regardless of what the
globally unique IRI-datatype association will look like,
which is, as cited above, not fully determined by the
RDF 1.1 standard).
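The "ex:complex" example can be sketched in Python. Everything here is hypothetical: the lexical forms ("re,im" for D1, "t,x,y,z" for D2), the mapping functions, and especially the register helper. That helper only models the effect of a single global association. RDF 1.1 itself does not define any such registration mechanism.

```python
# Hypothetical sketch of the "ex:complex" example. The lexical forms
# "re,im" and "t,x,y,z" and both mappings are illustrative assumptions.

def complex_l2v(lex: str) -> complex:
    """Datatype in D1: 're,im' denotes a complex number."""
    parts = [float(s) for s in lex.split(",")]
    if len(parts) != 2:
        raise ValueError(f"{lex!r} is not in the lexical space")
    return complex(parts[0], parts[1])

def spacetime_l2v(lex: str) -> tuple:
    """Datatype in D2: 't,x,y,z' denotes a space-time event."""
    parts = tuple(float(s) for s in lex.split(","))
    if len(parts) != 4:
        raise ValueError(f"{lex!r} is not in the lexical space")
    return parts

# RDF 2004: the association is local to each datatype map.
D1 = {"ex:complex": complex_l2v}
D2 = {"ex:complex": spacetime_l2v}

def literal_value(datatype_map: dict, lex: str, iri: str):
    """Value of the literal "lex"^^iri under the given datatype map."""
    return datatype_map[iri](lex)

print(literal_value(D1, "1,2", "ex:complex"))      # (1+2j)
print(literal_value(D2, "0,1,2,3", "ex:complex"))  # (0.0, 1.0, 2.0, 3.0)

# RDF 1.1 (modelled, not specified): one global association, so a
# second, different datatype for the same IRI has to be rejected.
def register(global_map: dict, iri: str, l2v) -> None:
    if iri in global_map and global_map[iri] is not l2v:
        raise ValueError(f"{iri} is already associated with another datatype")
    global_map[iri] = l2v

GLOBAL = {}
register(GLOBAL, "ex:complex", complex_l2v)
try:
    register(GLOBAL, "ex:complex", spacetime_l2v)
except ValueError as err:
    print(err)  # ex:complex is already associated with another datatype
```

Under the two local datatype maps, both readings of "ex:complex" coexist. Under a single global association, only the first one can be kept.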

In addition, some of the semantic conditions related to
the semantics of datatypes have been adjusted in order to
reflect the change mentioned above on a technical level.
In general, the semantic conditions now refer to applications
of a given interpretation I to a datatype IRI aaa, "I(aaa)",
instead of referring to the associated datatype by its
reference given in the datatype map, as was done in RDF 2004.
For example, compare the second of the Semantic conditions
for datatype literals in Chap. 7 of [02] with the third of
the General semantic conditions for datatypes in Chap. 5
of [01].
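The following is a rough side-by-side paraphrase of the literal-value condition, not a verbatim quotation of either specification. It applies to a well-typed lexical form sss. Here x is the datatype that D itself pairs with aaa in RDF 2004, and L2V is a datatype's lexical-to-value mapping:

```latex
\begin{aligned}
\text{RDF 2004:}\quad & \langle aaa, x \rangle \in D \implies
  I(aaa) = x,\quad IL(\texttt{"sss"\^{}\^{}aaa}) = L2V(x)(sss) \\
\text{RDF 1.1:}\quad & aaa \in D \implies
  IL(\texttt{"sss"\^{}\^{}aaa}) = L2V(I(aaa))(sss)
\end{aligned}
```

In the first line, the datatype is read off the pair in D. In the second, it is reached only through I(aaa), which is what the globally unique IRI-datatype association has to fix.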

To summarize, the whole change here includes:

   * a change in nomenclature:
     ("datatype map" vs. "set of recognized datatype IRIs");

   * a change in the formal representation of the objects
     under consideration: a set of IRI-datatype pairs vs.
     a set of IRIs only plus an additional globally unique
     IRI-datatype association, together with adjustments
     to the semantic conditions for datatype semantics;

   * a change to the scope of uniqueness of IRI-datatype
     associations: this scope has been local to every
     particular datatype map in RDF 2004 while being global,
     and by intention mostly undetermined, in RDF 1.1.


== A Non-Editorial Change ==

It has been argued by the Working Group that the change is of
a purely editorial nature. I would certainly not formally
object to an editorial change, but I consider this change to be
non-editorial. An editorial change would not alter basic
nomenclature or formal or technical aspects of a specification,
whereas all of these are affected here.

Firstly, as stated above, the change introduced a change in
nomenclature from the notion of a "datatype map" to the notion
of a "set of recognized IRIs". Secondly, there have been some
changes to the underlying formal representation, as listed above.
Thirdly, and most notably in my opinion, the scope of
uniqueness of IRI-datatype associations has changed. This change
does have measurable effects, as I have already pointed out
by my example above where the same IRI "ex:complex" is used
for different datatypes: this is clearly possible in RDF 2004,
but will not be possible anymore in RDF 1.1.

Another way of looking at the question whether a change to a
specification is editorial or not is to check whether existing
dependent work, such as other specifications, scientific papers,
or text books, would need to be updated in non-trivial ways
in order to be in line again with the changed specification.
For the change here, it becomes clear that dependent work
needs to be updated concerning the same things that have
changed in the RDF specification. For example, a text book
that is of a more formal nature would probably need to change
its nomenclature from "datatype maps" to "sets of
recognized IRIs", its basic definitions from sets of
IRI-datatype pairs to sets of IRIs, and would need to reflect
the change in the scope of uniqueness of the IRI-datatype
associations.

Based on these arguments, I conclude that the change is
clearly non-editorial.


== Missing Motivation and Necessity for the Change ==

A non-editorial change in a specification requires good
motivation, and this is particularly true in the case of
RDF and its semantics, for which the charter of the
RDF 1.1 Working Group [06] explicitly states that
"changing the fundamentals of the RDF Semantics" is
out of scope for the WG (Chapter 3). Based on my arguments
given below concerning the technical consequences
of the change, I consider the change to be indeed a change
of the fundamentals of the RDF Semantics, and thus in
conflict with the charter.

In general, the scope of the RDF WG was held deliberately
conservative. According to its charter, the scope was
"to extend RDF to include some of the features that the
community has identified as both desirable and important
for interoperability based on experience with the 2004
version of the standard, but without having a negative
effect on existing deployment efforts." However, I am not
aware of any input from outside the working group during
the past 10 years since RDF 2004 became a recommendation
that would have asked for a change of the semantics
concerning the concept of datatype maps, or would have
indicated any problems with this concept. Rather, in
the intervening years, at least three other core Semantic Web
standards have been written (OWL 2, SPARQL 1.1, and RIF),
which reuse the original notion of datatype maps without
any known problems, each taking years of specification
work and building up considerable experience with these
things. I am also not aware of any discussion concerning
problems with datatype maps from either the workshop or
the questionnaire that had preceded the initiation of
the Working Group.

As for myself, I have been responsible
for editing one of the mentioned dependent standards
(the OWL 2 RDF-Based Semantics), which makes heavy use of
the original definitions for datatype and datatype maps.
I have also provided some technical support (both in
private and public conversation) to the editors of
SPARQL 1.1 Entailment Regimes and RIF RDF&OWL Compatibility
with regard to the RDF semantics in general and to datatype
related semantics in particular. I have further created
several large test suites, which are to a large extent
about datatype semantics. I have created many formal proofs
based on the datatype semantics of RDF. I have spent some
time thinking about the implementation of datatype semantics,
although I have not yet implemented it in my RDF Semantics
reasoner, Swertia. And overall I have been working in the
RDF field full-time continuously for the last 8 years up to
this day. But in all these years with all this gained
experience concerning the RDF Semantics in general and
RDF datatype semantics in particular, I have never
encountered any serious problems with the original notion
of datatype maps. Rather, I have always found the original
datatype semantics well designed and it allowed me to do my
work decently. I would never have come to the conclusion that
anything would require a change, in particular not a change
of the kind proposed in RDF 1.1.

In fact, from my earlier discussion with the Working Group
it became apparent to me that the change was not based on
input from the outside, as was requested by the charter,
but only from within the Working Group. In the context of the
charter, this would only have been acceptable if there had been a
strong reason, such as a previously unnoticed bug. The actual
rationale of the Working Group was, however, to simplify the current
presentation of the RDF Semantics [07]. Having given my arguments
above about the complete lack of request for a change and the
much work that has been carried out without problems based on
the original definitions, it should be clear that I do not see
any reason here for any form of simplification with regard
to the original situation. But of even more relevance is
that the changes have not really "simplified" the situation,
but have rather changed the situation and introduced
significant technical problems, as I will point out in
the following section.


== Technical Consequences of the Change ==

Probably the most notable technical aspect of the change
is that it is now assumed by the RDF 1.1 Semantics that there
exists a globally unique IRI-datatype association, which is
to be applied for each set D of recognized IRIs (as an
integral part of an interpretation I). In comparison, no
such unique IRI-datatype association was assumed in RDF 2004,
but the concept of datatype maps allowed different
datatype maps to share the same IRI while associating it with
different datatypes. Further, the RDF 1.1 Semantics PR
does not define this globally unique IRI-datatype association,
but considers its definition to be external to the semantics,
except for a small number of datatype IRIs from the XSD
namespace. This difference has a number of considerable
technical consequences.

The first technical problem is that the change strongly reduces
the number of possible constellations of IRI-datatype associations:
In RDF 2004, for any set of IRIs i1,...,in there were, in principle,
infinitely many possible datatype maps D = { (i1,d1), ..., (in,dn) }.
In RDF 1.1, however, the associated datatypes d1,...,dn are uniquely
determined to be those from the globally unique IRI-datatype
association, which means that there is only a single such IRI-datatype
association for the given set of IRIs.
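The counting argument can be made concrete with a small Python sketch. It assumes two hypothetical IRIs (ex:dt1, ex:dt2) and an assumed pool of only three candidate datatypes. In reality the pool is unbounded, which is why RDF 2004 admits infinitely many datatype maps. The particular global association chosen below is likewise an assumption.

```python
# Hypothetical sketch: the IRIs and the three candidate datatypes are
# illustrative placeholders; the real pool of datatypes is unbounded.
from itertools import product

iris = ["ex:dt1", "ex:dt2"]
candidate_datatypes = ["datatype A", "datatype B", "datatype C"]

# RDF 2004: any assignment of one candidate datatype per IRI is a valid
# datatype map, giving len(candidate_datatypes) ** len(iris) of them.
datatype_maps_2004 = [
    dict(zip(iris, choice))
    for choice in product(candidate_datatypes, repeat=len(iris))
]

# RDF 1.1: the datatypes are fixed by one global association (assumed
# here), so the same set of IRIs yields exactly one association.
GLOBAL_ASSOCIATION = {"ex:dt1": "datatype A", "ex:dt2": "datatype B"}
associations_11 = [{iri: GLOBAL_ASSOCIATION[iri] for iri in iris}]

print(len(datatype_maps_2004))  # 9
print(len(associations_11))     # 1
```

The global association is just one of the nine RDF 2004 datatype maps. The other eight are excluded.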

An example for a possible practical consequence, which I have
already mentioned earlier, is that of two entailment regimes
sharing the same datatype IRI "ex:complex", but associated with
different datatypes, namely the mathematical field of complex
numbers on the one hand, and compounds of four real numbers
representing space-time events on the other. In general, it
should be expected that in certain fields custom datatypes will
be developed and used, without the need to wait for an international
standardisation of an IRI. The problem here is that if such a situation of concurrent
IRI-datatype associations occurs, at least one of the entailment
regimes will not be compliant with the RDF 1.1 standard anymore,
due to the fact that the RDF 1.1 standard demands that there is
a globally unique datatype associated for any given datatype IRI.
While this will hardly stop organisations from still developing
and using their custom datatypes, the situation is annoying and
undesirable, and it could trivially be avoided by sticking with
the original concept of datatype maps from RDF 2004.

The second technical problem is that, as the RDF 1.1 Semantics PR
neither provides nor asks for an explicit definition of the globally
unique IRI-datatype association, the task of proving certain
semantic properties, such as the soundness and completeness
of reasoning algorithms or reasoning tools, may become problematic
or even impossible. For example, if we have some reasoner R that
accepts pairs of RDF graphs and outputs boolean values,
and we ask whether R is sound and complete with regard to D-RDFS,
for D including the datatype IRI "ex:complex", how can we prove
or disprove whether this semantic property holds for R or not?
As mentioned earlier, there may be more than one plausible datatype
associated with "ex:complex", and unless we know the "right" one,
we simply cannot start the proof work.

This has not been a problem in RDF 2004, where the proof work
would have been done with regard to D-RDFS having an explicitly
defined datatype map D, which would have included a reference to
the datatype associated with "ex:complex". In fact, it would have
been possible to have D1-RDFS and D2-RDFS, both including the
IRI "ex:complex" but with different associated datatypes. R would
then, perhaps, have been sound and complete w.r.t. D1-RDFS but not
w.r.t. D2-RDFS, but, in any case, the proof work would have been
technically possible and its result would have been perfectly
determined.
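The dependence of the verdict on the chosen datatype can be illustrated with a toy Python check. It is a hypothetical simplification, not a proof procedure. It reduces "sound and complete" to agreement on one decision, namely whether a literal with datatype IRI "ex:complex" is well-typed, and checks this only over a finite sample. The lexical spaces (two or four comma-separated numbers) and the toy reasoner are assumptions.

```python
# Hypothetical sketch: whether reasoner R is "correct" for ex:complex
# can only be decided once a specific datatype is fixed for that IRI.

def in_lexical_space(arity: int, lex: str) -> bool:
    """Lexical space assumed here: `arity` comma-separated numbers."""
    parts = lex.split(",")
    if len(parts) != arity:
        return False
    try:
        for part in parts:
            float(part)
    except ValueError:
        return False
    return True

# Two datatype maps sharing the IRI ex:complex (lexical spaces assumed).
D1 = {"ex:complex": lambda lex: in_lexical_space(2, lex)}  # complex numbers
D2 = {"ex:complex": lambda lex: in_lexical_space(4, lex)}  # space-time events

def reasoner_well_typed(lex: str) -> bool:
    """Toy reasoner R: treats 're,im' pairs as well-typed ex:complex literals."""
    return in_lexical_space(2, lex)

def agrees(reasoner, datatype_map: dict, iri: str, samples: list) -> bool:
    """Finite agreement check, standing in for soundness + completeness."""
    return all(reasoner(lex) == datatype_map[iri](lex) for lex in samples)

samples = ["1,2", "0,1,2,3", "abc"]
print(agrees(reasoner_well_typed, D1, "ex:complex", samples))  # True
print(agrees(reasoner_well_typed, D2, "ex:complex", samples))  # False
```

With an explicit D1 or D2 the question has a definite answer. With only the IRI "ex:complex" and an unspecified global association, there is nothing to check R against.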

The third technical problem is that the assumption of the existence
of a globally unique set of IRI-datatype associations, whose
definition is left completely open to external provision, breaks,
strictly speaking, or at least "confuses" the RDF Semantics. As there are
no further limitations on the set of IRIs for which there can
be associated datatypes, there may be a datatype for
/every possible/ IRI, including every IRI defined for other
purposes by the RDF Semantics itself or elsewhere in the
Semantic Web. Hence, for any given D interpretation I and
any given IRI aaa, there exists some datatype d such that
I(aaa) = d. This horrible semantic consequence was certainly
not intended by the Working Group, but it is a consequence of
missing restrictions on the set of IRIs allowed to act as
datatype IRIs. However, I cannot imagine any meaningful constraint
on the names of datatype IRIs, so this problem will hardly be
eliminated by adding whatever constraint. Again, this problem
has not existed in RDF 2004, since there has not been such an
assumption about a globally unique but undetermined IRI-datatype
association.


== Consequences for dependent Semantic Web Standards and other Work ==

For existing Semantic Web standards that depend on the
RDF semantics and specifically on the original notion
of datatype maps, the change will mean that these standards
are not fully aligned anymore with the new version of RDF.
The most important standards that are directly affected
in this way are:

   * OWL 2, specifically the OWL 2 RDF-Based Semantics,
     which is a conservative semantic extension of
     RDF 2004 D-entailment and makes strong use of the
     original datatype semantics;

   * SPARQL 1.1, specifically the SPARQL 1.1 Entailment Regimes,
     which define query results for querying on top of the
     different RDF 2004 entailment regimes, including D-entailment
     and the also affected OWL 2 RDF-Based Semantics;

   * RIF, specifically the RIF RDF and OWL Compatibility spec,
     which defines RIF-X combinations, for X being any of the
     entailment regimes defined by the RDF 2004 Semantics
     and also the affected OWL 2 RDF-Based Semantics.

Regardless of whether the change leads to relevant
technical consequences, there will at least be a mismatch in
nomenclature, concepts, and formal representation. In fact, all
listed standards above explicitly refer to the definition of
datatype maps and use them for their own purpose.

For example, the OWL 2 RDF-Based Semantics, following the
definitions of OWL 2 in general, introduces a specific
minimal datatype map consisting of a required set of
IRI-datatype associations, which even include several new
datatypes that have been introduced specifically for
OWL 2 (and in part for RIF). The OWL 2 RDF-Based Semantics
considers any reasoner that fully supports /at least/
these IRI-datatype associations as a compliant
OWL 2 RDF-Based reasoner, and allows such a reasoner
to support /arbitrary/ additional IRI-datatype associations;
which is, strictly speaking, in conflict with the idea
of a globally unique set of IRI-datatype associations.

In general, I do not consider the change here to be of a sort
that would easily and naturally be implemented in future versions
of these dependent standards. It is by far not an obvious change,
or even only a "simplification" of the original situation.
Rather, it affects several aspects such as basic nomenclature,
formal representation, and even semantic assumptions about the
form of the interpretation functions. I am even unsure whether
all future working groups for these dependent standards will
be willing to adopt the change made to RDF 1.1, as this would
probably bring little value for these other standards beyond
formal compliance with RDF 1.1, but at the expense of possibly
breaking backwards compatibility with the original version of
the respective standard, as in the case of the OWL 2 RDF-Based
Semantics. So we may eventually find ourselves in a situation
where some of the Semantic Web standards follow the change
made in RDF, while others do not. This would, of course,
be a highly unfortunate and embarrassing situation, in
particular as it could easily be avoided by simply not
applying the change to RDF in the first place.

Similar consequences as for dependent standards are to be expected
for other existing work depending on or building on top of RDF,
such as text books on RDF or other semantic technologies,
university courses, research papers, software, etc.


== Conclusions and Proposal ==

I have argued that the current change is a non-editorial change
that leads to certain incompatibilities with RDF 2004
and to generally undesirable consequences, such as
reduced flexibility in defining custom entailment
regimes, a potential lack of well-definedness in questions
such as the soundness and completeness of reasoning algorithms
and tools, and even a technically flawed semantics that implicitly
requires every IRI to be interpreted as some datatype.
This may have practical consequences for the application of
the RDF standard, and may lead to issues for other existing
Semantic Web standards: either the danger of breaking compatibility
with earlier versions of these standards, if they adopt the change,
or alternatively a split situation, where some future versions
of these standards will not adopt the change made to RDF.

I have further noted that none of these problems existed
for the original definition of datatype maps, and that no
other technical problems with datatype maps have ever been
brought up to the RDF WG from the outside,
as originally required by the WG charter, although the
RDF specification, and particularly the notion of datatype
maps, has been in heavy use for a decade. In fact, the
rationale for the change was essentially only to simplify
the original situation without any technical change.
As I have argued, the change /is/ technical, and has
considerable problematic consequences, while there was no
known request in the past even for a simplification, a
point that I can well confirm as someone who has worked
a lot with the definition of datatype maps in the past,
including specification work, the creation of test suites,
and formal proof work.

I therefore propose to fully revert the change, restoring the
notion of datatype maps to the form in which it appears in the
original RDF specification as of 2004. This will be a valid
operation since, as I have argued, there was nothing really
wrong with the original definitions. It will also be a
preferable operation, since existing Semantic Web standards
and other published documents will continue to be compatible
with RDF 1.1, and their future authors will not be forced
into a decision whether to follow the change in the RDF semantics,
or to stick with the old definitions, where either choice may
lead to certain compatibility issues.

I expect that such a revert will be technically and editorially
easy, as the change is, fortunately, not very strongly entangled
with other parts of the specification, and the changes to the
semantics of datatypes are pretty straightforward.

However, I do not suggest abandoning the idea of the change
completely. As there has been much discussion on the topic within
the Working Group but essentially none outside of it, neither
before the WG started nor during its active time, I consider
it appropriate to add the change to the list of postponed issues
to be treated by a future RDF working group. In this way, the proposed
change gets the chance to become known and discussed outside the
Working Group, and in particular by future working groups of
other standards that are based on the RDF Semantics. I believe
that, given the lack of requests from outside the Working Group,
there is certainly no urgency to apply this change to RDF now.


== About the Author ==

I have been the editor of the W3C OWL 2 RDF-Based Semantics
specification, and have been a contributor to several of
the other core OWL 2 specification documents, including the
OWL 2 Mapping to RDF and the OWL 2 RL/RDF Rules profile.
I have contributed part of the W3C OWL 2 test suite with
a focus on RDF-based reasoning, and have also created a
much larger version of this and several other test suites
concerning RDF semantics-based reasoning (some of them yet
to be published). I have provided, in both private and public
conversation, support to the editors of the SPARQL 1.1
Entailment Regimes and the RIF RDF and OWL Compatibility
specification on topics concerned with the RDF Semantics.
I have worked in several international projects with strong
focus on semantic technologies, specifically RDF. I am also
working on an RDF reasoning system, called Swertia,
and have provided input to the RDF 1.1 Semantics CfI
based on this system.

I am currently employed by Derivo GmbH, Germany,
which is a small company specialized in products and
services based on semantic technologies. Since May 2013,
I have been permanently working for our business partner SAP,
doing work entirely dedicated to semantic technologies,
particularly RDF, SPARQL, and OWL. I am also currently a
guest scientist at FZI Research Center for Technologies,
Germany, where I have been working in the past for more
than five years, and a doctoral candidate at the Karlsruhe Institute
of Technology (KIT), working specifically on reasoning in
expressive extensions of the RDF Semantics.

== References ==

[01] RDF 2004 Semantics <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/>
[02] RDF 1.1 Semantics PR <http://www.w3.org/TR/2014/PR-rdf11-mt-20140109/>
[03] LCWD comment on ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-wg/2013Oct/0221.html>
[04] CR comment on ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Dec/0027.html>
[05] Resolution of ISSUE 165 <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Dec/0107.html>
[06] RDF WG Charter <http://www.w3.org/2011/01/rdf-wg-charter>
[07] <http://lists.w3.org/Archives/Public/public-rdf-comments/2013Oct/0083.html>
Received on Sunday, 9 February 2014 22:49:57 UTC