Review RDF 1.1 Semantics (ED 3rd June 2013) from Antoine Zimmermann on 2013-06-12 (public-rdf-wg@w3.org from June 2013)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 12 Jun 2013 12:53:38 +0200
To: Pat Hayes <phayes@ihmc.us>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <51B85332.2040703@emse.fr>
Pat, Peter,


This is my review of RDF 1.1 Semantics. Sorry for sending this so late.
On the plus side, I'd say that overall the presentation has been much 
improved. Making interpretations independent from a vocabulary is a big 
bonus, and making D-interpretations independent from the RDF vocabulary 
is also much better. Putting the rules in context with the corresponding 
entailment regime is also good.

Now, for the main criticism, I have two outstanding problems with the 
current version:
  1. D-entailment using a set rather than a mapping;
  2. Defining entailment of a set as entailment of the union.


1. D-entailment
===============

Concerning 1, the implication of the new definition is that, given a D, 
it is not generally possible to know what the valid D-entailments are.

For instance, consider D = {http://example.com/dt}. What does the triple:

  <s> <p> "abc"^^<http://example.com/dt> .

D-entails? The specification does not say.
Moreover, because of the absence of a known mapping from IRIs to 
datatypes, there are a few ill-defined conditions:
For instance, in Section 9, the table "Semantic conditions for datatyped 
literals" says:

"""
For every other IRI aaa in D, and every literal "sss"^^aaa, 
IL("sss"^^aaa)=L2V(I(aaa))(sss)
"""

L2V is only defined for datatypes, whereas I(aaa) is not constrained to 
be a datatype. Even if it were constrained to be a datatype, this 
would not define the value of IL("sss"^^aaa), unless aaa is one of the 
normative XSD datatype IRIs.

In any case, no matter how you tweak the definitions, the application 
MUST have a mapping from the set of "recognised" datatype IRIs to some 
specific datatypes.
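
A minimal Python sketch of this point follows. It is an illustration 
only, not text from the specification: the names RECOGNISED_SET, 
DATATYPE_MAP, literal_value and literal_value_from_set are hypothetical, 
and the lexical-to-value mapping assigned to http://example.com/dt is 
invented for the example. With a map, the value of a literal can be 
looked up; with only a set, membership is all that can be tested.

```python
# Hypothetical illustration: a "recognised" set alone cannot yield literal values.
RECOGNISED_SET = {"http://example.com/dt"}

# A datatype map pairs each recognised IRI with a lexical-to-value mapping (L2V).
# The mapping chosen for http://example.com/dt is invented for this example.
DATATYPE_MAP = {
    "http://www.w3.org/2001/XMLSchema#integer": int,
    "http://example.com/dt": str.upper,
}


def literal_value(lexical, datatype_iri, datatype_map):
    """Return L2V(d)(lexical) for the datatype d that the map assigns to the IRI."""
    if datatype_iri not in datatype_map:
        raise KeyError("no datatype known for " + datatype_iri)
    return datatype_map[datatype_iri](lexical)


def literal_value_from_set(lexical, datatype_iri, recognised):
    """With only a set, membership is all that can be checked."""
    if datatype_iri in recognised:
        raise NotImplementedError("IRI is recognised, but its L2V is unspecified")
    raise KeyError("IRI is not recognised")
```

Calling literal_value("abc", "http://example.com/dt", DATATYPE_MAP) 
returns a value only because the map supplies one; the set-based variant 
has nothing to return for the same literal.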

Later in Section 10, it says that "if D < E and S E-entails G then S 
D-entails G." Since no constraints are given on how to interpret 
"recognised" non-XSD datatype IRIs, it is possible that the same IRI in 
D-entailment is interpreted differently in E-entailment.

In Section 11, the table "RDF semantic conditions" says:

"""
For every IRI aaa in D, <x,I(aaa)> is in IEXT(I(rdf:type)) if and only 
if x is in the value space of I(aaa)
"""

This is ill-defined because the value space of I(aaa) may not exist. 
Again, even if I(aaa) is constrained to be a datatype, how do we know 
what is its value space? Therefore, the condition cannot be verified in 
general.

Finally, the reasons why this change has been made are unclear. The 
working group was not chartered to do anything about that, the workshop 
in 2010 did not point at all to any problems with datatype maps, and 
this Working Group did not discuss or complain about D being a map when 
the change was made. No prior discussion was attempted before making 
the change.

Implementations that rely on custom datatypes are interpreting the 
custom datatype IRIs according to one specific, known datatype; 
therefore, they do have a datatype *map* implemented. There is zero 
motivation to make such a change.



2. using union
==============
This issue is different from the previous one because it does not make 
the definitions and propositions incorrect.

I see two problems with the new definition: first, it makes the notion 
of entailment in RDF different from the standard, universally accepted 
notion of entailment in logic. In general, no matter what semantics is 
considered, entailment is defined as follows:

"""
A set S of formulas in the language entails a formula F in the same 
language if and only if all interpretations that satisfy all the 
formulas of S also satisfy F.
"""

That's what was in RDF 2004, that's what's in OWL, that's what's in any 
logic with a model-theory.

There are also inconvenient consequences for manipulation of RDF graphs:
how is it supposed to be implemented? Assume we have two representations 
of two graphs. How do you know what's the union of the two graphs? You 
do not have access to the bnodes, only to identifiers or locations in 
files or in memory. There is a rule of thumb saying "different 
documents, different bnodes". And what about RDF graphs in an in-memory 
model? What about two examples of RDF graphs in Turtle in a written 
article? They are in the same document, they certainly share bnodes, right?

Now if we take the simple case when the application is able to determine 
that the bnodes are disjoint, how can it perform a union? The answer is 
that it must *separate apart* the bnode identifiers. So, while in 2004 
there was a coherence between the way merge was defined and the way it 
has to be implemented, now there is a discrepancy between the 
definition and the practice.
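
The separation step can be sketched in Python as follows. This is a 
hedged illustration under stated assumptions: a graph representation is 
a set of string triples, bnode identifiers are the strings starting with 
"_:", and identifiers are scoped to each representation. The names 
is_bnode and merge are hypothetical and not taken from any library.

```python
from itertools import count


def is_bnode(term):
    """Assumed convention: bnode identifiers are strings starting with '_:'."""
    return term.startswith("_:")


def merge(graphs):
    """Merge graph representations whose bnode labels are scoped per graph.

    Every bnode label is renamed to a fresh one, so equal labels in
    different inputs never collapse into one node.
    """
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # per-graph renaming keeps co-references inside one graph
        for triple in graph:
            new_triple = []
            for term in triple:
                if is_bnode(term):
                    if term not in renaming:
                        renaming[term] = "_:b%d" % next(fresh)
                    term = renaming[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged


g1 = {("ex:a", "ex:p", "_:x")}
g2 = {("ex:b", "ex:q", "_:x")}

naive = g1 | g2           # both triples end up sharing the label _:x
merged = merge([g1, g2])  # the two bnodes get distinct fresh labels
```

The plain set union keeps a single identifier _:x, whereas the merge 
yields two distinct identifiers, which is what an application has to do 
when the inputs come from different documents.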

Then, once the separation apart is made to produce a representation of 
the union, the created graph is, by definition of union, sharing bnodes 
with the two original graphs. But how can the overlap of bnodes be 
recognised in and out of the application? One would need 
meta-information about the relationship between the graphs. And how to 
represent and store that relationship?

Also, suppose one wants to keep two graphs that share bnodes separate 
(say, they are distinct graphs in the same TriG file). Then these graphs 
cannot be stored separately if one wants to retain equivalent inferences 
on the set of graphs. That is, if I have {G1,G2} such that G1 and G2 
share some bnodes, storing G1 apart would create a "copy" of G1 with 
disjoint bnodes. The new graph, H1, would be equivalent to G1, but the 
set {H1,G2} would not yield the same entailments as {G1,G2}.
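
A small Python sketch makes the difference concrete. It is illustrative 
only and rests on assumptions: graphs are sets of string triples, bnodes 
are the strings starting with "_:", and simple entailment is tested by 
brute force using the interpolation lemma (G simply entails E when some 
instance of E is a subgraph of G). The function names and the example 
graphs g1, g2, h1 and e are hypothetical.

```python
from itertools import product


def is_bnode(term):
    """Assumed convention: bnode identifiers are strings starting with '_:'."""
    return term.startswith("_:")


def terms(graph):
    return {term for triple in graph for term in triple}


def simply_entails(g, e):
    """Brute-force check that some instance of e is a subgraph of g.

    Relies on the interpolation lemma for simple entailment; only meant
    for tiny graphs.
    """
    e_bnodes = sorted(t for t in terms(e) if is_bnode(t))
    candidates = sorted(terms(g))
    for values in product(candidates, repeat=len(e_bnodes)):
        mapping = dict(zip(e_bnodes, values))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if instance <= g:
            return True
    return False


g1 = {("ex:a", "ex:p", "_:x")}
g2 = {("_:x", "ex:q", "ex:b")}   # shares the bnode _:x with g1
h1 = {("ex:a", "ex:p", "_:z")}   # a copy of g1 with a disjoint bnode
e = {("ex:a", "ex:p", "_:y"), ("_:y", "ex:q", "ex:b")}

with_shared_bnode = simply_entails(g1 | g2, e)
with_copied_graph = simply_entails(h1 | g2, e)
```

Here g1 and h1 entail each other, yet the union of g1 and g2 entails e 
while the union of h1 and g2 does not.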

Finally, the decision to replace merge with union was first put into the 
document without prior discussion with the Working Group, without 
evidence that it follows practices, without evidence that it solves 
known issues. The notion of merge was not identified as a subject of 
concern during the W3C workshop in 2010. Implementations do implement 
RDF 2004 correctly.



Conclusion:
===========
More generally, any change like this is disruptive for education. If this 
design is standardised at the end of the year, there will be a gap 
between what's in the standard and what has been written for years in 
tutorials, courses, research papers, and so on.

Considering that I see no added value compared to 2004 from both these 
changes, and having even identified flaws, I oppose publication of RDF 
1.1 Semantics in such a state.

Note that the solution I propose is simple, and simpler than what is 
currently in the draft: go back to the old design concerning entailment 
of a set of graphs and the datatype map. My proposal is also less likely 
to trigger unsupportive comments in the Last Call phase. We cannot 
afford to spend more time inventing a new design.



Minor remarks:
==============
I think there are too many sections. Simple interpretations and simple 
entailment can be subsections of a common section. The same for 
D-interpretations and D-entailment.  Same for RDF interpretations and 
RDF entailment; same for RDFS.

Section 3:
"""For example, RDF statements of the form:

  ex:a  rdfs:subClassOf  owl:Thing .

are forbidden in the OWL-DL [OWL2-SYNTAX] semantic extension."""

  -> This triple can be a valid part of an OWL 2 DL ontology. A better 
example would be:

  ex:a  rdfs:subClassOf  "Thing" .

Moreover, perhaps a reference to OWL 2 mapping to RDF graphs [1] would 
be better, since [OWL2-SYNTAX] defines OWL 2 ontologies in terms of a 
functional syntax that does not say anything about the constraints in the 
RDF serialisation.

Section 4:
"A typed literal contains two names" -> We do not have the notion of 
typed literals since all literals are typed.
"Two graphs are isomorphic when each maps into the other by a 1:1 
mapping on blank nodes." -> this is very much underspecified. There are 
other constraints on isomorphisms.
"Graphs share blank nodes ... of distinct blank nodes." -> this 
discussion should not be here. In fact, it should rather appear in 
Concepts. In any case, it does not belong to notation and terminology.
"This document will often treat a set of graphs as being identical to a 
single graph comprising their union, without further comments." -> if my 
concerns above are taken into account, this should be removed. A 
definition of merge should be added instead. By the way, I haven't seen 
many sets of graphs being treated as a single graph. Actually, I think I 
only saw it twice. So we cannot say "often".

Section 5:
Make it a subsection of "Simple semantics"? "Simple entailment"?
"a function from expressions (names, triples and graphs) to semantic 
values:" -> what's a "semantic value"?
"triple s p o then ..." -> why not "triple (s, p, o)" ?
Same remark in item 4 of Section 5.2

Section 6:
Make it a subsection of "Simple semantics"? "Simple entailment"?
"a graph G simply entails a graph E when every interpretation which 
satisfies G also satisfies E, and a set S of graphs simply entails E 
when the union of S simply entails E" -> change this to "a set S of 
graphs simply entails a graph E when every interpretation which 
satisfies all graphs in S also satisfies E"
Remove the Change Note.

Section 6.1:
"the inference from (P and Q) to P, and the inference from foo(baz) to 
(exists (x) foo(x))." -> the notation "(P and Q)" etc is rather obscure 
in this context. Perhaps it would be good to present the usual First 
Order Logic translation of the semantics. BTW, the usual FOL translation 
would not be valid for entailments over a set of graphs because 
{FOL(G1),FOL(G2)} is equivalent to FOL(merge(G1,G2)).
The example with ex:a ex:p _:x is confusing RDF graphs and RDF 
documents, as well as bnodes and bnode identifiers. Then, while the 
naive readers would intuitively imagine that taking the union of the two 
triples would simply amount to putting them together, they realise that 
they have to "standardise apart" the bnode identifiers.

Section 7:
"For any graph H, if sk(G) entails H then there is a graph H' such that 
G entails H' and H=sk(H')" -> this should rather be: "For any graph H, 
if sk(G) entails H then there is a skolemization sk'(H) of H such that G 
entails sk'(H)"

Section 8:
Remove the second change note, as per my concerns above.
"datatype d refers to (denotes) the value" -> why not just say "denotes"
"L2V(d)(string)" -> rather, L2V(d)(sss)
"rdf:plainLiteral" -> "rdf:PlainLiteral"
"the datatype it refers to MUST be specified unambiguously" -> yes, 
there MUST be a mapping from datatype IRIs to datatypes, i.e., there 
must be a datatype map. This is a MUST, why doesn't it appear as a 
constraint in the formal semantics?

Section 9:
Make it a subsection of "D-semantics"? "D-entailment"?

Section 10:
Make it a subsection of "D-semantics"? "D-entailment"?
"a set S of graphs (simply) D-entails or entails recognizing D a graph G 
when every D-interpretation which makes S true also D-satisfies G." -> 
"a set S of graphs (simply) D-entails a graph G when every 
D-interpretation which satisfies all graphs in S also D-satisfies G."

Section 10.1:
Why not give the general rule for datatype entailment:
"""
aaa uuu "xxx"^^ddd => aaa uuu "yyy"^^eee
where L2V(I(ddd))(xxx) = L2V(I(eee))(yyy)
"""
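
As a rough Python sketch of how such a rule could be applied (an 
illustration under assumptions, not a proposed normative text): the 
dictionary L2V stands in for a datatype map restricted to two XSD 
datatypes, literals are (lexical form, datatype IRI) pairs, lexical 
validation is deliberately simplified, and the names same_value and 
rewrite_object are hypothetical.

```python
from decimal import Decimal

# Hypothetical datatype map: IRI -> lexical-to-value mapping (L2V).
# Lexical validation is deliberately simplified.
L2V = {
    "xsd:integer": lambda s: Decimal(int(s)),
    "xsd:decimal": Decimal,
}


def same_value(lex1, dt1, lex2, dt2, l2v=L2V):
    """True when the two literals denote the same value under the given map."""
    return l2v[dt1](lex1) == l2v[dt2](lex2)


def rewrite_object(triple, new_literal, l2v=L2V):
    """Apply the rule: replace the object literal by one with the same value."""
    s, p, (lex, dt) = triple
    new_lex, new_dt = new_literal
    if not same_value(lex, dt, new_lex, new_dt, l2v):
        raise ValueError("literals denote different values")
    return (s, p, new_literal)
```

For instance, rewrite_object(("ex:a", "ex:p", ("1", "xsd:integer")), 
("1.0", "xsd:decimal")) succeeds because both literals denote the same 
number. Note that the check needs the mappings themselves, not just the 
set of recognised IRIs.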

Section 11:
Make it a subsection of "RDF semantics"? "RDF entailment"?


Section 12:
Make it a subsection of "RDF semantics"? "RDF entailment"?

Section 12.1:
Group the rules together, as in Section 14.1

Section 13:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?

Section 14:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?

Section 15:
"plus an optional default graph" -> the default graph is not optional, 
there must be exactly one

Appendix A:
"follows exactly the terms used in [OWL2-SYNTAX]" -> it is 
[OWL2-PROFILES], in Section 4.3. OWL2-SYNTAX does not rely on RDF triples
"Every RDF(S) closure, even starting with the empty graph, will contain 
all RDF(S) tautologies" -> not all, the closure as defined is finite, 
while there are infinitely many tautologies. All tautologies concerning 
the vocabulary of the initial graph, union the tautologies in the RDF 
and RDFS vocabularies.

Appendix C:
The proof that every graph is satisfiable does not need to introduce 
Herbrand interpretations and does not need to build an interpretation 
for each graph considered. There is a single interpretation that makes 
all RDF graphs simply true. Consider a domain comprising only one 
element x. Map all IRIs and literals to x, including those used as 
predicates. Make the IEXT of x be the single pair {(x,x)}. This simply 
satisfies all graphs.
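
The construction can be sketched in a few lines of Python. This is an 
informal illustration, assuming graphs are sets of string triples; the 
names X, denote, IEXT and satisfies are hypothetical.

```python
# The single domain element; every IRI and literal denotes it.
X = "x"


def denote(term):
    """Interpretation mapping: every IRI and literal is sent to X."""
    return X


# IEXT maps a property (a domain element) to its extension, a set of pairs.
IEXT = {X: {(X, X)}}


def satisfies(graph):
    """Check simple satisfaction, with every blank node also assigned X."""
    for s, p, o in graph:
        # Blank nodes may be mapped to any domain element; X is the only one.
        if (denote(s), denote(o)) not in IEXT.get(denote(p), set()):
            return False
    return True
```

Whatever triples the graph contains, each one reduces to checking that 
(x,x) is in IEXT(x), so satisfies returns True for every graph.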

Appendix D.1:
"The subject of a reification,/a>" -> typo

Appendix D.2:
The RDF container vocabulary should also mention rdfs:member, 
rdfs:ContainerMembershipProperty.
-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 12 June 2013 10:54:03 UTC