- From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Date: Wed, 12 Jun 2013 12:53:38 +0200
- To: Pat Hayes <phayes@ihmc.us>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
- CC: RDF WG <public-rdf-wg@w3.org>
Pat, Peter,
This is my review of RDF 1.1 Semantics. Sorry for sending this so late.
On the plus side, I'd say that overall, the presentation has been much
improved. Making interpretations independent of a vocabulary is a big
bonus, and making D-interpretations independent of the RDF vocabulary is
also much better. Putting the rules in context with the corresponding
entailment regime is also good.
Now, for the main criticism, I have two outstanding problems with the
current version:
1. D-entailment using a set rather than a mapping;
2. Defining entailment of a set as entailment of the union.
1. D-entailment
===============
Concerning 1, the implication of the new definition is that, given a D,
it is not generally possible to know what the valid D-entailments are.
For instance, consider D = {http://example.com/dt}. What does the triple:
<s> <p> "abc"^^<http://example.com/dt> .
D-entails? The specification does not say.
Moreover, because of the absence of a known mapping from IRIs to
datatypes, there are a few ill-defined conditions:
For instance, in Section 9, the table "Semantic conditions for datatyped
literals" says:
"""
For every other IRI aaa in D, and every literal "sss"^^aaa,
IL("sss"^^aaa)=L2V(I(aaa))(sss)
"""
L2V is only defined for datatypes, whereas I(aaa) is not constrained to
be a datatype. Even if it were constrained to be a datatype, this
would not define the value of IL("sss"^^aaa), unless aaa is one of the
normative XSD datatype IRIs.
In any case, no matter how you tweak the definitions, the application
MUST have a mapping from the set of "recognised" datatype IRIs to some
specific datatypes.
Later in Section 10, it says that "if D < E and S E-entails G then S
D-entails G." Since no constraints are given on how to interpret
"recognised" non-XSD datatype IRIs, it is possible that the same IRI in
D-entailment is interpreted differently in E-entailment.
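To make the point concrete, here is a minimal sketch (not taken from the spec) in which a datatype is reduced to its lexical-to-value function. The two maps `map_d` and `map_e` and their lambdas are hypothetical; the only thing borrowed from the text above is the IRI http://example.com/dt. Both applications "recognise" the same IRI, yet the literal gets two different values, which is only visible once the mapping is written down:

```python
from typing import Callable, Dict

EX_DT = "http://example.com/dt"

# A datatype is reduced here to its lexical-to-value mapping (L2V).
Datatype = Callable[[str], object]

# Hypothetical: two applications recognise the same IRI set {EX_DT}
# but bind the IRI to different datatypes.
map_d: Dict[str, Datatype] = {EX_DT: lambda lex: lex.upper()}
map_e: Dict[str, Datatype] = {EX_DT: lambda lex: len(lex)}


def literal_value(dmap: Dict[str, Datatype], lexical: str, iri: str) -> object:
    """IL("sss"^^aaa) = L2V(I(aaa))(sss), with I(aaa) supplied by the map."""
    return dmap[iri](lexical)


value_d = literal_value(map_d, "abc", EX_DT)
value_e = literal_value(map_e, "abc", EX_DT)
print(value_d, value_e)
```

With only the set {EX_DT} in hand, nothing tells the two applications which of these values is intended.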
In Section 11, the table "RDF semantic conditions" says:
"""
For every IRI aaa in D, <x,I(aaa)> is in IEXT(I(rdf:type)) if and only
if x is in the value space of I(aaa)
"""
This is ill-defined because the value space of I(aaa) may not exist.
Again, even if I(aaa) is constrained to be a datatype, how do we know
what is its value space? Therefore, the condition cannot be verified in
general.
Finally, the reasons why this change has been made are unclear. The
working group was not chartered to do anything about that, the workshop
in 2010 did not point at all to any problems with datatype maps, this
Working Group did not discuss or complain about D being a map when
the change was made. No prior discussion was attempted before making
the change.
Implementations that rely on custom datatypes are interpreting the
custom datatype IRIs according to one specific, known datatype;
therefore, they do have a datatype *map* implemented. There is zero
motivation to make such a change.
2. using union
==============
This issue is different from the previous one because it does not make
the definitions and propositions incorrect.
I see two problems with the new definition: first, it makes the notion
of entailment in RDF different from the standard, universally accepted
notion of entailment in logic. In general, no matter what semantics is
considered, entailment is defined as follows:
"""
A set S of formulas in the language entails a formula F in the same
language if and only if all interpretations that satisfy all the
formulas of S also satisfy F.
"""
That's what was in RDF 2004, that's what's in OWL, that's what's in any
logic with a model-theory.
There are also inconvenient consequences for manipulation of RDF graphs:
how is it supposed to be implemented? Assume we have two representations
of two graphs. How do you know what the union of the two graphs is? You
do not have access to the bnodes, only to identifiers or locations in
files or in memory. There is a rule of thumb saying "different
documents, different bnodes". And what about RDF graphs in an in-memory
model? What about two examples of RDF graphs in Turtle in a written
article? They are in the same document, they certainly share bnodes, right?
Now if we take the simple case when the application is able to determine
that the bnodes are disjoint, how can it perform a union? The answer is
that it must *separate apart* the bnode identifiers. So, while in 2004
there was a coherence between the way merge was defined and the way it
has to be implemented, now there is a discrepancy between the
definition and the practice.
Then, once the separation apart is made to produce a representation of
the union, the created graph is, by definition of union, sharing bnodes
with the two original graphs. But how can the overlap of bnodes be
recognised in and out of the application? One would need
meta-information about the relationship between the graphs. And how to
represent and store that relationship?
Also, suppose one wants to keep two graphs that share bnodes separate (say,
they are distinct graphs in the same TriG file). Then these graphs
cannot be stored separately if one wants to retain equivalent inferences
on the set of graphs. That is, if I have {G1,G2} such that G1 and G2
share some bnodes, storing G1 apart would create a "copy" of G1 with
disjoint bnodes. The new graph, H1, would be equivalent to G1, but the
set {H1,G2} would not yield the same entailments as {G1,G2}.
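The difference can be sketched in a few lines. This is only an illustration under my own assumptions: triples are plain tuples, blank nodes are strings starting with "_:", and the helper names (`union`, `merge`, `bnodes`) and the fresh-label scheme `_:m0`, `_:m1`, ... are hypothetical (the sketch also assumes the inputs do not already use those labels).

```python
from itertools import count


def is_bnode(term):
    """Blank nodes are modelled as strings starting with '_:' (an assumption)."""
    return isinstance(term, str) and term.startswith("_:")


def bnodes(graph):
    """Set of blank nodes occurring anywhere in the graph."""
    return {t for triple in graph for t in triple if is_bnode(t)}


def union(*graphs):
    """Plain set union: blank nodes with the same identity stay shared."""
    return set().union(*graphs)


def merge(*graphs):
    """2004-style merge: blank nodes are renamed apart, graph by graph."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # one renaming per input graph
        for triple in graph:
            new_triple = []
            for term in triple:
                if is_bnode(term):
                    if term not in renaming:
                        renaming[term] = f"_:m{next(fresh)}"
                    term = renaming[term]
                new_triple.append(term)
            merged.add(tuple(new_triple))
    return merged


g1 = {("ex:a", "ex:p", "_:x")}
g2 = {("_:x", "ex:q", "ex:b")}
h1 = merge(g1)  # a stored "copy" of G1 with a disjoint bnode

print(len(bnodes(union(g1, g2))))  # G1 and G2 share their bnode
print(len(bnodes(union(h1, g2))))  # H1 and G2 do not
```

Under the union-based definition, {G1,G2} entails that some node is both the ex:p-value of ex:a and has ex:q-value ex:b, while {H1,G2} does not, even though H1 is equivalent to G1.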
Finally, the decision to replace merge with union was first put into the
document without prior discussion with the Working Group, without
evidence that it follows practices, without evidence that it solves
known issues. The notion of merge was not identified as a subject of
concern during the W3C workshop in 2010. Implementations do implement
RDF 2004 correctly.
Conclusion:
===========
More generally, any change like this disrupts education. If this
design is standardised at the end of the year, there will be a gap
between what's in the standard and what has been written for years in
tutorials, courses, research papers, and so on.
Considering that I see no added value compared to 2004 from both these
changes, and having even identified flaws, I oppose publication of RDF
1.1 Semantics in such a state.
Note that the solution I propose is simple and simpler than what is
proposed: to go back to the old design concerning entailment of a set of
graphs and datatype map. My proposal is also less likely to trigger
unsupportive comments in the Last Call phase. We cannot afford to spend
more time inventing a new design.
Minor remarks:
==============
I think there are too many sections. Simple interpretations and simple
entailment can be subsections of a common section. The same for
D-interpretations and D-entailment. Same for RDF interpretations and
RDF entailment; same for RDFS.
Section 3:
"""For example, RDF statements of the form:
ex:a rdfs:subClassOf owl:Thing .
are forbidden in the OWL-DL [OWL2-SYNTAX] semantic extension."""
-> This triple can be a valid part of an OWL 2 DL ontology. A better
example would be:
ex:a rdfs:subClassOf "Thing" .
Moreover, perhaps a reference to OWL 2 mapping to RDF graphs [1] would
be better, since [OWL2-SYNTAX] defines OWL 2 ontologies in terms of a
functional syntax that does not say anything about the constraints in the
RDF serialisation.
Section 4:
"A typed literal contains two names" -> We do not have the notion of
typed literals since all literals are typed.
"Two graphs are isomorphic when each maps into the other by a 1:1
mapping on blank nodes." -> this is very much underspecified. There are
other constraints on isomorphisms.
"Graphs share blank nodes ... of distinct blank nodes." -> this
discussion should not be here. In fact, it should rather appear in
Concepts. In any case, it does not belong to notation and terminology.
"This document will often treat a set of graphs as being identical to a
single graph comprising their union, without further comments." -> if my
concerns above are taken into account, this should be removed. A
definition of merge should be added instead. By the way, I haven't seen
many sets of graphs being treated as a single graph. Actually, I think I
only saw it twice. So we cannot say "often".
Section 5:
Make it a subsection of "Simple semantics"? "Simple entailment"?
"a function from expressions (names, triples and graphs) to semantic
values:" -> what's a "semantic value"?
"triple s p o then ..." -> why not "triple (s, p, o)" ?
Same remark in item 4 of Section 5.2
Section 6:
Make it a subsection of "Simple semantics"? "Simple entailment"?
"a graph G simply entails a graph E when every interpretation which
satisfies G also satisfies E, and a set S of graphs simply entails E
when the union of S simply entails E" -> change this to "a set S of
graphs simply entails a graph E when every interpretation which
satisfies all graphs in S also satisfies E"
Remove the Change Note.
Section 6.1:
"the inference from (P and Q) to P, and the inference from foo(baz) to
(exists (x) foo(x))." -> the notation "(P and Q)" etc is rather obscure
in this context. Perhaps it would be good to present the usual First
Order Logic translation of the semantics. BTW, the usual FOL translation
would not be valid for entailments over a set of graphs because
{FOL(G1),FOL(G2)} is equivalent to FOL(merge(G1,G2)).
The example with ex:a ex:p _:x is confusing RDF graphs and RDF
documents, as well as bnodes and bnode identifiers. Then, while the
naive readers would intuitively imagine that taking the union of the two
triples would simply amount to putting them together, they realise that
they have to "standardise apart" the bnode identifiers.
Section 7:
"For any graph H, if sk(G) entails H then there is a graph H' such that
G entails H' and H=sk(H')" -> this should rather be: "For any graph H,
if sk(G) entails H then there is a skolemization sk'(H) of H such that G
entails sk'(H)"
Section 8:
Remove the second change note, as per my concerns above.
"datatype d refers to (denotes) the value" -> why not just say "denotes"?
"L2V(d)(string)" -> rather, L2V(d)(sss)
"rdf:plainLiteral" -> "rdf:PlainLiteral"
"the datatype it refers to MUST be specified unambiguously" -> yes,
there MUST be a mapping from datatype IRIs to datatypes, i.e., there
must be a datatype map. This is a MUST, why doesn't it appear as a
constraint in the formal semantics?
Section 9:
Make it a subsection of "D-semantics"? "D-entailment"?
Section 10:
Make it a subsection of "D-semantics"? "D-entailment"?
"a set S of graphs (simply) D-entails or entails recognizing D a graph G
when every D-interpretation which makes S true also D-satisfies G." ->
"a set S of graphs (simply) D-entails a graph G when every
D-interpretation which satisfies all graphs in S also D-satisfies G."
Section 10.1:
why not put the general rule for datatype entailment:
"""
aaa uuu "xxx"^^ddd => aaa uuu "yyy"^^eee
where L2V(I(ddd))(xxx) = L2V(I(eee))(yyy)
"""
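A rough sketch of how such a rule could be applied, under stated assumptions: literals are (lexical form, datatype IRI) pairs, the `L2V` table is a hypothetical datatype map covering only xsd:integer and xsd:decimal, and Python's `int` and `Decimal` stand in for the real lexical-to-value mappings (they accept a few strings that are not valid XSD lexical forms, so this is an approximation only).

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical datatype map: IRI -> simplified lexical-to-value function.
L2V = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}


def same_value(lex1, dt1, lex2, dt2):
    """True when L2V(I(ddd))(xxx) = L2V(I(eee))(yyy) for recognised ddd, eee."""
    return L2V[dt1](lex1) == L2V[dt2](lex2)


def entailed_triple(triple, new_lex, new_dt):
    """Return aaa uuu "yyy"^^eee if the rule applies to the triple, else None."""
    s, p, (lex, dt) = triple
    if same_value(lex, dt, new_lex, new_dt):
        return (s, p, (new_lex, new_dt))
    return None


t = ("ex:a", "ex:p", ("01", XSD + "integer"))
print(entailed_triple(t, "1.0", XSD + "decimal"))
print(entailed_triple(t, "2", XSD + "integer"))
```

Note that the rule can only be evaluated because `L2V` is an explicit map from IRIs to datatypes.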
Section 11:
Make it a subsection of "RDF semantics"? "RDF entailment"?
Section 12:
Make it a subsection of "RDF semantics"? "RDF entailment"?
Section 12.1:
Group the rules together, as in Section 14.1
Section 13:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?
Section 14:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?
Section 15:
"plus an optional default graph" -> the default graph is not optional,
there must be exactly one
Appendix A:
"follows exactly the terms used in [OWL2-SYNTAX]" -> it is
[OWL2-PROFILES], in Section 4.3. OWL2-SYNTAX does not rely on RDF triples
"Every RDF(S) closure, even starting with the empty graph, will contain
all RDF(S) tautologies" -> not all: the closure as defined is finite,
while there are infinitely many tautologies. It contains only the
tautologies over the vocabulary of the initial graph, together with
those over the RDF and RDFS vocabularies.
Appendix C:
The proof that every graph is satisfiable does not need to introduce
Herbrand interpretations and does not need to build an interpretation for
each graph considered. There is a single interpretation that makes all
RDF graphs simply true. Consider a domain comprising only one element x.
Map all IRIs and literals to x, including those used as predicates. Make
the IEXT of x be the single pair {(x,x)}. This simply satisfies all graphs.
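As a quick sanity check of this construction, here is a minimal sketch. The encoding is my own assumption: triples are tuples of strings, and `denotation` plays the role of I, IL and the blank node assignment at once, since all of them send everything to x.

```python
X = "x"  # the only element of the domain


def denotation(term):
    """Every IRI, literal and blank node is mapped to the single element x."""
    return X


# IEXT maps the property x to its extension, the single pair (x, x).
IEXT = {X: {(X, X)}}


def simply_satisfies(graph):
    """Each triple (s, p, o) needs (I(s), I(o)) to be in IEXT(I(p))."""
    return all(
        (denotation(s), denotation(o)) in IEXT.get(denotation(p), set())
        for (s, p, o) in graph
    )


graphs = [
    set(),
    {("ex:a", "ex:p", "_:b")},
    {("_:b", "ex:q", '"abc"'), ("ex:p", "ex:p", "ex:p")},
]
all_true = all(simply_satisfies(g) for g in graphs)
print(all_true)
```

Since `denotation` ignores its argument, the check succeeds for any graph whatsoever, which is the whole argument.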
Appendix D.1:
"The subject of a reification,/a>" -> typo
Appendix D.2:
The RDF container vocabulary should also mention rdfs:member,
rdfs:ContainerMembershipProperty.
--
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 12 June 2013 10:54:03 UTC