- From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Date: Wed, 12 Jun 2013 12:53:38 +0200
- To: Pat Hayes <phayes@ihmc.us>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
- CC: RDF WG <public-rdf-wg@w3.org>
Pat, Peter,

This is my review of RDF 1.1 Semantics. Sorry for sending this so late.

On the plus side, I'd say that overall, the presentation has been much improved. Making interpretations independent from a vocabulary is a big bonus, and making D-interpretations independent from the RDF vocabulary is also much better. Putting the rules in context with the corresponding entailment regime is also good.

Now, for the main criticism, I have two outstanding problems with the current version:
 1. D-entailment using a set rather than a mapping;
 2. Defining entailment of a set as entailment of the union.

1. D-entailment
===============

Concerning 1, the implication of the new definition is that, given a D, it is not generally possible to know what the valid D-entailments are. For instance, consider D = {http://example.com/dt}. What does the triple:

<s> <p> "abc"^^<http://example.com/dt> .

D-entail? The specification does not say.

Moreover, because of the absence of a known mapping from IRIs to datatypes, there are a few ill-defined conditions. For instance, in Section 9, the table "Semantic conditions for datatyped literals" says:

"""
For every other IRI aaa in D, and every literal "sss"^^aaa, IL("sss"^^aaa)=L2V(I(aaa))(sss)
"""

L2V is only defined for datatypes, whereas I(aaa) is not constrained to be a datatype. Even if it were constrained to be a datatype, this would not define the value of IL("sss"^^aaa), unless aaa is one of the normative XSD datatype IRIs. In any case, no matter how you tweak the definitions, the application MUST have a mapping from the set of "recognised" datatype IRIs to some specific datatypes.

Later, in Section 10, it says that "if D < E and S E-entails G then S D-entails G." Since no constraints are given on how to interpret "recognised" non-XSD datatype IRIs, it is possible that the same IRI in D-entailment is interpreted differently in E-entailment.
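
To make the point about sets versus maps concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not anything taken from the specification: a datatype is modelled solely by its lexical-to-value mapping, and the names D_MAP, E_MAP and literal_value, as well as the choice of str.upper and len as lexical-to-value mappings, are hypothetical.

```python
from typing import Callable, Dict, Set

# Assumption: a datatype is modelled only by its lexical-to-value mapping (L2V).
Datatype = Callable[[str], object]

EX_DT = "http://example.com/dt"  # the IRI from the example above

# With D as a bare set of recognised IRIs, nothing says which datatype EX_DT denotes.
recognised: Set[str] = {EX_DT}

# Two hypothetical datatype maps that recognise the same IRI differently.
D_MAP: Dict[str, Datatype] = {EX_DT: str.upper}
E_MAP: Dict[str, Datatype] = {EX_DT: len}


def literal_value(datatype_map: Dict[str, Datatype], lexical: str, iri: str) -> object:
    """Compute IL("lexical"^^iri) as L2V(datatype)(lexical) under a datatype map."""
    if iri not in datatype_map:
        raise KeyError(f"unrecognised datatype IRI: {iri}")
    return datatype_map[iri](lexical)


# The same literal gets different values depending on the map in use.
value_under_d = literal_value(D_MAP, "abc", EX_DT)
value_under_e = literal_value(E_MAP, "abc", EX_DT)
print(value_under_d, value_under_e)
```

With only the set `recognised`, there is nothing to call; the value of the literal becomes computable only once one of the maps is chosen, and the two maps disagree on it.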

In Section 11, the table "RDF semantic conditions" says:

"""
For every IRI aaa in D, <x,I(aaa)> is in IEXT(I(rdf:type)) if and only if x is in the value space of I(aaa)
"""

This is ill-defined because the value space of I(aaa) may not exist. Again, even if I(aaa) is constrained to be a datatype, how do we know what its value space is? Therefore, the condition cannot be verified in general.

Finally, the reasons why this change has been made are unclear. The working group was not chartered to do anything about that, the workshop in 2010 did not point at all to any problems with datatype maps, and this Working Group did not discuss or complain about D being a map when the change was made. No prior discussions were attempted before making the change. Implementations that rely on custom datatypes interpret the custom datatype IRIs according to one specific, known datatype; therefore, they do have a datatype *map* implemented. There is zero motivation to make such a change.

2. using union
==============

This issue is different from the previous one because it does not make the definitions and propositions incorrect. I see two problems with the new definition.

First, it makes the notion of entailment in RDF different from the standard, universally accepted notion of entailment in logic. In general, no matter what semantics is considered, entailment is defined as follows:

"""
A set S of formulas in the language entails a formula F in the same language if and only if all interpretations that satisfy all the formulas of S also satisfy F.
"""

That's what was in RDF 2004, that's what's in OWL, that's what's in any logic with a model theory.

Second, there are inconvenient consequences for the manipulation of RDF graphs: how is it supposed to be implemented? Assume we have two representations of two graphs. How do you know what the union of the two graphs is? You do not have access to the bnodes, only to identifiers or locations in files or in memory.
There is a rule of thumb saying "different documents, different bnodes". But what about RDF graphs in an in-memory model? What about two examples of RDF graphs in Turtle in a written article? They are in the same document, so they certainly share bnodes, right?

Now if we take the simple case where the application is able to determine that the bnodes are disjoint, how can it perform a union? The answer is that it must *separate apart* the bnode identifiers. So, while in 2004 there was coherence between the way merge was defined and the way it had to be implemented, now there is a discrepancy between the definition and the practice.

Then, once the separation apart is made to produce a representation of the union, the created graph is, by definition of union, sharing bnodes with the two original graphs. But how can the overlap of bnodes be recognised in and out of the application? One would need meta-information about the relationship between the graphs. And how should that relationship be represented and stored?

Also, suppose one wants to keep two graphs that share bnodes separate (say, they are distinct graphs in the same TriG file). Then these graphs cannot be stored separately if one wants to retain equivalent inferences on the set of graphs. That is, if I have {G1,G2} such that G1 and G2 share some bnodes, storing G1 apart would create a "copy" of G1 with disjoint bnodes. The new graph, H1, would be equivalent to G1, but the set {H1,G2} would not yield the same entailments as {G1,G2}.

Finally, the decision to replace merge with union was first put into the document without prior discussion with the Working Group, without evidence that it follows practice, and without evidence that it solves known issues. The notion of merge was not identified as a subject of concern during the W3C workshop in 2010. Implementations do implement RDF 2004 correctly.

Conclusion:
===========

More generally, any change like this disrupts education.
If this design is standardised at the end of the year, there will be a gap between what's in the standard and what has been written for years in tutorials, courses, research papers, and so on.

Considering that I see no added value compared to 2004 from either of these changes, and having even identified flaws, I oppose publication of RDF 1.1 Semantics in such a state.

Note that the solution I propose is simple, and simpler than what is proposed in the draft: go back to the old design concerning entailment of a set of graphs and the datatype map. My proposal is also less likely to trigger unsupportive comments in the Last Call phase. We cannot afford to spend more time inventing a new design.

Minor remarks:
==============

I think there are too many sections. Simple interpretations and simple entailment can be subsections of a common section. The same goes for D-interpretations and D-entailment, for RDF interpretations and RDF entailment, and for RDFS.

Section 3:
"""For example, RDF statements of the form:
ex:a rdfs:subClassOf owl:Thing .
are forbidden in the OWL-DL [OWL2-SYNTAX] semantic extension."""
-> This triple can be a valid part of an OWL 2 DL ontology. A better example would be:
ex:a rdfs:subClassOf "Thing" .
Moreover, perhaps a reference to the OWL 2 mapping to RDF graphs [1] would be better, since [OWL2-SYNTAX] defines OWL 2 ontologies in terms of a functional syntax that does not say anything about the constraints in the RDF serialisation.

Section 4:
"A typed literal contains two names"
-> We do not have the notion of typed literals since all literals are typed.

"Two graphs are isomorphic when each maps into the other by a 1:1 mapping on blank nodes."
-> This is very much underspecified. There are other constraints on isomorphisms.

"Graphs share blank nodes ... of distinct blank nodes."
-> This discussion should not be here. In fact, it should rather appear in Concepts. In any case, it does not belong in notation and terminology.

"This document will often treat a set of graphs as being identical to a single graph comprising their union, without further comments."
-> If my concerns above are taken into account, this should be removed. A definition of merge should be added instead. By the way, I haven't seen many sets of graphs being treated as a single graph. Actually, I think I only saw it twice. So we cannot say "often".

Section 5:
Make it a subsection of "Simple semantics"? "Simple entailment"?

"a function from expressions (names, triples and graphs) to semantic values:"
-> What's a "semantic value"?

"triple s p o then ..."
-> Why not "triple (s, p, o)"? The same remark applies to item 4 of Section 5.2.

Section 6:
Make it a subsection of "Simple semantics"? "Simple entailment"?

"a graph G simply entails a graph E when every interpretation which satisfies G also satisfies E, and a set S of graphs simply entails E when the union of S simply entails E"
-> Change this to "a set S of graphs simply entails a graph E when every interpretation which satisfies all graphs in S also satisfies E".

Remove the Change Note.

Section 6.1:
"the inference from (P and Q) to P, and the inference from foo(baz) to (exists (x) foo(x))."
-> The notation "(P and Q)" etc. is rather obscure in this context. Perhaps it would be good to present the usual First Order Logic translation of the semantics. BTW, the usual FOL translation would not be valid for entailments over a set of graphs because {FOL(G1),FOL(G2)} is equivalent to FOL(merge(G1,G2)).

The example with ex:a ex:p _:x confuses RDF graphs with RDF documents, as well as bnodes with bnode identifiers. Then, while naive readers would intuitively imagine that taking the union of the two triples simply amounts to putting them together, they realise that they have to "standardise apart" the bnode identifiers.

Section 7:
"For any graph H, if sk(G) entails H then there is a graph H' such that G entails H' and H=sk(H')"
-> This should rather be: "For any graph H, if sk(G) entails H then there is a skolemization sk'(H) of H such that G entails sk'(H)".

Section 8:
Remove the second change note, as per my concerns above.

"datatype d refers to (denotes) the value"
-> Why not just say "denotes"?

"L2V(d)(string)"
-> Rather, L2V(d)(sss).

"rdf:plainLiteral"
-> "rdf:PlainLiteral"

"the datatype it refers to MUST be specified unambiguously"
-> Yes, there MUST be a mapping from datatype IRIs to datatypes, i.e., there must be a datatype map. This is a MUST, so why doesn't it appear as a constraint in the formal semantics?

Section 9:
Make it a subsection of "D-semantics"? "D-entailment"?

Section 10:
Make it a subsection of "D-semantics"? "D-entailment"?

"a set S of graphs (simply) D-entails or entails recognizing D a graph G when every D-interpretation which makes S true also D-satisfies G."
-> "a set S of graphs (simply) D-entails a graph G when every D-interpretation which satisfies all graphs in S also D-satisfies G."

Section 10.1:
Why not put the general rule for datatype entailment:
"""
aaa uuu "xxx"^^ddd => aaa uuu "yyy"^^eee
where L2V(I(ddd))(xxx) = L2V(I(eee))(yyy)
"""

Section 11:
Make it a subsection of "RDF semantics"? "RDF entailment"?

Section 12:
Make it a subsection of "RDF semantics"? "RDF entailment"?

Section 12.1:
Group the rules together, as in Section 14.1.

Section 13:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?

Section 14:
Make it a subsection of "RDFS semantics"? "RDFS entailment"?

Section 15:
"plus an optional default graph"
-> The default graph is not optional; there must be exactly one.

Appendix A:
"follows exactly the terms used in [OWL2-SYNTAX]"
-> It is [OWL2-PROFILES], in Section 4.3.
OWL2-SYNTAX does not rely on RDF triples.

"Every RDF(S) closure, even starting with the empty graph, will contain all RDF(S) tautologies"
-> Not all: the closure as defined is finite, while there are infinitely many tautologies. It contains all tautologies concerning the vocabulary of the initial graph, together with the tautologies in the RDF and RDFS vocabularies.

Appendix C:
The proof that every graph is satisfiable does not need to introduce Herbrand interpretations and does not need to build an interpretation for each graph considered. There is a single interpretation that makes all RDF graphs simply true. Consider a domain comprising only one element x. Map all IRIs and literals to x, including those used as predicates. Make the IEXT of x be the single pair {(x,x)}. This simply satisfies all graphs.

Appendix D.1:
"The subject of a reification,/a>"
-> typo

Appendix D.2:
The RDF container vocabulary should also mention rdfs:member and rdfs:ContainerMembershipProperty.

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 12 June 2013 10:54:03 UTC