Comments on the new RDF Model Theory spec from Massimo Marchiori on 2002-05-07 (www-rdf-comments@w3.org from April to June 2002)

From: Massimo Marchiori <massimo@w3.org>
Date: Tue, 7 May 2002 19:44:29 -0400
To: www-rdf-comments@w3.org
Cc: massimo@w3.org
Message-Id: <200205072344.TAA08647@tux.w3.org>

Long flights help to have spare time, and WWW2002 was far away.... 
so here are my comments on the new MT spec 
http://www.w3.org/TR/2002/WD-rdf-mt-20020429/ 

I just read up until 3.2.2, but as time is always scarce, I'll
just send what I have so far rather than waiting ("on the fly",
almost literally....! ;)

Comments are structured this way:
First, the considered section is written between "****"'s.
Then comments occur, where the commented part of the MT
is enclosed using <quote>, 
and the nature of the comment follows:
EDITORIAL means an editorial remark
WRONG means there's something wrong
ISSUE means there's an issue
And next, the actual comment appears.

These are mostly editorial comments gathered when reading the spec, 
so no high-level architectural comments, which will come later.

Executive summary: the MT is very fine, it just needs some editorial cleaning 
and some minor technical fixes. As far as low-level tech issues go, there's just
one (containers).

Thanx,
-M

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


**** 0.2 Graph Syntax ****

<quote>
There are well-formed graphs that cannot be described by these notations, however.) 
</quote>
EDITORIAL:
What does "well-formed" mean? That's a big problem throughout the draft: too many times terms are
used without proper definitions (or a term is used before being defined, without reference).

<quote>
An RDF literal has three parts ( a bit, a character string, and a language tag), but we will treat them simply as character strings, since the other parts of the literal play no role in the model theory.
</quote>
EDITORIAL/WRONG:
I hope this sentence is going to change, and it's just part of this version of the draft, as of course there
has to be a formal definition of what a literal is (and, the MT is the place where it has to be!). Saying
the other parts "play no role" is confusing (and, formally, wrong), so please in any case state it better.

<quote>
Blank (unlabeled) nodes are considered to be drawn from some set of 'anonymous' entities which have no label and are unique to the graph
</quote>
EDITORIAL:
What's a "label"...?! What does it formally mean to be "unique to the graph"...?

<quote>
Finally, every arc in an RDF graph is labelled with a uriref.
</quote>
EDITORIAL:
What does this mean? We're defining the RDF graph here: and, this is not really a "graph" (so that, to solve the
problems, we define it using triples). Here instead, you're confusing the level of RDF-graph-definition, with
the level of pictorial representation. Please be clearer.

<quote>
Two RDF documents, in whatever lexical form, are syntactically equivalent if and only if they map to the same RDF graph.
</quote>
EDITORIAL/WRONG: This is a definition that is never used later, so you might consider dropping it. But if you don't,
please note that this is likely wrong as written here: this is due to the fact that syntax -> graph is a relationship
and not a map. What you probably mean is that they map to "equivalent" RDF graphs (meaning, semantically equivalent).

<quote>
An RDF graph can then be defined as a set of triples of the form <S, P, O>, where P is a uriref, S is either a uriref or a blank node, and O is either a uriref, a blank node, or a literal
</quote>
EDITORIAL/WRONG:
This is a multiset if, as said in 0.3, arcs are never merged. On the other hand, I think you really meant set here,
and as such, the clarification in 0.3 about arc merging should be better stated.
Moreover, you should add the finiteness condition: An RDF graph is a *finite* set (or multiset..) of triples.. etc.

<quote>
The convention that relates such a set of triples to a picture of an RDF graph can then be stated as follows. Draw one oval for each blank node and uriref, and one rectangle for each literal, which occur in either the S or O position in any triple in the set, and write each uriref or literal as the label of its shape. Then for each triple <S,P,O>, draw an arrowed line from the shape produced from S to the shape produced from O, and label it with P.
</quote>
EDITORIAL:
This is rather confusing...: "rectangles"? "ovals"? Please be clearer, and add a disclaimer that this is not a 
"graph" as usually intended in the literature anyway...

<quote>
In particular, two N-triples documents which differ only by re-naming their node identifiers will be understood to describe identical RDF graphs.
</quote>
EDITORIAL/WRONG:
This is formally wrong. Here you should not say "identical" RDF graphs, but rather, define equality on graphs where 
blank nodes renaming can occur, and then use this new equality definition when needed.
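
As an illustration of the kind of equality meant here, a minimal Python sketch follows. It assumes triples are plain tuples of strings and that blank nodes are written with an N-Triples-style "_:" prefix; the function names are hypothetical, and the brute-force search over bijections is only meant for tiny graphs.

```python
from itertools import permutations

def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def blank_nodes(graph):
    # Blank nodes can only occur in subject or object position.
    return sorted({t for (s, p, o) in graph for t in (s, o) if is_blank(t)})

def rename(graph, mapping):
    # Apply one renaming consistently across the whole graph.
    return {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in graph}

def equivalent(g1, g2):
    """True if some bijection between blank nodes turns g1 into g2."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    return any(rename(g1, dict(zip(b1, perm))) == g2
               for perm in permutations(b2))

g1 = {("_:x", "ex:p", "ex:a"), ("_:x", "ex:q", "_:y")}
g2 = {("_:n1", "ex:p", "ex:a"), ("_:n1", "ex:q", "_:n2")}
g3 = {("_:n1", "ex:p", "ex:a"), ("_:n2", "ex:q", "_:n1")}
```

Here g1 and g2 are not identical as sets of triples, yet equivalent(g1, g2) holds; g3 has a different shape and is not equivalent to g1 under any renaming.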

<quote>
Other RDF serializations may use other means of indicating the graph structure; for our purposes, the important syntactic property of RDF graphs is that each distinct item in an RDF graph is treated as a distinct referring entity in the graph syntax.
</quote>
EDITORIAL:
What does this formally mean? First, "serialization" is not defined. Second, the rest of the paragraph makes
little sense. If we need to talk about serializations, let's give a definition and state their properties.
Likely, we don't need this here, but if you feel we do, then let's do it right and clearly.


**** 0.3 Definitions ****

<quote>
The result of taking the set-union of two or more RDF graphs (i.e. sets of triples) is another graph, which we will call the merge of the graphs
</quote>
WRONG:
This is formally wrong, and contradicts what is said after this sentence (the fact that blank nodes are not merged).
Formally define the real merge operation (and, if that is the case, note just the opposite of what is written here,
i.e. the fact that the subset relationship may no longer hold after merging).
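
To make the difference concrete, here is a small Python sketch contrasting plain set-union with a merge that keeps blank nodes from different graphs apart. The "_:" prefix convention and the renaming scheme (appending the graph's index) are assumptions made for illustration, not something taken from the draft.

```python
def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def standardize_apart(graph, tag):
    # Give every blank node a name specific to its source graph.
    def fix(term):
        return f"{term}_{tag}" if is_blank(term) else term
    return {(fix(s), p, fix(o)) for (s, p, o) in graph}

def merge(*graphs):
    # Union of the graphs after their blank nodes have been kept apart.
    merged = set()
    for index, graph in enumerate(graphs):
        merged |= standardize_apart(graph, index)
    return merged

g1 = {("_:x", "ex:p", "ex:a")}
g2 = {("_:x", "ex:p", "ex:b")}

naive = g1 | g2          # both triples share the single blank node _:x
proper = merge(g1, g2)   # blank nodes are now _:x_0 and _:x_1
```

With the plain union, g1 is a subset of the result but the two blank nodes have been silently identified. With the renaming merge they stay distinct, and g1 is no longer literally a subset of the result.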

<quote>
and that a graph is an instance of another just when every triple in the first graph is an instance of a triple in the second graph, and every triple in the second graph has an instance in the first graph. 
</quote>
WRONG:
The substitution of blank nodes must be well-defined thru the whole graph (so, respecting at least node
identities). The way you define it (triple by triple) is incorrect.
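
A small counterexample can be sketched in Python. It compares the triple-by-triple reading of the quoted definition with a single substitution applied to the whole graph. The tuple encoding, the "_:" prefix for blank nodes and the function names are assumptions made for illustration.

```python
from itertools import product

def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def triple_is_instance(triple, pattern):
    # Substitution is checked within this one triple only.
    binding = {}
    for got, want in zip(triple, pattern):
        if is_blank(want):
            if binding.setdefault(want, got) != got:
                return False
        elif got != want:
            return False
    return True

def instance_triplewise(g1, g2):
    # The quoted definition, read literally.
    return (all(any(triple_is_instance(t, p) for p in g2) for t in g1)
            and all(any(triple_is_instance(t, p) for t in g1) for p in g2))

def instance_graphwide(g1, g2):
    # One substitution for the blank nodes of g2, applied to every triple.
    blanks = sorted({x for (s, p, o) in g2 for x in (s, o) if is_blank(x)})
    terms = sorted({x for (s, p, o) in g1 for x in (s, o)})
    for values in product(terms, repeat=len(blanks)):
        m = dict(zip(blanks, values))
        if {(m.get(s, s), p, m.get(o, o)) for (s, p, o) in g2} == g1:
            return True
    return False

pattern = {("_:x", "ex:p", "ex:a"), ("_:x", "ex:q", "ex:b")}
split = {("ex:c", "ex:p", "ex:a"), ("ex:d", "ex:q", "ex:b")}
joined = {("ex:c", "ex:p", "ex:a"), ("ex:c", "ex:q", "ex:b")}
```

The graph split passes the triple-by-triple test although _:x has been replaced by two different names; only joined is an instance of pattern once the substitution must be the same across the graph.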



**** 1.1 Technical Notes ****

<quote>
This might seem to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the axiom of foundation, which forbids infinitely descending chains of membership
</quote>
EDITORIAL:
Think about some poor guy who's reading this and doesn't have a math degree... This is all stated without references,
so some literature pointers should be added (if you can't avoid mentioning it, then at least give people hooks ;)

<quote>
Interpretations which share the special meaning of a particular reserved vocabulary will be named for that vocabulary, so that we will speak of 'rdf-interpretations' and 'rdfs-interpretations', etc.
</quote>
EDITORIAL:
What does "share the special meaning" mean? This is colloquial but sloppy, please be clearer.


**** 1.4 Denotations of ground graphs ****

<quote>
Notice that if the vocabulary of an RDF graph contains urirefs that are not in the vocabulary of an interpretation I - that is, if I simply does not give a semantic value to some name that is used in the graph - then these truth-conditions will always yield the value false for some triple in the graph, and hence for the graph itself.
</quote>
EDITORIAL/WRONG:
This is a bit dense, and should be rewritten more clearly. Also, is this actually part of the definition, or just a comment...?
I.e., are you also here considering cases where IR is too small wrt a graph? And if so, isn't it the case that
the formal definition of the interpretation is then just ill-defined, and therefore its def should be appropriately
modified to handle such cases? 

<quote>
Turned around, this means that any assertion of a graph implicitly asserts that all the names in the graph actually denote something in the world
</quote>
EDITORIAL:
Where is "assertion of a graph" defined?

<quote>
Since the universe of this interpretation contains no character strings as objects, any triple with a literal object would be false.
</quote>
EDITORIAL/WRONG:
Same comment as the first one for 1.4 above.


**** 1.5 Unlabeled nodes as existential assertions ****

<quote>
1.5. Unlabeled nodes as existential assertions
</quote>
EDITORIAL:
This is the first place where the wording "unlabeled node" occurs (and it is used all along
after this), while earlier in the doc it has always been "blank nodes". Consistency would be better.

<quote>
Notice also that the unlabeled nodes themselves are perfectly well-defined entities with a robust notion of identity
</quote>
EDITORIAL:
"robust" notion of identity....?


**** 1.6 Comparison with formal logic ****

<quote>
For example, the graph defined in the above example translates to the logical expression (written in the extended KIF syntax defined in [Hayes&Menzel])

(exists (?y)(and (ex:a ?y ex:b)(ex:b ex:c ?y)))
</quote>

EDITORIAL:
Why use KIF syntax rather than well-known (and much more common) first order logic formalisms? 
In any case, at least, some explanation of the syntax should be given.
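
For comparison, and assuming the extended KIF form puts the property in relation position (as the PropertyValue variant quoted next suggests), the same sentence in ordinary first-order notation would read roughly as follows. This is an illustrative rendering, not text from the draft.

```latex
\exists y \, \bigl( \mathit{ex{:}a}(y,\ \mathit{ex{:}b}) \;\land\; \mathit{ex{:}b}(\mathit{ex{:}c},\ y) \bigr)
```

Note that ex:b occurs both as a relation symbol and as an argument, which textbook first-order syntax does not normally allow. Even a short remark to that effect would help readers see why a particular syntax was chosen.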

<quote>
The above example would then map to

(exists (?y)(and (PropertyValue ex:a ?y ex:b)(PropertyValue ex:b ex:c ?y)))
</quote>
EDITORIAL:
Same comment as above.


**** 2. Simple entailment between RDF graphs ****

<quote>
Following conventional terminology, we say that I satisfies E if I(E)=true, and that a set S of expressions (simply) entails E if every interpretation which satisfies every member of S also satisfies E.
</quote>
EDITORIAL:
Where is "expression" defined....?


**** 2. Simple entailment between RDF graphs ****

<quote>
The interpolation lemma completely characterizes simple RDF entailment in syntactic terms.
<snip/>
The existence of complete subgraph-checking algorithms also shows that RDF is decidable, i.e. there is a terminating algorithm which will determine for any finite set S and any graph E, whether or not S entails E.
</quote>
EDITORIAL:
This argument to prove decidability (above lines chopped) is correct but looks like quite an overkill for the reader.
Simple model checking suffices to prove decidability here, once it is noted that finite-domain reasoning can be applied.
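
Either way the search space is finite. As a rough Python sketch (following the subgraph-style characterization quoted above rather than model checking proper), a brute-force check for simple entailment could look like this. It assumes the premises have already been merged into one finite graph, that triples are tuples of strings, and that blank nodes carry a "_:" prefix; the function name is hypothetical.

```python
from itertools import product

def is_blank(term):
    # Assumption: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def simply_entails(premise, conclusion):
    """True if some instance of the conclusion is a subgraph of the premise."""
    blanks = sorted({x for (s, p, o) in conclusion
                     for x in (s, o) if is_blank(x)})
    terms = sorted({x for (s, p, o) in premise for x in (s, o)})
    # Finitely many candidate mappings, so the loop always terminates.
    for values in product(terms, repeat=len(blanks)):
        m = dict(zip(blanks, values))
        if all((m.get(s, s), p, m.get(o, o)) in premise
               for (s, p, o) in conclusion):
            return True
    return False

premise = {("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:q", "ex:c")}
exists_p = {("_:y", "ex:p", "ex:b")}     # something has ex:p value ex:b
self_loop = {("_:y", "ex:p", "_:y")}     # something is ex:p-related to itself
```

In this sketch simply_entails(premise, exists_p) holds and simply_entails(premise, self_loop) does not. The loop tries at most len(terms) ** len(blanks) mappings, which is all that the termination argument needs.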

<quote>
If an RDF document is asserted, then it would be invalid to bind new values to any of its unlabeled nodes, since (by the anonymity lemmas) the resulting graph would not be entailed by the assertion.
</quote>
EDITORIAL:
What are the anonymity lemmas? They never appeared in the doc yet, and there's no forward reference.



**** 2.1 Criteria for non-entailment ****

EDITORIAL: 
Do we really need this complex subsection in the main spec, and not just as an appendix? 
It doesn't seem to give any mainstream contribution (it even has to introduce yac (yet
another concept), lean graphs, just to prove the lemmas, which are accessory).
So, the added value in the normative main text is probably not worth the complexity this adds
to the reading.

<quote>
We emphasise again that these results apply only to simple entailment, not to the namespace entailment relationships defined in rest of the document.
</quote>
EDITORIAL: this tends to be very confusing. It'd be much better to explicitly write "simple entailment" in all 
occasions where a statement doesn't hold for all other entailments defined in the spec.


**** 3.2.1 Reification ****

<quote>
The intended interpretation of these are that a triple of the form

aaa [rdf:type] [rdf:Statement] .

is true in I just when I(aaa) is an RDF triple in some RDF document.
</quote>
EDITORIAL/WRONG: 
What does this mean? (formally, nothing...).
It'd be better rephrased or omitted.

<quote>
Let us call the node which is intended to refer to the first triple - the blank node in the second graph - the center of the reification. (This can be a blank node or a uriref.) 
</quote>
EDITORIAL:
This goes on with some formal confusion. It should just be explained what a reification 
is, using a generic node (blank or uriref). As it is now, saying the "blank node" is the 
center, and then saying it can be blank or uriref is formally confusing.


**** 3.2.2 Containers ****

<quote>
RDF does not support any entailments which could arise from re-ordering the elements of an rdf:Bag.
<snip/>
Notice that if this conclusion were valid, then the result of conjoining it to the original graph would also be a valid entailment, which would assert that both elements were in both positions. (This is an consequence of the fact that RDF is a purely assertional language.)
</quote>

ISSUE:
This amounts to dropping an important functionality that is part of the normative RDFM&S spec. 
This is not documented anywhere in the RDF Issue List, cf http://www.w3.org/2000/03/rdf-tracking/ .
So, why rule out entailments like the one cited in the spec, cf
<quote>
_:xxx [rdf:type] [rdf:Bag] .
_:xxx [rdf:_1] <ex:a> .
_:xxx [rdf:_2] <ex:b> .

does not entail

_:xxx [rdf:_1] <ex:b> .
_:xxx [rdf:_2] <ex:a> .
</quote>
...??
I think DanC brought this issue of mine to RDF Core some time ago, but as said, I can't
find anything in the issues list.
Mmm, let's use the new cool W3C search feature... gotcha, here is DanC's notice:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0085.html
I tried to follow the thread but can't find any resolution...
Note also that similar reasoning applies to rdf:Alt .
Received on Tuesday, 7 May 2002 19:44:33 UTC