comments on Antoine's draft from Pat Hayes on 2013-12-03 (public-rdf-wg@w3.org from December 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 2 Dec 2013 20:51:31 -0600
To: RDF WG <public-rdf-wg@w3.org>
Message-Id: <6E98C86F-FB16-4BEE-9AB0-E829E0B15674@ihmc.us>

Basically, this is OK, but I think it can be made better. It needs to be run through a spellchecker, and I have a lot of niggling edits concerned with grammar and subtleties of expression, but first the following questions about content.

In the list of choices for graph name denotation, the cases "a container" and "information resource ..by dereferencing" both seem to be special cases of the "resource that is constrained to be in a relationship". Why list them separately? (Or maybe just list the last one as "some other resource that is constrained..." But did we ever consider any other cases than those two, in fact?)

The next list of possible meanings seems to omit the case where the default graph is understood to be metadata about the contextual named graphs, which was Sandro's main use case (and is important for Jeremy and for the PROV uses). This is not the same as saying that is a 'global context'.

section 2.1 3rd para: " Consequently, defining interpretation and entailement for RDF datasets would require at least an extension of the RDF semantics." Could be misunderstood to mean that this requires a change to RDF semantics, which is not correct. It might be clearer to say that RDF semantics does not itself specify a meaning for <name, graph> pairs.

2.2 "In Carrol et al., a named graph is simply defined as a pair comprising an IRI and an RDF graph." But this is how you have already defined them, so what is being conveyed by the word "simply"?

2.3 The ASK-no-variables = entailment trick is clever, but why does it not apply just as well to named graphs as to the default graph? Presumably ASKing a graph pattern directed at a name is entailment by the named graph, no? (If not, why not? Intuitively speaking, that is.) So you should be able to get rather more traction out of this idea than you do here.
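To make the point concrete, here is a minimal sketch of simple entailment as a variable-free ASK. It assumes a hypothetical encoding: triples are Python tuples of strings, and blank nodes are strings starting with "_:". This is not any library's API.

A graph G simply entails H when H's blank nodes can be mapped to terms of G so that every mapped triple of H is in G. Blank nodes in an ASK pattern behave the same way. Nothing in the check cares whether G is the default graph or a named graph.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

def is_bnode(term: str) -> bool:
    # Hypothetical encoding: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def find_mapping(g: Set[Triple], h: Set[Triple]) -> Optional[Dict[str, str]]:
    """Return a map from h's blank nodes to terms of g under which every
    triple of h is in g, or None if there is none (interpolation lemma)."""
    pattern = sorted(h)

    def extend(i: int, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(pattern):
            return binding
        for candidate in g:
            new = dict(binding)
            ok = True
            for pat_term, term in zip(pattern[i], candidate):
                if is_bnode(pat_term):
                    # A blank node acts like an existential variable:
                    # bind it once, then require the same value everywhere.
                    if new.setdefault(pat_term, term) != term:
                        ok = False
                        break
                elif pat_term != term:
                    ok = False
                    break
            if ok:
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None  # no candidate triple works: backtrack

    return extend(0, {})

def simply_entails(g: Set[Triple], h: Set[Triple]) -> bool:
    return find_mapping(g, h) is not None

# A toy dataset: key None is the default graph, other keys are graph names.
dataset = {
    None: set(),
    ":g1": {(":bob", ":is", ":smart")},
}
# Roughly ASK { GRAPH :g1 { :bob :is _:x } }, aimed at the named graph.
print(simply_entails(dataset[":g1"], {(":bob", ":is", "_:x")}))  # True
```

The only difference between asking the default graph and asking :g1 is which set of triples is passed as `g`.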

Section 3. " reuse RDF semantics as a black box" What does this mean?

"The formalisation below indicates that the truth of an RDF dataset can be determined in function of the truth of an RDF graph, no matter how the latter is determined. Therefore, instead of defining a precise definition of RDF graph interpretations and entailment, we use the more abstract notion of entailment regime. "
I find it very hard to understand the logic of this. Why would the black box lead to entailment regimes? And do you mean to imply that entailment regimes are less precise than model theory? (And if so, why would you be trying to be less precise?)

3.1 Could mention that the simplest notion of entailment is actually required by the RDF 1.1 specs, albeit informally. So all entailments need to at least support the "only if" part of this.

"an equivalent dataset" // "a logically equivalent dataset". The word "equivalent" is used in other ways, so need to be very clear.

3.2 I think this is misleading. We have formally decided that datasets are single bnode scopes, so to treat bnodes in two named graphs as distinct, ie to merge their graphs rather than take their union, is just wrong. (In fact it was this case, of combining graphs in a single dataset that motivated the idea that taking the union was more correct than taking the merge.) I suggest replacing this by a brief explanation of why it is necessary to combine graphs in a dataset by unioning rather than merging, *because* they may share bnodes.
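A small sketch shows why a shared bnode scope forces union rather than merge. It assumes the same illustrative encoding as elsewhere: triples as string tuples, and blank nodes as "_:"-prefixed strings (a hypothetical representation).

- **Union** keeps a shared bnode label as one existential.
- **Merge** first standardizes bnodes apart, so the co-reference between the two graphs is lost.

```python
import itertools
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def is_bnode(term: str) -> bool:
    # Hypothetical encoding: blank nodes are strings starting with "_:".
    return term.startswith("_:")

def bnodes(g: Set[Triple]) -> Set[str]:
    return {t for triple in g for t in triple if is_bnode(t)}

def union(*graphs: Set[Triple]) -> Set[Triple]:
    # Plain set union: the same bnode label in two graphs stays the same
    # bnode, as it must when both graphs are in one bnode scope.
    result: Set[Triple] = set()
    for g in graphs:
        result |= g
    return result

def merge(*graphs: Set[Triple]) -> Set[Triple]:
    # Standardize apart: give each graph's bnodes fresh labels unused anywhere.
    used = set().union(*(bnodes(g) for g in graphs))
    counter = itertools.count()

    def fresh() -> str:
        while True:
            label = f"_:m{next(counter)}"
            if label not in used:
                used.add(label)
                return label

    result: Set[Triple] = set()
    for g in graphs:
        mapping = {}  # per-graph renaming
        for triple in g:
            renamed = []
            for t in triple:
                if is_bnode(t):
                    if t not in mapping:
                        mapping[t] = fresh()
                    t = mapping[t]
                renamed.append(t)
            result.add(tuple(renamed))
    return result

# Two named graphs in one dataset that share the bnode _:x.
g1 = {(":alice", ":knows", "_:x")}
g2 = {("_:x", ":name", '"Bob"')}

print(len(bnodes(union(g1, g2))))  # 1: "Alice knows someone named Bob"
print(len(bnodes(merge(g1, g2))))  # 2: the link between the graphs is gone
```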

"The main drawback of this dataset semantics is that all triples in the named graphs contribute to a global knowledge that must be consistent. "
Not obvious why this is a 'drawback'. The semantics does not require graphs to be consistent. Better to say, the dataset (with this semantics) will be inconsistent when the graphs are mutually inconsistent, so it cannot *consistently* hold mutually inconsistent information. This effectively treats all the named graphs as part of a single RDF graph, which is both a feature and a problem (depending on what you want it to do.)

3.3 "It is common to use the graph name as a way to identify the RDF graph inside the named graphs, or rather, to identify a particular occurence of the graph."
It would be good to keep these two cases - the graph vs. a token of the graph - a bit more separated and maybe talk about the difference and how it matters. This is what motivated the original Carroll +al definition of a named graph as a pair, the pair <graph, IRI> being a 'mathematical' version of the occurrence or token of the graph. I think this distinction is critical in understanding the semantic issues of naming graphs, and having it shunted off into a side remark is rather misleading. Especially as in the formal semantics, you use the same Carroll+al denoting-the-pair trick :-)

(You could mention this when you talk about the Carroll+al paper, in fact, so that the 'name denotes pair' idea is more motivated.)
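A tiny illustration of the graph vs. token distinction follows. It assumes graphs are modeled as frozensets of triple tuples, and a named graph as a (name, graph) pair in the Carroll et al. style; both are illustrative encodings only. The graph has set identity, so two tokens of it are the same graph. The pairs are distinct, so anything said about one occurrence need not hold of the other.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
NamedGraph = Tuple[str, Graph]  # (name, graph): a "mathematical" token

g: Graph = frozenset({(":bob", ":is", ":smart")})
ng1: NamedGraph = (":g1", g)
ng2: NamedGraph = (":g2", g)

# The same graph: sets of triples compare by value.
print(ng1[1] == ng2[1])  # True
# Two distinct pairs, i.e. two occurrences of that graph.
print(ng1 != ng2)        # True

# Metadata attached to one token does not carry over to the other.
# (The "source" value is a hypothetical placeholder.)
provenance = {ng1: "hypothetical-source"}
print(ng2 in provenance)  # False
```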

Also, "as a way to refer to the RDF graph" // "as a way to identify..." , since Semantics draws this distinction carefully.

"Intutively, this semantics can be seen as quoting the RDF graphs inside the named graphs."
I don't think this is correct. The name-denotes-graph constraint is one thing, but the named graphs can still be asserted by the dataset, and that would not be like quotation at all. Quotation would be where the naming is *all* that the dataset asserts. (Later: I see that is how you define the semantics, but then there is a case you have left out, which is where the named graphs are both asserted AND the naming relationship is asserted.)

I think the "Alice said" text is very confusing, because there is nothing in the dataset semantics that refers to speech or asserting. :alice {:bob :is :smart} could mean that Alice said it, or believes it, or has it written on her forehead, or that Alice was the source for this triple, etc. In fact it could mean almost anything about Alice. I think this is better omitted.

Example 14, with "entails", is potentially misleading; I think it might be better to just stick to conventional metadata.

" the presence of blank nodes as graph names can be problematic because a named graph entails an infinity of other named graphs where only the graph name is changed to a different blank node."
I disagree. If there are n graph names, then there are at most 2^n distinct bnode generalizations. Just changing the bnodeID does not change a graph into a different graph. And in any case, the situation in datasets is no worse than in RDF graphs, so I think this is a non-issue.
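The counting argument can be sketched by enumerating generalizations of the graph names. It assumes the names are IRIs given in a fixed order, each paired with its own graph.

- Each name is either kept or replaced by a blank node.
- Replaced names get distinct bnodes, since names in a dataset must be distinct.
- Bnode labels are assigned canonically by position, so relabelling a bnode never counts as a new dataset.

The function name and labels are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

def generalizations(names: List[str]) -> List[Tuple[str, ...]]:
    """Enumerate the graph-name patterns obtained by replacing any subset of
    the (IRI) names with fresh blank nodes. Labels are assigned by position,
    so patterns that differ only in bnode labels are generated once."""
    results = []
    for k in range(len(names) + 1):
        for chosen in combinations(range(len(names)), k):
            labels = iter(f"_:b{i}" for i in range(k))
            results.append(tuple(next(labels) if i in chosen else name
                                 for i, name in enumerate(names)))
    return results

patterns = generalizations([":g1", ":g2", ":g3"])
print(len(patterns))  # 8 == 2**3, one per subset of replaced names
```

Any other choice of bnode IDs gives a dataset isomorphic to one of these, which is why the count is finite rather than infinite.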

"Therefore, any entailment regime that recognizes datatypes and use this semantics has to be able to ..." Why "that recognizes datatypes"? Any entailment regime that extends this semantics has to 'know about' graphs and their identity conditions. It is *like* typed literals, but it's not actually a new datatype. (It could be, of course, and then we would have graph literals.)

3.4 Second sentence: "From the truth of these triples, it is possible to infer knowledge that it is convenient to make part of the named graph." ?? Do you mean to say that graphs must be deductively closed? Surely not, but then what does this mean?

".. one wants to allow different view points to be expressed and reasoned with, without creating a conflict or inconsistency."
I don't like this. Technically, this is different from the time and provenance cases, and it appeals to a different logic. The latter are like having an extra parameter (what you describe as the quad case, later) but the usage where separate graphs are used to insulate against contradictions is different, because no extra parameter is implied.
But maybe this is getting too subtle :-)

"... to interpret the graph name as denoting a graph that represents all that is true in the context of the named graph." Really? ALL that is true? Why would it be all? (And how could you know what that totality of truth in a context was, in any case? If the graph name is a time, how can you write down everything that is true at a time?)

But in any case, that is not what happens with the conditions as stated:
"for each named graph pair ng = (n,G), I(ng) is true if I(n) is an RDF graph and E-entails G" can be trivially satisfied by making I(n) be an E-inconsistent graph, eg {:a :p #x0}. Surely this does not represent "all that is true" (??)

"With this semantics too, graph names can be used in triples:"
They can always be used in triples, whatever the semantics. What is different here is that when they are used in triples, they refer to the graph.

"This is similar to saying that the name is interpreted as the intension of the graph, and the actual RDF graph is its extension."
Suggest delete this. Intension/extension is a philosophical minefield, and in any case if the name is denoting a "container" this is not accurate.

3.5 It's not exactly clear how this differs from the 3.4 case when the 'context' is, for example, times. [ <a b c> true in d ], and [ <a b c d> true ], are pretty interchangeable. But again, this is perhaps too picky.

You might mention that (uniquely) with the quad semantics, a named graph does not have the same meaning as the identical graph without a name.
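Under a quad reading, the graph name becomes a fourth argument of every statement. A minimal sketch makes this visible. It assumes a hypothetical encoding: a dataset is a dict whose key None is the default graph, and triples are string tuples. The triple :a :b :c in named graph :d becomes a different atomic statement from the same triple in the default graph. Assuming a simple, subset-style entailment over ground quads, neither dataset would then entail the other.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]

def to_quads(dataset: Dict[Optional[str], Set[Triple]]) -> Set[Quad]:
    # Hypothetical encoding: key None is the default graph; other keys are names.
    return {(s, p, o, name)
            for name, graph in dataset.items()
            for (s, p, o) in graph}

named = {":d": {(":a", ":b", ":c")}}
unnamed = {None: {(":a", ":b", ":c")}}

print(to_quads(named))    # {(':a', ':b', ':c', ':d')}
print(to_quads(unnamed))  # {(':a', ':b', ':c', None)}
# Same triple, yet as quads the two datasets share no statement.
print(to_quads(named).isdisjoint(to_quads(unnamed)))  # True
```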

" It can be noted that a semantics where each named graph defines its own context is "SPARQL-ASK-compatible", while a semantics where the graph name denotes the graph or named graph is not compatible in this sense."
This is correct given the way you have defined the various semantics, but it is rather misleading, IMO, because it would be quite possible to have a semantics which made graph names denote graphs, while still supporting inter-graph entailment and so would be compatible in this sense. (This is the 'missing case' I mentioned earlier.)

"This was not retained eventually, because of the lack of experience, and potentially the lack of utility,..."
Yes to lack of experience, but did anyone argue a lack of utility? Suggest omit this second clause. Basically we ran out of time, is what happened :-)

Pat

(Stylistic edits in another message.)

------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 mobile (preferred)
phayes@ihmc.us http://www.ihmc.us/users/phayes

Received on Tuesday, 3 December 2013 02:51:57 UTC