Re: comments on Antoine's draft from Antoine Zimmermann on 2013-12-12 (public-rdf-wg@w3.org from December 2013)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 12 Dec 2013 22:03:07 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <52AA248B.9080806@emse.fr>
I have just updated Dataset semantics.


Below are my comments describing what I did in response to your review.


On 03/12/2013 03:51, Pat Hayes wrote:
> Basically, this is OK, but I think it can be made better. It needs to
> be run through a spellchecker, and I have a lot of niggling edits
> concerned with grammar and subtleties of expression, but first the
> following questions about content.

For the moment, I'd rather go through my own text carefully while
addressing your comments, correcting typos and grammatical errors. When
this is done, I'll invite you to list the remaining errors or propose
alternative phrasings. I ran a spell checker and a grammar checker and
corrected many errors (there were indeed many; if some remain, please
point them out).


> In the list of choices for graph name denotation, the cases "a
> container" and "information resource ..by dereferencing" both seem to
> be special cases of the "resource that is constrained to be in a
> relationship". Why list them separately? (Or maybe just list the last
> one as "some other resource that is constrained..." But did we ever
> consider any other cases than those two, in fact?)

By "resource that is constrained to be in a relationship", I meant that
there exists a named relationship, such as "rdf:hasGraph", that must
hold between the resource denoted by the graph name and the graph
itself. I added this clarification to the document.


> The next list of possible meanings seems to omit the case where the
> default graph is understood to be metadata about the contextual named
> graphs, which was Sandro's main use case (and is important for Jeremy
> and for the PROV uses). This is not the same as saying that is a
> 'global context'.

Ok, I added this case.


> section 2.1 3rd para: " Consequently, defining interpretation and
> entailement for RDF datasets would require at least an extension of
> the RDF semantics." Could be misunderstood to mean that this requires
> a change to RDF semantics, which is not correct. Might be clearer to
> say, RDF semantics does not itself specify a meaning for <name,
> graph> pairs.

I've reformulated the sentence along these lines.


> 2.2 "In Carrol et al., a named graph is simply defined as a pair
> comprising an IRI and an RDF graph." But this is how you have already
> defined them, so what is being conveyed by the word "simply"?

I don't know what I had in mind at the time; I've removed the "simply".


> 2.3 The ASK-no-variables = entailment trick is clever, but why does
> it not apply just as well to named graphs as to the default graph?
> Presumably ASKing a graph directed to a name is entailment by the
> named graph, no?

The entailment is indeed between the graphs inside the named graphs, but
the trick is to use it to define entailment between the named graph
pairs rather than just between the graphs inside them. In any case, this
is detailed in Section 3.7.


> (IF not, why not? Intuitively speaking, that is.) So
> you should be able to get rather more traction out of this idea than
> you do here.

I'll try to find a better formulation, but for now I am leaving this
section unchanged.


> Section 3. " reuse RDF semantics as a black box" What does this
> mean?
>
> "The formalisation below indicates that the truth of an RDF dataset
> can be determined in function of the truth of an RDF graph, no matter
> how the latter is determined. Therefore, instead of defining a
> precise definition of RDF graph interpretations and entailment, we
> use the more abstract notion of entailment regime. " I find it very
> hard to understand the logic of this. Why would the black box lead to
> entailment regimes? And do you mean to imply that entailment regimes
> are less precise than model theory? (And if so, why would you be
> trying to be less precise?)

What I want to say here is that a dataset semantics is usually defined
*with respect to* an entailment regime, but it is not necessary to
specify the regime explicitly (it is just a parameter, like D in
D-entailment).
Most of the following definitions specify "E-dataset-semantics" for any
entailment regime E. A concrete implementation would have to fix a
specific E, yielding, for instance, simple-dataset-semantics,
RDF-dataset-semantics, RDFS-dataset-semantics, etc. But the definition
of E-dataset-semantics can simply treat E as a black box.

It's not less precise; it is less specific. I have rephrased this
paragraph, hopefully improving the explanation.
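To illustrate what "E as a black box" means, here is a minimal Python
sketch. It assumes ground graphs represented as sets of string triples,
a dataset represented as a (default graph, {name: graph}) pair, and the
"each named graph is its own context" reading; `dataset_entails` and
`simple_entails_ground` are hypothetical names, not definitions from
the draft. The dataset-level definition only ever calls `e`, so any
regime can be plugged in.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# Hypothetical representation: (default graph, {graph name: graph}).
Dataset = Tuple[Graph, Dict[str, Graph]]
# The entailment regime E, used only as a black box: entails(G, H) -> bool.
Entailment = Callable[[Graph, Graph], bool]


def simple_entails_ground(g: Graph, h: Graph) -> bool:
    """Simple entailment restricted to ground graphs: G entails H iff H is a subgraph of G."""
    return h <= g


def dataset_entails(e: Entailment, d1: Dataset, d2: Dataset) -> bool:
    """E-dataset entailment under the 'each named graph is its own context' reading.

    The default graph of d1 must E-entail the default graph of d2, and each named
    graph (n, G2) of d2 must be E-entailed by the graph named n in d1 (taken to be
    empty when d1 has no graph named n).
    """
    default1, named1 = d1
    default2, named2 = d2
    if not e(default1, default2):
        return False
    return all(e(named1.get(n, frozenset()), g2) for n, g2 in named2.items())
```

Swapping `simple_entails_ground` for an RDFS entailment checker would
give an RDFS-dataset-semantics without touching `dataset_entails`.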


> 3.1 Could mention that the simplest notion of entailment is actually
> required by the RDF 1.1 specs, albeit informally. So all entailments
> need to at least support the "only if" part of this.

I've added this just after the item list in the section "Formal
definition" instead, since the first item reflects this requirement.


> "an equivalent dataset" // "a logically equivalent dataset". The word
> "equivalent" is used in other ways, so need to be very clear.

Done.


> 3.2 I think this is misleading. We have formally decided that
> datasets are single bnode scopes, so to treat bnodes in two named
> graphs as distinct, ie to merge their graphs rather than take their
> union, is just wrong.

This document deliberately avoids saying that particular choices are
wrong, which would lead people to think that there are legitimate and
illegitimate dataset semantics. We have not reached that level of
requirements for dataset semantics. It seems to me quite natural to say
that a dataset is true in an interpretation if all the graphs in it are
true in that interpretation. This corresponds to applying a merge
operation.


> (In fact it was this case, of combining graphs
> in a single dataset that motivated the idea that taking the union was
> more correct than taking the merge.) I suggest replacing this by a
> brief explanation of why it is necessary to combine graphs in a
> dataset by unioning rather than merging, *because* they may share
> bnodes.

I have added an explanation of why there can be two choices (union or
merge).
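To make the two choices concrete, here is a minimal Python sketch. It
assumes graphs are sets of string triples and that blank nodes are
strings starting with "_:" (an assumption of the sketch, not of the
draft). With `union`, a blank node label shared by two named graphs
stays one node, as the single-bnode-scope decision implies; with
`merge`, each graph's blank nodes are renamed apart first.

```python
import itertools
from typing import Dict, FrozenSet, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_bnode(term: str) -> bool:
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def union(graphs: Iterable[Graph]) -> Graph:
    """Set union: equal blank node labels denote the same node (one shared bnode scope)."""
    result: set = set()
    for g in graphs:
        result |= g
    return frozenset(result)


def merge(graphs: Iterable[Graph]) -> Graph:
    """Merge: each graph's blank nodes are renamed apart before taking the union."""
    fresh: Iterator[int] = itertools.count()
    result: set = set()
    for g in graphs:
        mapping: Dict[str, str] = {}  # per-graph renaming, so scopes stay separate
        for triple in g:
            renamed = []
            for term in triple:
                if is_bnode(term):
                    if term not in mapping:
                        mapping[term] = f"_:m{next(fresh)}"
                    renamed.append(mapping[term])
                else:
                    renamed.append(term)
            result.add(tuple(renamed))
    return frozenset(result)
```

For two graphs that both use `_:x` as a subject, `union` yields a
single subject node and `merge` yields two distinct ones.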


> "The main drawback of this dataset semantics is that all triples in
> the named graphs contribute to a global knowledge that must be
> consistent. " Not obvious why this is a 'drawback'. The semantics
> does not require graphs to be consistent. Better to say, the dataset
> (with this semantics) will be inconsistent when the graphs are
> mutually inconsistent, so it cannot *consistently* hold mutually
> inconsistent information. This effectively treats all the named
> graphs as part of a single RDF graph, which is both a feature and a
> problem (depending on what you want it to do.)

I propose to say:

"This dataset semantics makes all triples in the named graphs contribute 
to a global knowledge, thus making the whole dataset inconsistent 
whenever two graphs are mutually contradictory."

And add at the end of the paragraph:  "In this case, this semantics can 
be seen as problematic."


> 3.3 "It is common to use the graph name as a way to identify the RDF
> graph inside the named graphs, or rather, to identify a particular
> occurence of the graph." It would be good to keep these two cases -
> the graph vs. a token of the graph - a bit more separated and maybe
> talk about the difference and how it matters. This is what motivated
> the original Carroll +al definition of a named graph as a pair, the
> pair <graph, IRI> being a 'mathematical' version of the occurrence or
> token of the graph. I think this distinction is critical in
> understanding the semantic issues of naming graphs, and having it
> shunted off into a side remark is rather misleading. Especially as in
> the formal semantics, you use the same Carroll+al denoting-the-pair
> trick :-)
>
> (You could mention this when you talk about the Carroll+al paper, in
> fact, so that the 'name denotes pair' idea is more motivated.)

I added a few sentences there (Section 2.2).


> Also, "as a way to refer to the RDF graph" // "as a way to
> identify..." , since Semantics draws this distinction carefully.

The text beginning each presentation of a distinct formal semantics is
meant to be informal and intuitive. I'll try to be as rigorous as I
can, but using common words, ambiguous as they are, may be legitimate
in this style of presentation. I am open to suggestions, though. In
this case, even though Semantics makes the distinction, I don't see why
it should not be "identify" here.


> "Intutively, this semantics can be seen as quoting the RDF graphs
> inside the named graphs." I don't think this is correct. The
> name-denotes-graph constraint is one thing, but the named graphs can
> still be asserted by the dataset, and that would not be like
> quotation at all. Quotation would be where the naming is *all* that
> the dataset asserts. (Later: I see that is how you define the
> semantics, but then there is a case you have left out, which where
> the named graphs are both asserted AND the naming relationship is
> asserted.)

What do you mean by "the named graphs are asserted"?
Do you mean: "all the triples in the named graphs are true in the 
interpretation"?  This case is the "default as union/merge" semantics, 
which is compatible with "the graph names denote the pairs".

[Later: I realise your next email probably cancels this comment]

> I think the "Alice said" text is very confusing, because there is
> nothing in the dataset semantics that refers to speech or asserting.
> :alice {:bob :is :smart} could mean that Alice said it, or believes
> it, or has it written on her forehead, or that Alice was the source
> for this triple, etc.. In fact it could mean almost anything about
> Alice. I think this is better omitted.

Again, this is meant to provide intuition. Of course, nothing formally
refers to speech or asserting, but it must be understood that this
semantics considers the content of the named graph to be the important
information, rather than the meaning of what's in the graph.

For the moment, I am leaving this part unchanged. I will think of a way
to make it clearer.


> Example 14 with the <code>entails</code> is potentially misleading, I
> think might be better to just stick to conventional metadata.

Yes, I agree. I've changed it to ":hasNextVersion" to keep a triple 
relating the two named graphs.


> " the presence of blank nodes as graph names can be problematic
> because a named graph entails an infinity of other named graphs where
> only the graph name is changed to a different blank node." I
> disagree. If there are n graph names, then there are at most 2^n
> distinct bnode generalizations. Just changing the bnodeID does not
> change a graph into a different graph. And in any case, the situation
> in datasets is no worse than in RDF graphs, so I think this is a
> non-issue.

This may be a non-issue, but I'm not talking about bnode IDs here. In
this dataset semantics, any blank node used as a graph name can be
replaced by another unused blank node, and there are infinitely many
blank nodes to choose from. This can also make blank nodes used inside
named graphs become the same as, or different from, blank nodes used as
graph names. For example,

_:b { _:b  dc:created  "2013-12-10"^^xsd:date }
_:d { ex:a  ex:b  ex:c }

is equivalent (according to this particular semantics) to:

_:c { _:b  dc:created  "2013-12-10"^^xsd:date }
_:b { ex:a  ex:b  ex:c }

But after all, as you say, this is not really a drawback.
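The renaming in the example above can be sketched in Python as follows,
assuming named graphs are a dict from graph name to a set of string
triples and blank nodes are strings starting with "_:";
`rename_graph_names` is a hypothetical helper. Only the graph names
change; the `_:b` inside the first graph is left alone, which is why it
stops coinciding with its own graph's name and starts coinciding with
another graph's name.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def rename_graph_names(named: Dict[str, Graph], mapping: Dict[str, str]) -> Dict[str, Graph]:
    """Rename blank-node graph names according to `mapping`; triples inside are untouched.

    Under the semantics discussed above, this gives an equivalent dataset,
    provided no two graph names collapse into one.
    """
    if any(not n.startswith("_:") for n in mapping):
        raise ValueError("only blank-node graph names may be renamed")
    renamed = {mapping.get(n, n): g for n, g in named.items()}
    if len(renamed) != len(named):
        raise ValueError("renaming would merge two named graphs")
    return renamed


original = {
    "_:b": frozenset({("_:b", "dc:created", '"2013-12-10"^^xsd:date')}),
    "_:d": frozenset({("ex:a", "ex:b", "ex:c")}),
}
# The renaming used in the example: _:b -> _:c and _:d -> _:b.
renamed = rename_graph_names(original, {"_:b": "_:c", "_:d": "_:b"})
```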


> "Therefore, any entailment regime that recognizes datatypes and use
> this semantics has to be able to ..."  Why "that recognizes
> datatypes"? Any entailment regime that extends this semantics has to
> 'know about' graphs and their identity conditions. It is *like* typed
> literals, but it's not actually a new datatype. (It could be, of
> course, and then we would have graph literals.)
>
> 3.4 Second sentence: "From the truth of these triples, it is possible
> to infer knowledge that it is convenient to make part of the named
> graph." ?? Do you mean to say that graphs must be deductively closed?
> Surely not, but then what does this mean?

The formulation was clumsy. I reformulated it as:
"The truth of the named graph pair follows from the truth of these
triples according to the graph semantics."

>
> ".. one wants to allow different view points to be expressed and
> reasoned with, without creating a conflict or inconsistency." I don't
> like this. Technically, this is different from the time and
> provenance cases, and it appeals to a different logic. The latter are
> like having an extra parameter (what you describe as the quad case,
> later) but the usage where separate graphs are used to insulate
> against contradictions is different, because no extra parameter is
> implied. But maybe this is getting too subtle :-)

There are cases where you want to isolate the content of an RDF
document and draw conclusions from this content only. The content, as
well as the conclusions, is attached to a graph name to keep track of
where the conclusions come from. You may want to do this regardless of
whether the graph is consistent, and regardless of whether it
contradicts another graph in the store.

I haven't made any change for the moment, since you say this may be too
subtle.


> "... to interpret the graph name as denoting a graph that represents
> all that is true in the context of the named graph."  Really? ALL
> that is true? Why would it be all? (And how could you know what that
> totality of truth in a context was, in any case? If the graph name is
> a time, how can you write down everything that is true at a time?)

Yes, correct.

> But in any case, that is not what happens with the conditions as
> stated: "for each named graph pair ng = (n,G), I(ng) is true if I(n)
> is an RDF graph and E-entails G" can be trivially satisfied by making
> I(n) be an E-inconsistent graph, eg {:a :p #x0}. Surely this does not
> represent "all that is true" (??)

Indeed. I've changed this to:
"One way is to interpret the graph name as denoting a graph, and a named 
graph pair is true if this graph entails the graph inside the pair."
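A minimal Python sketch of the reformulated condition, assuming the
interpretation of graph names is modelled as a plain dict from names to
graphs (a name absent from the dict stands for a name that does not
denote a graph) and E is simple entailment on ground graphs; `pair_true`
is a hypothetical name. Any larger graph also satisfies the condition,
which is why it says nothing about I(n) being "all that is true".

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Entailment = Callable[[Graph, Graph], bool]


def simple_entails_ground(g: Graph, h: Graph) -> bool:
    """Simple entailment for ground graphs: H is a subgraph of G."""
    return h <= g


def pair_true(denotes: Dict[str, Graph], name: str, graph: Graph, e: Entailment) -> bool:
    """(name, graph) is true iff the name denotes a graph that E-entails `graph`."""
    target = denotes.get(name)  # None models "I(name) is not a graph"
    return target is not None and e(target, graph)
```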

> "With this semantics too, graph names can be used in triples:" They
> can always be used in triples, whatever the semantics.  What is
> different here is that when they are used in triples, they refer to
> the graph.

I changed it to: "Graph names used in triples that express metadata do 
not necessarily generate inconsistency"

> "This is similar to saying that the name is interpreted as the
> intension of the graph, and the actual RDF graph is its extension."
> Suggest delete this. Intension/extension is a philosophical
> minefield, and in any case if the name is denoting a "container" this
> is not accurate.

I can delete this statement; it does not add much.


> 3.5 It's not exactly clear how this differs from the 3.4 case when the
> 'context' is, for example, times. [ <a b c> true in d ], and [ <a b c
> d> true ], are pretty interchangeable. But again, this is perhaps too
> picky.

<a b c> true in d does not necessarily mean the same as <a b c d> true,
unless this is explicitly specified. <a b c> true in d relies on the
truth conditions of the graph semantics, while the quad semantics
defines its own. For instance, what can be concluded from this?

{ <ex:a  rdfs:subClassOf  ex:b  ex:c>
   <ex:c  rdfs:subClassOf  ex:d  ex:e> }

It may mean many things.

I have not changed this at the moment.


> You might mention that (uniquely) with the quad semantics, a named
> graph does not have the same meaning as the identical graph without a
> name.
>
> " It can be noted that a semantics where each named graph defines its
> own context is "SPARQL-ASK-compatible", while a semantics where the
> graph name denotes the graph or named graph is not compatible in this
> sense." This is correct given the way you have defined the various
> semantics, but it is rather misleading, IMO, because it would be
> quite possible to have a semantics which made graph names denote
> graphs, while still supporting inter-graph entailment and so would be
> compatible in this sense. (This is the 'missing case' I mentioned
> earlier.)

I think your next email cancelled this comment.


> "This was not retained eventually, because of the lack of experience,
> and potentially the lack of utility,..." Yes to lack of experience,
> but did anyone argue a lack of utility? Suggest omit this second
> clause. Basically we ran out of time, is what happened :-)

Ok.


>
> Pat
>
> (Stylistic edits in another message.)
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 home
> 40 South Alcaniz St.                     (850)202 4416 office
> Pensacola                                (850)202 4440 fax
> FL 32502                                 (850)291 0667 mobile (preferred)
> phayes@ihmc.us       http://www.ihmc.us/users/phayes

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 12 December 2013 21:03:43 UTC