Re: comments on Antoine's draft from Pat Hayes on 2013-12-13 (public-rdf-wg@w3.org from December 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 12 Dec 2013 21:54:13 -0800
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <9A3BD338-D2A3-4A44-93BC-8AFE585B0374@ihmc.us>
A few replies noted inline below.

Pat

On Dec 12, 2013, at 1:03 PM, Antoine Zimmermann <antoine.zimmermann@emse.fr> wrote:

> I have just updated Dataset semantics.
> 
> 
> Here are the comments saying what I did in response to the review.
> 
> 
> Le 03/12/2013 03:51, Pat Hayes a écrit :
>> Basically, this is OK, but I think it can be made better. It needs to
>> be run through a spellchecker, and I have a lot of niggling edits
>> concerned with grammar and subtleties of expression, but first the
>> following questions about content.
> 
> For the moment, I'd rather parse my own text carefully while addressing your comments, correcting typos and grammar errors. When this is done, I'll invite you to list remaining errors or propose variations in expression. I spellchecked and grammar checked, correcting lots of errors (there were many indeed; if any remain, please point them out).

OK. 

>> In the list of choices for graph name denotation, the cases "a
>> container" and "information resource ..by dereferencing" both seem to
>> be special cases of the "resource that is constrained to be in a
>> relationship". Why list them separately? (Or maybe just list the last
>> one as "some other resource that is constrained..." But did we ever
>> consider any other cases than those two, in fact?)
> 
> By "resource that is constrained to be in a relationship", I meant that there exists a named relationship such as "rdf:hasGraph" that must hold between the resource denoted by the graph name and the graph itself. I added this for clarification.
> 
> 
>> The next list of possible meanings seems to omit the case where the
>> default graph is understood to be metadata about the contextual named
>> graphs, which was Sandro's main use case (and is important for Jeremy
>> and for the PROV uses). This is not the same as saying that is a
>> 'global context'.
> 
> Ok, I added this case.
> 
> 
>> section 2.1 3rd para: " Consequently, defining interpretation and
>> entailement for RDF datasets would require at least an extension of
>> the RDF semantics." Could be misunderstood to mean that this requires
>> a change to RDF semantics, which is not correct. Might be clearer to
>> say, RDF semantics does not itself specify a meaning for <name,
>> graph> pairs.
> 
> I've reformulated the sentence along these lines.
> 
> 
>> 2.2 "In Carroll et al., a named graph is simply defined as a pair
>> comprising an IRI and an RDF graph." But this is how you have already
>> defined them, so what is being conveyed by the word "simply"?
> 
> I don't know what I had in mind at that time

I know that feeling :-)

> , I've removed the "simply".
> 
> 
>> 2.3 The ASK-no-variables = entailment trick is clever, but why does
>> it not apply just as well to named graphs as to the default graph?
>> Presumably ASKing a graph directed to a name is entailment by the
>> named graph, no?
> 
> The entailment is indeed between the graphs inside the named graphs, but the trick is to use this to define entailment between the named graph pairs rather than just between the graphs inside.  In any case, this is detailed in Section 3.7.

Yes, I see that now. Sorry I missed it first time around. 
> 
> 
>> (If not, why not? Intuitively speaking, that is.) So
>> you should be able to get rather more traction out of this idea than
>> you do here.
> 
> I'll try to find a better formulation but I currently do not touch this section.
> 
> 
>> Section 3. " reuse RDF semantics as a black box" What does this
>> mean?
>> 
>> "The formalisation below indicates that the truth of an RDF dataset
>> can be determined in function of the truth of an RDF graph, no matter
>> how the latter is determined. Therefore, instead of defining a
>> precise definition of RDF graph interpretations and entailment, we
>> use the more abstract notion of entailment regime. " I find it very
>> hard to understand the logic of this. Why would the black box lead to
>> entailment regimes? And do you mean to imply that entailment regimes
>> are less precise than model theory? (And if so, why would you be
>> trying to be less precise?)
> 
> What I want to say here is that dataset semantics is usually defined *with respect to* an entailment regime, but it is not necessary to specify the regime explicitly (it is just a parameter, like D in D-entailment).
> Most of the following definitions specify "E-dataset-semantics" for any entailment regime E.  A concrete implementation would have to require a specific E, such as simple-dataset-semantics, RDF-dataset-semantics, RDFS-dataset-semantics, etc.  But the definition of E-dataset-semantics can just consider that E is a black box.

OK, I see. I think the phrase "black box" is misleading or maybe just odd; it usually conveys rather more than this. (It suggests that the internals of the entailment regimes are somehow invisible.)

> 
> It's not less precise, it is less specific. I have rephrased this paragraph, hopefully improving the explanation.
> 
> 
>> 3.1 Could mention that the simplest notion of entailment is actually
>> required by the RDF 1.1 specs, albeit informally. So all entailments
>> need to at least support the "only if" part of this.
> 
> I've added this instead just after the item list in section "Formal definition", since the first item is reflecting the requirement.
> 
> 
>> "an equivalent dataset" // "a logically equivalent dataset". The word
>> "equivalent" is used in other ways, so need to be very clear.
> 
> Done.
> 
> 
>> 3.2 I think this is misleading. We have formally decided that
>> datasets are single bnode scopes, so to treat bnodes in two named
>> graphs as distinct, ie to merge their graphs rather than take their
>> union, is just wrong.
> 
> This document deliberately avoids saying that such and such choices are wrong, which would lead people to think that there are legitimate and illegitimate dataset semantics.  We have not gotten to this level of requirements for dataset semantics.  It seems to me pretty straightforward to say that a dataset is true in an interpretation if all the graphs in it are true in that interpretation.  This corresponds to applying a merge operation.

It is a merge only if the graphs share no bnodes. But we have decided, formally, that bnodes in graphs in a dataset are shared, ie their scope is the dataset rather than the local graph. So take the example

{ }
:1 { :a :p _:x }
:2 { :b :q _:x }

and the merge 

:a :p _:x 
:b :q _:y 

There are interpretations which satisfy the merge but do not make all the graphs in the dataset true, so if dataset truth means the truth of all the graphs in it, then the merge does not entail the dataset. But the union will always be equivalent to the truth of all the graphs. 
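
The difference can be checked mechanically. What follows is only a minimal sketch in Python, not anything taken from the draft or the specs. It assumes graphs are sets of string triples, that IRIs denote themselves, and that an interpretation is just a set of ground facts over a small domain. The names `union`, `merge`, `satisfies`, `o1` and `o2` are made up for the illustration.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def bnodes(graph):
    return {t for triple in graph for t in triple if is_bnode(t)}

def union(graphs):
    """Plain set union: blank nodes shared between graphs stay shared."""
    result = set()
    for g in graphs:
        result |= g
    return result

def merge(graphs):
    """Rename blank nodes apart, one suffix per graph, then take the union."""
    result = set()
    for i, g in enumerate(graphs):
        for triple in g:
            result.add(tuple(f"{t}_g{i}" if is_bnode(t) else t for t in triple))
    return result

def satisfies(facts, domain, graph):
    """True if some assignment of blank nodes to domain elements
    turns every triple of the graph into one of the given facts."""
    names = sorted(bnodes(graph))
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if all(tuple(env.get(t, t) for t in triple) in facts for triple in graph):
            return True
    return False

# The two named graphs from the example, sharing the blank node _:x.
g1 = {(":a", ":p", "_:x")}
g2 = {(":b", ":q", "_:x")}

# A hypothetical interpretation in which :a and :b point at different things.
facts = {(":a", ":p", "o1"), (":b", ":q", "o2")}
domain = ["o1", "o2"]

merge_ok = satisfies(facts, domain, merge([g1, g2]))  # _:x split into two bnodes
union_ok = satisfies(facts, domain, union([g1, g2]))  # _:x must be one thing
print(merge_ok, union_ok)
```

Under these assumptions the interpretation satisfies the merge but not the union, because no single value for _:x works in both triples once the bnode is shared across the dataset.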

> 
>> (In fact it was this case, of combining graphs
>> in a single dataset that motivated the idea that taking the union was
>> more correct than taking the merge.) I suggest replacing this by a
>> brief explanation of why it is necessary to combine graphs in a
>> dataset by unioning rather than merging, *because* they may share
>> bnodes.
> 
> I have explained why there can be two choices.
> 
> 
>> "The main drawback of this dataset semantics is that all triples in
>> the named graphs contribute to a global knowledge that must be
>> consistent. " Not obvious why this is a 'drawback'. The semantics
>> does not require graphs to be consistent. Better to say, the dataset
>> (with this semantics) will be inconsistent when the graphs are
>> mutually inconsistent, so it cannot *consistently* hold mutually
>> inconsistent information. This effectively treats all the named
>> graphs as part of a single RDF graph, which is both a feature and a
>> problem (depending on what you want it to do.)
> 
> I propose to say:
> 
> "This dataset semantics makes all triples in the named graphs contribute to a global knowledge, thus making the whole dataset inconsistent whenever two graphs are mutually contradictory."
> 
> And add at the end of the paragraph:  "In this case, this semantics can be seen as problematic."

OK, much clearer. 

> 
> 
>> 3.3 "It is common to use the graph name as a way to identify the RDF
>> graph inside the named graphs, or rather, to identify a particular
>> occurence of the graph." It would be good to keep these two cases -
>> the graph vs. a token of the graph - a bit more separated and maybe
>> talk about the difference and how it matters. This is what motivated
>> the original Carroll +al definition of a named graph as a pair, the
>> pair <graph, IRI> being a 'mathematical' version of the occurrence or
>> token of the graph. I think this distinction is critical in
>> understanding the semantic issues of naming graphs, and having it
>> shunted off into a side remark is rather misleading. Especially as in
>> the formal semantics, you use the same Carroll+al denoting-the-pair
>> trick :-)
>> 
>> (You could mention this when you talk about the Carroll+al paper, in
>> fact, so that the 'name denotes pair' idea is more motivated.)
> 
> I added a few sentences there (Section 2.2).
> 
> 
>> Also, "as a way to refer to the RDF graph" // "as a way to
>> identify..." , since Semantics draws this distinction carefully.
> 
> The text beginning each presentation of a distinct formal semantics is meant to be informal and intuitive. I'll try to be rigorous as much as I can, but the use of common words, with their ambiguity, may be legitimate in such a style of presentation.  I am open to suggestion, though.  In this case, even if Semantics makes the distinction, I don't see why it should not be "identify" here.

Well, if some other natural word can be found, it would be better to avoid a terminology clash with part of the existing normative documents. 
> 
> 
>> "Intutively, this semantics can be seen as quoting the RDF graphs
>> inside the named graphs." I don't think this is correct. The
>> name-denotes-graph constraint is one thing, but the named graphs can
>> still be asserted by the dataset, and that would not be like
>> quotation at all. Quotation would be where the naming is *all* that
>> the dataset asserts. (Later: I see that is how you define the
>> semantics, but then there is a case you have left out, which is where
>> the named graphs are both asserted AND the naming relationship is
>> asserted.)
> 
> What do you mean by "the named graphs are asserted"?
> Do you mean: "all the triples in the named graphs are true in the interpretation"?  This case is the "default as union/merge" semantics, which is compatible with "the graph names denote the pairs".
> 
> [Later: I realise your next email probably cancels this comment]

Yes, it does. 

> 
>> I think the "Alice said" text is very confusing, because there is
>> nothing in the dataset semantics that refers to speech or asserting.
>> :alice {:bob :is :smart} could mean that Alice said it, or believes
>> it, or has it written on her forehead, or that Alice was the source
>> for this triple, etc.. In fact it could mean almost anything about
>> Alice. I think this is better omitted.
> 
> Again, this is meant to provide intuition. Of course, nothing formally refers to speech or asserting, but it must be understood that this semantics considers the content of a named graph to be the important information, rather than considering the meaning of what's in the graph.
> 
> For the moment, I don't touch this part. I will think of a way to make it clearer.
> 
> 
>> Example 14 with the <code>entails</code> is potentially misleading, I
>> think might be better to just stick to conventional metadata.
> 
> Yes, I agree. I've changed it to ":hasNextVersion" to keep a triple relating the two named graphs.
> 
> 
>> " the presence of blank nodes as graph names can be problematic
>> because a named graph entails an infinity of other named graphs where
>> only the graph name is changed to a different blank node." I
>> disagree. If there are n graph names, then there are at most 2^n
>> distinct bnode generalizations. Just changing the bnodeID does not
>> change a graph into a different graph. And in any case, the situation
>> in datasets is no worse than in RDF graphs, so I think this is a
>> non-issue.
> 
> This may be a non-issue, but here I'm not talking about bnode ID. In this dataset semantics, any blank node used as a graph name can be replaced by another unused blank node.

But it has to be replaced consistently throughout the dataset, or else it is a different dataset. Right?

> There is an infinite number of blank nodes from which to choose.

Bnodes are not distinct 'things', they are just 'places' in a graph (or in this case, a dataset.) That is why we treat graph-equivalent graphs (ie 1:1 substitution of bnodes) as identical. Concepts defines a similar equivalence for datasets. 

> This may also lead to having blank nodes used inside named graphs become the same as or different from bnodes used as graph names. E.g.,
> 
> _:b { _:b  dc:created  "2013-12-10"^^xsd:date }
> _:d { ex:a  ex:b  ex:c }
> 
> is equivalent (according to this particular semantics) to:
> 
> _:c { _:b  dc:created  "2013-12-10"^^xsd:date }
> _:b { ex:a  ex:b  ex:c }

Unless this is a typo, I don't follow. How can you replace the bnode _:b by _:c when it is used as a label but not when it is used inside the graphs? That should not be permissible in *any* semantics. Did you mean this?

_:c { _:c dc:created "2013-12-10"^^xsd:date }
_:b { ex:a  ex:b  ex:c }

This is equivalent to your first example. 
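
To make the "replaced consistently" condition concrete, here is a rough Python sketch that brute-forces a 1:1 blank node substitution between two datasets. It is an illustration only: a dataset is modelled as a dict from graph name to a set of string triples, the default graph is left out, and the helper names (`rename`, `dataset_bnodes`, `isomorphic`) are invented here rather than taken from Concepts.

```python
from itertools import permutations

def is_bnode(term):
    return term.startswith("_:")

def rename(dataset, mapping):
    """Apply ONE substitution to graph names and to terms inside the graphs."""
    def sub(term):
        return mapping.get(term, term)
    return {sub(name): {tuple(sub(t) for t in triple) for triple in graph}
            for name, graph in dataset.items()}

def dataset_bnodes(dataset):
    found = set()
    for name, graph in dataset.items():
        if is_bnode(name):
            found.add(name)
        for triple in graph:
            found.update(t for t in triple if is_bnode(t))
    return found

def isomorphic(d1, d2):
    """True if some 1:1 blank node substitution maps d1 exactly onto d2."""
    b1, b2 = sorted(dataset_bnodes(d1)), sorted(dataset_bnodes(d2))
    if len(b1) != len(b2):
        return False
    return any(rename(d1, dict(zip(b1, perm))) == d2 for perm in permutations(b2))

date = '"2013-12-10"^^xsd:date'
abc = ("ex:a", "ex:b", "ex:c")

first      = {"_:b": {("_:b", "dc:created", date)}, "_:d": {abc}}
as_written = {"_:c": {("_:b", "dc:created", date)}, "_:b": {abc}}  # label renamed, subject not
corrected  = {"_:c": {("_:c", "dc:created", date)}, "_:b": {abc}}  # renamed throughout

print(isomorphic(first, corrected), isomorphic(first, as_written))
```

With this modelling, the consistently renamed dataset is found to be isomorphic to the first example, while the version that renames only the label is not, since no single substitution covers both the graph name and the subject.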


> 
> But after all, as you say, this is not relevant as a drawback.
> 
> 
>> "Therefore, any entailment regime that recognizes datatypes and use
>> this semantics has to be able to ..."  Why "that recognizes
>> datatypes"? Any entailment regime that extends this semantics has to
>> 'know about' graphs and their identity conditions. It is *like* typed
>> literals, but it's not actually a new datatype. (It could be, of
>> course, and then we would have graph literals.)
>> 
>> 3.4 Second sentence: "From the truth of these triples, it is possible
>> to infer knowledge that it is convenient to make part of the named
>> graph." ?? Do you mean to say that graphs must be deductively closed?
>> Surely not, but then what does this mean?
> 
> The formulation was clumsy. I reformulated to:
> "From the truth of these triples according to the graph semantics, follows the truth of the named graph pair."

I think the key point is that it would be valid to add valid entailments to any named graph, in this semantics. Whereas *any change at all* to a named graph would be invalid according to the naming semantics. This is a very sharp and vivid way to distinguish them. 
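
A small Python sketch of that contrast, under loudly stated assumptions: the entailment regime E is passed in as a parameter, and the only regime provided is a toy one that knows just the transitivity of rdfs:subClassOf. The two truth conditions are my paraphrase of the discussion in this thread (name denotes exactly the graph, versus name denotes a graph that E-entails the graph in the pair), not the draft's formal definitions, and all function names are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"

def toy_closure(graph):
    """Close a ground graph under subClassOf transitivity only (a toy regime)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closed):
            for (b2, p2, c) in list(closed):
                if p1 == p2 == SUBCLASS and b == b2 and (a, SUBCLASS, c) not in closed:
                    closed.add((a, SUBCLASS, c))
                    changed = True
    return closed

def toy_entails(g, h):
    """g entails h in the toy regime if h is contained in the closure of g."""
    return h <= toy_closure(g)

def pair_true_naming(denoted, graph):
    # Naming reading: the name denotes exactly the graph in the pair.
    return denoted == graph

def pair_true_context(denoted, graph, entails=toy_entails):
    # Context reading: the name denotes a graph that E-entails the graph in the pair.
    return entails(denoted, graph)

g = {("ex:a", SUBCLASS, "ex:b"), ("ex:b", SUBCLASS, "ex:c")}
g_extended = g | {("ex:a", SUBCLASS, "ex:c")}  # g plus one valid entailment

print(pair_true_context(g, g_extended), pair_true_naming(g, g_extended))
```

If the name denotes g, the pair with the extended graph is still true under the context reading but false under the naming reading, which is the distinction described above.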

>> ".. one wants to allow different view points to be expressed and
>> reasoned with, without creating a conflict or inconsistency." I don't
>> like this. Technically, this is different from the time and
>> provenance cases, and it appeals to a different logic. The latter are
>> like having an extra parameter (what you describe as the quad case,
>> later) but the usage where separate graphs are used to insulate
>> against contradictions is different, because no extra parameter is
>> implied. But maybe this is getting too subtle :-)
> 
> There are cases where you want to isolate the content of an RDF document and draw conclusions from this content only. The content as well as the conclusions are attached to a graph name to keep track of where the conclusions come from. You may want to do this independently from the graph being consistent or not, and independently from the graph being in contradiction with another graph in the store.
> 
> I haven't made a change for the moment, since you say it may be too subtle.

It probably is. There are several journal papers waiting to be written about this stuff, and that would be the place to go into such matters. 

> 
> 
>> "... to interpret the graph name as denoting a graph that represents
>> all that is true in the context of the named graph."  Really? ALL
>> that is true? Why would it be all? (And how could you know what that
>> totality of truth in a context was, in any case? If the graph name is
>> a time, how can you write down everything that is true at a time?)
> 
> Yes, correct.
> 
>> But in any case, that is not what happens with the conditions as
>> stated: "for each named graph pair ng = (n,G), I(ng) is true if I(n)
>> is an RDF graph and E-entails G" can be trivially satisfied by making
>> I(n) be an E-inconsistent graph, eg {:a :p #x0}. Surely this does not
>> represent "all that is true" (??)
> 
> Indeed. I've changed this to:
> "One way is to interpret the graph name as denoting a graph, and a named graph pair is true if this graph entails the graph inside the pair."

OK

>> "With this semantics too, graph names can be used in triples:" They
>> can always be used in triples, whatever the semantics.  What is
>> different here is that when they are used in triples, they refer to
>> the graph.
> 
> I changed it to: "Graph names used in triples that express metadata do not necessarily generate inconsistency"
> 
>> "This is similar to saying that the name is interpreted as the
>> intension of the graph, and the actual RDF graph is its extension."
>> Suggest delete this. Intension/extension is a philosophical
>> minefield, and in any case if the name is denoting a "container" this
>> is not accurate.
> 
> I can delete this statement, it does not add much.
> 
> 
>> 3.5 It's not exactly clear how this differs from the 3.4 case when the
>> 'context' is, for example, times. [ <a b c> true in d ], and [ <a b c
>> d> true ], are pretty interchangeable. But again, this is perhaps too
>> picky.
> 
> <a b c> true in d may not mean the same as <a b c d> true unless it is specified like this.  

Yes, of course. What I meant to say is that one can impose isomorphic truth conditions on either syntactic pattern. They are functionally interchangeable, so to speak. So there is no important distinction in kind between triples-plus-contexts and quads; they are simply syntactic variants. 

> <a b c> true in d is based on the truth conditions of graph semantics, while the quad-semantics defines its own. For instance:
> 
> what can be concluded from this?
> 
> { <ex:a  rdfs:subClassOf  ex:b  ex:c>
>  <ex:c  rdfs:subClassOf  ex:d  ex:e> }
> 
> It may mean many things.
> 
> I have not changed this at the moment.
> 
> 
>> You might mention that (uniquely) with the quad semantics, a named
>> graph does not have the same meaning as the identical graph without a
>> name.
>> 
>> " It can be noted that a semantics where each named graph defines its
>> own context is "SPARQL-ASK-compatible", while a semantics where the
>> graph name denotes the graph or named graph is not compatible in this
>> sense." This is correct given the way you have defined the various
>> semantics, but it is rather misleading, IMO, because it would be
>> quite possible to have a semantics which made graph names denote
>> graphs, while still supporting inter-graph entailment and so would be
>> compatible in this sense. (This is the 'missing case' I mentioned
>> earlier.)
> 
> I think your next email cancelled this comment.

Yes, it did. The 'missing case' was a naming semantics that also allowed adding entailments to a graph, which isn't really a sensible combination. 

>> "This was not retained eventually, because of the lack of experience,
>> and potentially the lack of utility,..." Yes to lack of experience,
>> but did anyone argue a lack of utility? Suggest omit this second
>> clause. Basically we ran out of time, is what happened :-)
> 
> Ok.
> 
> 
>> 
>> Pat
>> 
>> (Stylistic edits in another message.)
>> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 13 December 2013 05:54:48 UTC