Re: comments on Antoine's draft from Antoine Zimmermann on 2013-12-13 (public-rdf-wg@w3.org from December 2013)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Fri, 13 Dec 2013 10:24:58 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <52AAD26A.4010302@emse.fr>

Some comments below, deleting pieces that are not relevant.


On 13/12/2013 06:54, Pat Hayes wrote:
> A few replies noted in line below.
>
> Pat
>

> [...]

>>> Section 3. " reuse RDF semantics as a black box" What does this
>>> mean?
>>>
>>> "The formalisation below indicates that the truth of an RDF
>>> dataset can be determined in function of the truth of an RDF
>>> graph, no matter how the latter is determined. Therefore, instead
>>> of defining a precise definition of RDF graph interpretations and
>>> entailment, we use the more abstract notion of entailment regime.
>>> " I find it very hard to understand the logic of this. Why would
>>> the black box lead to entailment regimes? And do you mean to
>>> imply that entailment regimes are less precise than model theory?
>>> (And if so, why would you be trying to be less precise?)
>>
>> What I want to say here is that dataset semantics is usually
>> defined *with respect to* an entailment regime, but it is not
>> necessary to specify the regime explicitly (it is just a parameter,
>> like D in D-entailment). Most of the following definitions specify
>> "E-dataset-semantics" for any entailment regime E.  A concrete
>> implementation would have to require a specific E, such as
>> simple-dataset-semantics, RDF-dataset-semantics,
>> RDFS-dataset-semantics, etc.  But the definition of
>> E-dataset-semantics can just consider that E is a black box.
>
> OK, I see. I think the phrase "black box" is misleading or maybe just
> odd, it usually conveys rather more than this. (It suggests that the
> internals of the entailment regimes are somehow invisible.)

I will try to find another wording for it.

> [...]

>>> 3.2 I think this is misleading. We have formally decided that
>>> datasets are single bnode scopes, so to treat bnodes in two
>>> named graphs as distinct, ie to merge their graphs rather than
>>> take their union, is just wrong.
>>
>> This document deliberately avoids saying that such and such choices
>> are wrong, which would lead people to think that there are
>> legitimate and illegitimate dataset semantics.  We have not gotten
>> to this level of requirements for dataset semantics.  It seems to
>> me pretty straightforward to say that a dataset is true in an
>> interpretation if all the graphs in it are true in that
>> interpretation.  This corresponds to applying a merge operation.
>
> It is a merge only if the graphs share no bnodes. But we have
> decided, formally, that bnodes in graphs in a dataset are shared, ie
> their scope is the dataset rather than the local graph. So take the
> example
>
> { } :1 { :a :p _:x } :2 { :b :q _:x }
>
> and the merge
>
> :a :p _:x :b :q _:y
>
> There are interpretations which satisfy the merge but do not make all
> the graphs in the dataset true, so if dataset truth means the truth
> of all the graphs in it, then the merge does not entail the dataset.
> But the union will always be equivalent to the truth of all the
> graphs.

What?!  All interpretations that satisfy the merge obviously satisfy the 
two graphs!  You even wrote the proof yourself in RDF 2004 
(http://www.w3.org/TR/rdf-mt/#mergelemprf).

Should I really show the proof to you here?
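
For the example above, the claim can be checked mechanically. The following is a minimal sketch, assuming simple entailment and relying on the interpolation lemma of RDF Semantics 2004 (G simply entails H exactly when some instance of H is a subgraph of G). The string encoding of terms and the name simply_entails are illustrative assumptions, not part of any specification.

```python
from itertools import product


def is_bnode(term):
    """Blank nodes are written with the usual '_:' prefix (an assumption of this sketch)."""
    return term.startswith("_:")


def simply_entails(g, h):
    """Return True if some instance of h is a subgraph of g.

    By the interpolation lemma this is equivalent to g simply entailing h.
    The search is brute force: every blank node of h is tried against
    every term that occurs in g.
    """
    g = set(g)
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if instance <= g:
            return True
    return False


# The two named graphs of the example dataset, sharing the blank node _:x.
graph1 = [(":a", ":p", "_:x")]
graph2 = [(":b", ":q", "_:x")]

# The merge standardizes blank nodes apart; the union keeps _:x shared.
merge = [(":a", ":p", "_:x"), (":b", ":q", "_:y")]
union = graph1 + graph2

print(simply_entails(merge, graph1))  # True
print(simply_entails(merge, graph2))  # True
print(simply_entails(merge, union))   # False
print(simply_entails(union, merge))   # True
```

Under these assumptions the merge entails each of the two graphs taken separately, which is the merging lemma. It does not entail the union, in which _:x is shared between the two graphs.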


> [...]

>>> Also, "as a way to refer to the RDF graph" // "as a way to
>>> identify..." , since Semantics draws this distinction carefully.
>>
>> The text beginning each presentation of a distinct formal semantics
>> is meant to be informal and intuitive. I'll try to be as rigorous
>> as I can, but the use of common words, with their ambiguity,
>> may be legitimate in such a style of presentation.  I am open to
>> suggestions, though.  In this case, even if Semantics makes the
>> distinction, I don't see why it should not be "identify" here.
>
> Well, if some other natural word can be found, it would be better to
> avoid a terminology clash with part of the existing normative
> documents.

But isn't "identify" precisely the right term here?


> [...]

>>> " the presence of blank nodes as graph names can be problematic
>>> because a named graph entails an infinity of other named graphs
>>> where only the graph name is changed to a different blank node."
>>> I disagree. If there are n graph names, then there are at most
>>> 2^n distinct bnode generalizations. Just changing the bnodeID
>>> does not change a graph into a different graph. And in any case,
>>> the situation in datasets is no worse than in RDF graphs, so I
>>> think this is a non-issue.
>>
>> This may be a non-issue, but here I'm not talking about bnode ID.
>> In this dataset semantics, any blank node used as a graph name can
>> be replaced by another unused blank node.
>
> But it has to be replaced consistently throughout the dataset, or
> else it is a different dataset. Right?

Yes.

>> There are infinitely many blank nodes from which to choose.
>
> Bnodes are not distinct 'things', they are just 'places' in a graph
> (or in this case, a dataset.) That is why we treat graph-equivalent
> graphs (ie 1:1 substitution of bnodes) as identical. Concepts defines
> a similar equivalence for datasets.
>
>> This may also lead to having blank nodes used inside named graphs
>> become the same as or different from bnodes used as graph names.
>> E.g.,
>>
>> _:b { _:b  dc:created  "2013-12-10"^^xsd:date } _:d { ex:a  ex:b
>> ex:c }
>>
>> is equivalent (according to this particular semantics) to:
>>
>> _:c { _:b  dc:created  "2013-12-10"^^xsd:date } _:b { ex:a  ex:b
>> ex:c }
>
> Unless this is a typo, I don't follow. How can you replace the bnode
> _:b by _:c when it is used as a label but not when it is used inside
> the graphs?

It's not a typo. What I say is that the two datasets are equivalent 
according to this semantics.

> That should not be permissible in *any* semantics.

From your reaction to this, it seems that it *is* an issue.

> Did you mean this?
>
> _:c { _:c dc:created "2013-12-10"^^xsd:date } _:b { ex:a  ex:b  ex:c
> }
>
> This is equivalent to your first example.

Strictly speaking, the graphs inside the named graph pairs are not
equal. With a slight abuse of terminology, we can say that isomorphic
graphs are equal, in which case yes, it is equivalent to my first
example. But my second example is also equivalent in this case.
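
To make the difference between the three datasets concrete, here is a minimal sketch that tests dataset isomorphism in the sense of RDF Concepts, that is, a single 1:1 renaming of blank nodes applied consistently to graph names and graph contents. It does not implement the dataset semantics under discussion. The dict-based encoding (graph name mapped to a set of triples, default graph omitted) and the helper names are assumptions made for illustration only.

```python
from itertools import permutations


def is_bnode(term):
    """Blank nodes are written with the usual '_:' prefix (an assumption of this sketch)."""
    return term.startswith("_:")


def dataset_bnodes(dataset):
    """Collect blank nodes used as graph names or inside graphs."""
    found = set()
    for name, graph in dataset.items():
        if is_bnode(name):
            found.add(name)
        for triple in graph:
            found.update(t for t in triple if is_bnode(t))
    return sorted(found)


def rename(dataset, mapping):
    """Apply one blank node mapping to graph names and graph contents alike."""
    return {
        mapping.get(name, name): frozenset(
            tuple(mapping.get(t, t) for t in triple) for triple in graph
        )
        for name, graph in dataset.items()
    }


def isomorphic(d1, d2):
    """Brute-force search for a blank node bijection that turns d1 into d2."""
    b1, b2 = dataset_bnodes(d1), dataset_bnodes(d2)
    if len(b1) != len(b2):
        return False
    target = {name: frozenset(graph) for name, graph in d2.items()}
    return any(
        rename(d1, dict(zip(b1, perm))) == target for perm in permutations(b2)
    )


created = '"2013-12-10"^^xsd:date'
abc = {("ex:a", "ex:b", "ex:c")}

first = {"_:b": {("_:b", "dc:created", created)}, "_:d": abc}
second = {"_:c": {("_:b", "dc:created", created)}, "_:b": abc}
pat = {"_:c": {("_:c", "dc:created", created)}, "_:b": abc}

print(isomorphic(first, pat))     # True
print(isomorphic(first, second))  # False
```

With this encoding, the rewritten dataset is isomorphic to my first example, while my second example is not. Its equivalence therefore has to come from the semantics itself and not from a consistent renaming of blank nodes.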


>>
>> But after all, as you say, this is not relevant as a drawback.
>>
>>
>>> "Therefore, any entailment regime that recognizes datatypes and
>>> use this semantics has to be able to ..."  Why "that recognizes
>>> datatypes"? Any entailment regime that extends this semantics has
>>> to 'know about' graphs and their identity conditions. It is
>>> *like* typed literals, but it's not actually a new datatype. (It
>>> could be, of course, and then we would have graph literals.)
>>>
>>> 3.4 Second sentence: "From the truth of these triples, it is
>>> possible to infer knowledge that it is convenient to make part of
>>> the named graph." ?? Do you mean to say that graphs must be
>>> deductively closed? Surely not, but then what does this mean?
>>
>> The formulation was clumsy. I reformulated to: "From the truth of
>> these triples according to the graph semantics, follows the truth
>> of the named graph pair."
>
> I think the key point is that it would be valid to add valid
> entailments to any named graph, in this semantics. Whereas *any
> change at all* to a named graph would be invalid according to the
> naming semantics. This is a very sharp and vivid way to distinguish
> them.

With this comment, do you imply that this distinction should be stressed 
more?


>>> ".. one wants to allow different view points to be expressed and
>>> reasoned with, without creating a conflict or inconsistency." I
>>> don't like this. Technically, this is different from the time
>>> and provenance cases, and it appeals to a different logic. The
>>> latter are like having an extra parameter (what you describe as
>>> the quad case, later) but the usage where separate graphs are
>>> used to insulate against contradictions is different, because no
>>> extra parameter is implied. But maybe this is getting too subtle
>>> :-)
>>
>> There are cases where you want to isolate the content of an RDF
>> document and draw conclusions from this content only. The content as
>> well as the conclusions are attached to a graph name to keep track
>> of where the conclusions come from. You may want to do this
>> independently from the graph being consistent or not, and
>> independently from the graph being in contradiction with another
>> graph in the store.
>>
>> I haven't made a change for the moment, since you say it may be too
>> subtle.
>
> It probably is. There are several journal papers waiting to be
> written about this stuff, and that would be the place to go into such
> matters.

And there are several journal and conference papers already written on 
this matter.

> [...]

>>> 3.5 It's not exactly clear how this differs from the 3.4 case when
>>> the 'context' is, for example, times. [ <a b c> true in d ], and
>>> [ <a b c d> true ], are pretty interchangeable. But again, this
>>> is perhaps too picky.
>>
>> <a b c> true in d may not mean the same as <a b c d> true unless it
>> is specified like this.
>
> Yes, of course. What I meant to say is that one can impose isomorphic
> truth conditions on either syntactic pattern. They are functionally
> interchangeable, so to speak. So there is no important distinction in
> kind between triples-plus-contexts and quads; they are simply
> syntactic variants.

Yes, but the quad semantics presented there modifies the structure of 
graph interpretations with ternary relations, while the other semantics 
simply reuse the graph interpretation structure as is. So, in order to 
make the quad semantics isomorphic to the dataset semantics, you need to 
rewrite the semantic conditions using the ternary relations, for each
semantic extension (simple-quad semantics, RDF-quad semantics,
RDFS-quad semantics, etc).


>> <a b c> true in d is based on the truth conditions of graph
>> semantics, while the quad-semantics defines its own. For
>> instance:
>>
>> What can be concluded from this?
>>
>> { <ex:a  rdfs:subClassOf  ex:b  ex:c> <ex:c  rdfs:subClassOf  ex:d
>> ex:e> }
>>
>> It may mean many things.
>>
>> I have not changed this at the moment.
>>
>>

[...]

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Friday, 13 December 2013 09:25:27 UTC