Re: Sandro's proposal VS RDF Datasets from Pat Hayes on 2012-05-02 (public-rdf-wg@w3.org from May 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 2 May 2012 13:53:07 -0500
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: Sandro Hawke <sandro@w3.org>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <58C3A252-A0CD-48EF-99D8-B181AF5A0627@ihmc.us>
On May 2, 2012, at 9:59 AM, Antoine Zimmermann wrote:

> 
> 
> Le 02/05/2012 15:55, Sandro Hawke a écrit :
>> On Wed, 2012-05-02 at 15:29 +0200, Antoine Zimmermann wrote:
>>> PS: OK, while writing this email some thoughts came to me, and I believe
>>> I now see each party's opinions and goals better. Sorry if this re-asserts
>>> some things that were made explicit in earlier discussions.
>>> There are parts mostly directed to Pat, but the end is certainly more
>>> interesting to others, especially I think Sandro.
>> 
>> A partial reply - I don't think I'll have time to do more before the
>> meeting.
>> 
>>> 
>>> Le 30/04/2012 19:53, Pat Hayes a écrit :
>>>> 
>>>> 
>>>>> 
>>>>>> Seems to me that this analogy strongly supports Sandro's notion
>>>>>> of graph names as being, well, names of graphs.
>>>>>> 
>>>>>> But we can take your view, as I understand it. It is simply a
>>>>>> rejection of the very idea of datasets having any normative
>>>>>> semantics or meaning. They are just handy datastructures for
>>>>>> doing various things with pieces of RDF. Which is fine, and saves
>>>>>> us a lot of WG effort, but hasn't really advanced the state of the
>>>>>> art very far, and may not really be living up to our charter.
>>>>> 
>>>>> My view has always been that we define a normative semantics for
>>>>> RDF Datasets, and I proposed one more than a year ago. It's fairly
>>>>> simple: you just apply the RDF semantics to each graph separately
>>>>> and what you get is an entailed dataset. It's nothing special or
>>>>> strange
>>>> 
>>>> Well, it is very strange, by some lights. It is wildly out of line
>>>> with the intuitions and assumptions underlying the 2004
>>>> specifications (what I called the 'globalist' perspective on IRI
>>>> meanings.) And it raises an immediate puzzle, which is WHY an RDF
>>>> graph should suddenly be allowed to change its meaning when it is
>>>> embedded inside a dataset and given a name. That seemed extremely
>>>> puzzling to me, I have to say.
>>> 
>>> I don't see where the change of meaning happens. If I have the following
>>> RDF graph:
>>> 
>>> :c  rdfs:subClassOf  :d .
>>> :x  rdf:type  :c .
>>> 
>>> it entails:
>>> 
>>> :x  rdf:type  :d .
>>> 
>>> If I put this graph in a dataset:
>>> 
>>> :d {
>>>    :c  rdfs:subClassOf  :d .
>>>    :x  rdf:type  :c .
>>> }
>>> 
>>> it entails:
>>> 
>>> :d {
>>>    :x  rdf:type  :d .
>>> }
>> 
>> This statement already shows how differently we are thinking of this.
>> 
>> I don't think putting a graph into a dataset in any way affects the
>> graph or changes its properties.   If G1 entails G2, it doesn't matter
>> what else we know or say about G1 or G2 -- G1 always entails G2.
> 
> Fair enough, I was replying to Pat, who claims that it does change the meaning.

No, really, I don't. What (in my proposal) does change the meaning is when a graph cites (imports, accepts, inherits...) itself as its context. But that only requires that the graph has a name (so it can refer to itself) and that naming is done in my example using Sandro's semantics for TriG. But the self-context machinery works inside or outside a dataset, once we have some way to refer to graphs by name in an RDF triple.

> Then I rhetorically reused his own arguments to make the contradiction more evident.
> 
>> When you write down a dataset, as you did twice there in TriG, you are
>> making a statement.
> 
> This is true IF you assume your interpretation of datasets. But quoting Ivan again, you have to acknowledge that there may be other ways of interpreting the dataset.

I think this entire discussion is trying to draw out the consequences of several (well, basically two) different proposals for how to interpret datasets. My contexts proposal uses Sandro's semantics for datasets but shows how to recapture the same content as your semantics, by adding a little extra RDF.

> 
> My interpretation is that when one writes:
> 
>    :g {
>         :c  rdfs:subClassOf  :d .
>         :x  rdf:type  :c .
>    }
> 
> it's a statement that the two triples are true "according to :g" (or in "context" :g if you prefer this word, or in "graph labelled :g") and from this follows that "in graph :g" the triple ":x  rdf:type  :d ." is true, that is to say, following my interpretation, that:
> 
>    :g {
>         :x  rdf:type  :d .
>    }
> 
> is true.
> I do not say that my interpretation is better or truer than yours. I just say that it makes sense and can serve many use cases. Additionally, it also corresponds to the way some triple stores (those that materialise inferences) are implemented.

What it doesn't do, however, is provide a way to assign a name to a graph (layer/gbox/whatever they are called). This is a real issue, it seems to me, because we NEED some way to do this.
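
To make the contrast between the two readings concrete, here is a rough Python sketch. It is an illustration only, not something either proposal specifies. The assumptions are mine: graphs are modelled as sets of (subject, predicate, object) string triples, a dataset as a dict from graph name to graph, and the only inference rule applied is RDFS rule rdfs9 (type propagation along rdfs:subClassOf). The function names are hypothetical.

```python
# Sketch only: graphs are sets of (s, p, o) string triples, a dataset is a
# dict {graph name: graph}. Only RDFS rule rdfs9 is applied (an assumption
# made to keep the example small).
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def rdfs9_closure(graph):
    """Add (x rdf:type d) whenever (x rdf:type c) and (c rdfs:subClassOf d) hold."""
    closed = set(graph)
    while True:
        new = {(x, TYPE, d)
               for (c, p1, d) in closed if p1 == SUBCLASS
               for (x, p2, c2) in closed if p2 == TYPE and c2 == c} - closed
        if not new:
            return closed
        closed |= new

def graph_entails(g1, g2):
    """Ordinary graph entailment, restricted here to rdfs9."""
    return set(g2) <= rdfs9_closure(g1)

def per_graph_entails(ds1, ds2):
    """Per-graph reading: each named graph in ds2 follows from the same-named graph in ds1."""
    return all(n in ds1 and graph_entails(ds1[n], g) for n, g in ds2.items())

def containment_entails(ds1, ds2):
    """Containment reading: 'n contains these triples' follows only if they are literally there."""
    return all(n in ds1 and set(g) <= set(ds1[n]) for n, g in ds2.items())

premise = {":g": {(":c", SUBCLASS, ":d"), (":x", TYPE, ":c")}}
conclusion = {":g": {(":x", TYPE, ":d")}}

print(per_graph_entails(premise, conclusion))    # True
print(containment_entails(premise, conclusion))  # False
```

In both readings the underlying graph entailment is the same (graph_entails does not care about names). The two only differ on what the dataset as a whole is taken to assert.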

Pat

> 
> 
> --AZ
> 
>>   When you said:
>> 
>>         :d {
>>             :c  rdfs:subClassOf  :d .
>>             :x  rdf:type  :c .
>>         }
>> 
>> you were saying, in my proposed reading: ":d is something which contains
>> the triples {:c  rdfs:subClassOf :d. :x rdf:type :c.}".
>> 
>> When you said:
>> 
>>         :d {
>>             :x  rdf:type  :d .
>>         }
>> 
>> you were saying, in my proposed reading: ":d is something which contains
>> the triple {:x rdf:type :d}".
>> 
>> So, yes, of course the first set of triples entails the second set of
>> triples, but the statement ":d is something which contains {first bunch
>> of triples}" does not entail the statement ":d is something which
>> contains {second bunch of triples}".
>> 
>>      -- Sandro
>> 
>> 
>> 
>> 
>>> And all other entailments are preserved. They are simply put "in
>>> context", so to speak.
>>> 
>>>>> or hard to get accepted: it's already implemented in some triple
>>>>> stores. Yes, it may be little in advancing the state of the art,
>>>>> but it gives a good ground to define notions such as imports,
>>>>> temporal reasoning, trust-based reasoning and various other
>>>>> things. It's perfectly in line with what we have to do according to
>>>>> our charter.
>>>>> 
>>>> 
>>>> I agree it is quite precise and quite simple. However, it
>>>> conspicuously fails to do what seems to me to be part of our charter
>>>> here, which is to make the notion of named graph precise and give a
>>>> semantics for it.
>>> 
>>> Tell me what is imprecise and I'll fix it. I claim that it is
>>> sufficiently precise to be implemented and tested against test cases,
>>> and I even think that it is already implemented in some triple stores.
>>> What is missing in my proposal, IMO, is to clearly define the semantic
>>> extensions that would allow one to constrain the graph "names" to denote
>>> the graph, that would allow one to "import/inherit" another "named"
>>> graph, and possibly other extensions.
>>> 
>>>> I know it takes what SPARQL calls a "named graph"
>>>> and gives a semantics for that, but it does so by refusing to treat
>>>> the "name" as a name of the "graph". Again, even that is only a
>>>> terminological matter, which we could treat as being unfortunate but
>>>> not fatal; but if people also wish to use those graph "names" to
>>>> refer to the actual graphs, as some people apparently do want to do,
>>>> and I suspect many people outside the WG will assume that they can
>>>> freely do, simply from the fact that they are called "name", then
>>>> this lack of real naming becomes a genuine semantic problem. Which
>>>> is why I like Sandro's suggested interpretation of datasets, which
>>>> provides for the naming relationship, and suggested introducing your
>>>> contextual-variation-of-meaning idea by a different mechanism built
>>>> into RDF. If you or someone else can come up with an alternative way
>>>> to attach names to graphs, I'd be delighted. So far, nobody has,
>>>> AFAIK.
>>> 
>>> If I understood Sandro's suggested interpretation correctly, he would prefer
>>> that the following TriG file:
>>> 
>>> :d {
>>>    :c  rdfs:subClassOf  :d .
>>>    :x  rdf:type  :c .
>>> }
>>> 
>>> does *not* entail:
>>> 
>>> :d {
>>>    :x  rdf:type  :d .
>>> }
>>> 
>>> So, a graph in a "named" graph pair does not have the semantics of an
>>> RDF graph outside it. If such is indeed what Sandro suggests, then I can
>>> use your own argument against it: WHY an RDF graph should suddenly be
>>> allowed to change its meaning when it is embedded inside a dataset and
>>> given a name. *That* seemed extremely puzzling to me.
>>> 
>>> Now, concerning graph "names" denoting the graph itself, I'd propose the
>>> following:
>>> 
>>> Call the Dataset semantics I proposed the "Simple Dataset semantics"
>>> (name chosen to mirror Simple entailment in the RDF spec).
>>> In Simple entailment, predicates are not required to be instances of
>>> rdf:Property. But there is a semantic constraint provided by the RDF
>>> semantics which requires them to be.
>>> Similarly, there can be a semantic constraint in "RDF Dataset semantics"
>>> (an extension of "Simple Dataset semantics") which says that graph
>>> "names" must be interpreted as RDF graphs.
>>> This can be formalised in different ways depending on what we want to
>>> do. For instance, we can impose that the graph IRI denote exactly the
>>> graph between the curly brackets. Or that it denote a superset of the
>>> graph. Or that the graph IRI denotes the graph only in the default
>>> graph, but inside a named graph, it is not required to denote anything
>>> in particular. But whatever the choice taken there, these can be simply
>>> described as semantic extensions of the Simple Dataset semantics.
>>> 
>>> 
>>>>> The way things are going on in this WG tends to suggest that there
>>>>> will not be any formal semantics for RDF Datasets as there is too
>>>>> much disagreement on what it should be. I have the impression that
>>>>> it is the only viable, but disappointing alternative.
>>>> 
>>>> I don't think we should give up yet. So far, in my experience, this WG
>>>> is no more internally fractious than other WGs I have been on. It
>>>> took the first RDF WG nine months to decide how to write the number
>>>> three, and the ISO group which made Common Logic went on for four
>>>> years without agreeing whether the logic was typed or untyped.
>>> 
>>> I'm rather confident that these discussions can lead eventually to
>>> consensus, but I am a bit afraid of how much time this will take. There
>>> is a strong risk that it will take more time than what was initially
>>> allocated to the WG. I don't know what's W3C policy wrt extending the
>>> duration of WGs.
>>> 
>>>>>>> 
>>>>>>> In my opinion, if one just wants to quote a graph and talk about
>>>>>>> it, one just needs RDF triples.
>>>>>> 
>>>>>> No, that won't do. At the very least we need reification or some
>>>>>> kind of graph literal construction.
>>>>> 
>>>>> Not necessarily. RDF does not define a formal semantics for
>>>>> information about persons, yet it is perfectly possible to talk
>>>>> about people with RDF.
>>>> 
>>>> Sigh. You keep saying this and it keeps missing the point. In the
>>>> case of graph naming, unlike that of person naming, there are
>>>> entailments that depend upon the name-graph naming relationship being
>>>> rigid. For example, you really do want the metadata to apply to the
>>>> actual graph (or graph container, whatever we decide) being named by
>>>> the name. I don't think that a 'social consensus' is good enough
>>>> here. But more to the point, with your dataset convention, there are
>>>> clear use cases where the graph "name" most assuredly does not denote
>>>> the graph (since it is being used to denote something else entirely),
>>>> so no amount of social consensus is going to make that work and still
>>>> be in conformity to the 2004 RDF specs. (Part of the idea behind the
>>>> 'contexts' design is to keep the association of IRIs to contexts (or
>>>> extensions) separate from what they denote, precisely in order to
>>>> allow this kind of usage.)
>>> 
>>> Clearly, if you want to do complex reasoning over graphs and check
>>> consistency of metadata etc, you'll need some way to make clear how
>>> names are related and so on. But it seems to me that the cost it adds,
>>> in terms of expressiveness and constraints, is not worth the benefits
>>> and commonly accepted best practices are able to solve a huge part of
>>> the use cases.
>>> RDF has the advantage of being very much unconstrained so that it fits
>>> many scenarios easily. But the unconstrainedness is a problem in many
>>> cases too, that is why we have all these extensions like RDFS, OWL,
>>> SWRL, etc. that add their own constraints to solve complex use cases.
>>> I think we can do the same for datasets. Have a very unconstrained base
>>> and propose a few extensions that match the most common use cases.
>>> In addition to this, we could provide a mechanism to "announce" which
>>> extensions are used (probably what you have in mind with your
>>> "extension" proposal).
>>> 
>>>> 
>>>>> It just requires a social consensus such as FOAF. The same can
>>>>> happen for talking about graphs. Of course, if you need to do some
>>>>> stricter reasoning, you would need something more, like e.g. graph
>>>>> literals but I haven't yet found a convincing use case that would
>>>>> require it.
>>>>> 
>>>>>>> 
>>>>>>> <g>  a  :Graph ; dc:creator <me> ; :saysInTurtle  ":s :p :o" .
>>>>>> 
>>>>>> Is ":s :p :o" a string?
>>>>> 
>>>>> Yes.
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> You can even have a "partial semantics" by separating the
>>>>>>> triples:
>>>>>>> 
>>>>>>> <g>     :saysInTurtle  ":s :p :o", ":a :b :c" .
>>>>>>> 
>>>>>>> Then it's just a matter of social consensus that :saysInTurtle
>>>>>>> is used to relate an RDF graph to a Turtle serialisation of
>>>>>>> that graph. You could also add something to the formal
>>>>>>> semantics, but on the one hand it would create headaches for all
>>>>>>> implementers (imposing something to be interpreted as an RDF
>>>>>>> Graph is much more troublesome than implementing
>>>>>>> rdf:XMLLiteral, for instance), and on the other hand, I can't
>>>>>>> think of any concrete real life situation where it's actually
>>>>>>> useful.
>>>>>> 
>>>>>> I can. If someone wants to get ambitious with their library and
>>>>>> use some OWL reasoning (as for example the BBC are doing, for
>>>>>> one) then you really do want to have some connection with the OWL
>>>>>> content at the level of model theory, if only to clarify what
>>>>>> owl:sameAs is supposed to mean.
>>>>> 
>>>>> This is not a concrete example. Can you show a real life problem
>>>>> that *requires* that a URI is interpreted as an RDF graph to be
>>>>> solved conveniently?
>>>> 
>>>> How about using owl:sameAs on IRIs intended to denote graphs? Or
>>>> between an IRI and a blank node both intended to denote a graph, as
>>>> in some of Sandro's examples. Or suppose you have classes of graphs,
>>>> and want to define an OWL restriction class, for example the class of
>>>> all graphs containing program information whose associated date of
>>>> creation is earlier than 01012010. If graph "names" don't really
>>>> refer, none of this really makes sense.
>>> 
>>> But what's the real life problem you're trying to solve here?  What are
>>> the data and what useful conclusions would you draw from the fact that
>>> the name denotes the graph, which you would not be able to draw
>>> otherwise? I'll try to extend your example to see if I can get something.
>>> 
>>> Consider the example:
>>> 
>>> <joe> <says> <g> .
>>> <g> owl:sameAs <h> .
>>> <g> {
>>>    <joe>  a  foaf:Person .
>>> }
>>> <h> {
>>>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>>> }
>>> 
>>> what can we conclude? It all depends how we interpret the named graphs.
>>> 
>>> *Case 1.*
>>>   If <g> is interpreted exactly as the graph inside the curly brackets,
>>> then we have an inconsistency. Can this be considered a useful
>>> conclusion in such a scenario? I don't know, but I find that forcing
>>> the graph IRI to denote exactly the graph is much too strong and would
>>> not be convenient for many use cases (e.g., facts evolving with time).
>>> 
>>> *Case 2.*
>>>   If <g> is interpreted as a supergraph of what's in the brackets, then
>>> we can conclude:
>>> 
>>> <joe> <says> <g> .
>>> <g> owl:sameAs <h> .
>>> <g> {
>>>    <joe>  a  foaf:Person .
>>>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>>> }
>>> <h> {
>>>    <joe>  a  foaf:Person .
>>>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>>> }
>>> 
>>> This already looks much more helpful. This probably fits Sandro's
>>> endorsement use case as it looks to me it's his suggested semantics.
>>> 
>>> But still I find it unsatisfying when it comes to dealing with graphs
>>> having different provenance, from which you would like to conclude
>>> things such as:
>>> 
>>> *Case 3.*
>>> In this case, the dataset should be read as "from source <g>, I know that
>>> Joe is a person, from source <h>, I know that persons are agents, but I
>>> also know that source <g> and <h> are actually one source. So I can
>>> conclude that, according to source <g> (or <h>), Joe is an agent."
>>> 
>>> <joe> <says> <g> .
>>> <g> owl:sameAs <h> .
>>> <g> {
>>>    <joe>  a  foaf:Person .
>>>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>>>    <joe>  a  foaf:Agent .
>>> }
>>> <h> {
>>>    <joe>  a  foaf:Person .
>>>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>>>    <joe>  a  foaf:Agent .
>>> }
>>> 
>>> So in the end, case 3 leads to my proposal.
>>> Hmmm, looking at this and remembering what Ivan said a couple of times
>>> "we have to acknowledge that there is no fit-for-all semantics", maybe
>>> we can have two competing semantics, but there should be a way to
>>> declare which one is assumed when exchanging a TriG file.
>>> 
>>>> [skip]
>>> 
>>> 
>> 
>> 
>> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 
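
For anyone who wants to play with the three readings of the <g>/<h> owl:sameAs example quoted above, they can be sketched roughly as follows in Python. This is only an illustration under my own assumptions: a dataset is a dict from graph name to a set of triples, owl:sameAs is given as a list of name pairs (no transitive closure is computed), the class in <h> is read as foaf:Person, and the only inference rule is rdfs9. All function names are hypothetical.

```python
# Sketch only: a dataset is a dict {graph name: set of (s, p, o) triples}.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def rdfs9_closure(graph):
    """Add (x rdf:type d) whenever (x rdf:type c) and (c rdfs:subClassOf d) hold."""
    closed = set(graph)
    while True:
        new = {(x, TYPE, d)
               for (c, p1, d) in closed if p1 == SUBCLASS
               for (x, p2, c2) in closed if p2 == TYPE and c2 == c} - closed
        if not new:
            return closed
        closed |= new

def case1_consistent(dataset, same_as):
    """Case 1: a name denotes exactly its graph, so sameAs names need equal graphs."""
    return all(dataset[a] == dataset[b] for a, b in same_as)

def case2_supergraph(dataset, same_as):
    """Case 2: a name denotes a supergraph, so sameAs names share the union of their triples."""
    result = {name: set(g) for name, g in dataset.items()}
    for a, b in same_as:
        union = result[a] | result[b]
        result[a], result[b] = set(union), set(union)
    return result

def case3_single_source(dataset, same_as):
    """Case 3: merge as in case 2, then apply entailment inside each named graph."""
    merged = case2_supergraph(dataset, same_as)
    return {name: rdfs9_closure(g) for name, g in merged.items()}

dataset = {
    "<g>": {("<joe>", TYPE, "foaf:Person")},
    "<h>": {("foaf:Person", SUBCLASS, "foaf:Agent")},
}
same_as = [("<g>", "<h>")]
joe_is_agent = ("<joe>", TYPE, "foaf:Agent")

print(case1_consistent(dataset, same_as))                            # False
print(joe_is_agent in case2_supergraph(dataset, same_as)["<g>"])     # False
print(joe_is_agent in case3_single_source(dataset, same_as)["<h>"])  # True
```

The only difference between cases 2 and 3 in this sketch is whether entailment is applied inside the merged graphs, which is exactly the point on which the two proposals differ.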

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 2 May 2012 18:53:46 UTC