Re: Sandro's proposal VS RDF Datasets from Pat Hayes on 2012-04-30 (public-rdf-wg@w3.org from April 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 30 Apr 2012 12:53:49 -0500
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <39654C72-4C9A-423E-B3B4-2D3762515B15@ihmc.us>

On Apr 29, 2012, at 4:47 AM, Antoine Zimmermann wrote:

> Le 29/04/2012 03:25, Pat Hayes a écrit :
>> 
>> On Apr 27, 2012, at 5:01 AM, Antoine Zimmermann wrote:
>> 
>>> Hi all,
>>> 
>>> 
>>> Now I understand better what Sandro's aiming at. Maybe it was made
>>> clear and explicit in previous emails but I have not followed all
>>> the discussions on the graph designs. I'll try to make explicit
>>> here something that I found unsaid.
>>> 
>>> I'll use the phrase "Sandro's view" to denote what *I* think is
>>> Sandro's view, which may not be exactly *his* true view. Please
>>> forgive me if I completely misunderstood your view, Sandro, and
>>> correct me.
>>> 
>>> In Sandro's view, TriG files are a way for people to assert things
>>> and to include quotes of what other people assert. So a TriG file
>>> is always the expression of the opinion/belief/knowledge of the
>>> author of the file (note that the author may be any kind of agent,
>>> not necessarily a person, let's call it the "implicit author"). So
>>> the questions in Sandro's questionnaire really make sense to me
>>> now: "The default graph is asserted" which means, the implicit
>>> author asserts these things that are said in the default graph. And
>>> it's also clear why it says that the TriG file entails the Turtle
>>> file, as Turtle is another way of asserting things. "Named graphs
>>> are not asserted" means that the implicit author is not saying
>>> those things, just merely quoting them. And of course, if you quote
>>> something, you do not want to entail anything from it as the quote
>>> is the quote. And of course too, if one says "<g>  says {:s :p :o .
>>> :a :b :c}" you can as well say that "<g>  says {:s :p :o}" as any
>>> subpart of the quote is also something quoted.
>>> 
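
As a rough illustration of the reading described in the quoted paragraph above, here is a minimal Python sketch. It is not taken from Sandro's proposal: the dict layout, the function names `asserted` and `quotes`, and the sample default-graph triple are all assumptions made for the example. It only shows the two claims in play, that the default graph is what the implicit author asserts, and that any subpart of a quoted graph is also quoted.

```python
# Hypothetical model: a triple is a 3-tuple of strings,
# a graph is a frozenset of triples.
dataset = {
    # Assumed example content for the default graph.
    "default": frozenset({("<g>", "dc:creator", "<me>")}),
    # The named graph from the quoted example: <g> says {:s :p :o . :a :b :c}
    "named": {"<g>": frozenset({(":s", ":p", ":o"), (":a", ":b", ":c")})},
}


def asserted(ds):
    """Under this reading, only the default graph is asserted."""
    return ds["default"]


def quotes(ds, name, triples):
    """True if `triples` is a subset of what is quoted under `name`.

    Any subpart of a quote counts as quoted, so this is a subset test.
    """
    return frozenset(triples) <= ds["named"].get(name, frozenset())
```

With this model, `quotes(dataset, "<g>", {(":s", ":p", ":o")})` holds, while `(":s", ":p", ":o")` is not a member of `asserted(dataset)`.
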
>>> Now, there is a problem here. It is not the way RDF datasets are
>>> supposed to work. It is not the way people in the semantic
>>> community use RDF datasets, not even TriG files, as far as I can
>>> see. TriG documents are not published online. They are used either
>>> to serialise an RDF dataset or as configuration files in various
>>> tools or simply to partition the triples in a convenient way.
>>> 
>>> Let us make a comparison. In Sandro's view, I'd say that a TriG
>>> file corresponds to a single book which could refer to many other
>>> books. It could be a catalogue which cites, references, quotes, and
>>> reviews other books. Of course, the "named books" inside this book
>>> are not "asserted".
>>> 
>>> But in SPARQL, an RDF Dataset is like a library. It contains many
>>> books that do not necessarily reference, quote, or cite the other
>>> books. It probably has an index (the "default book"). But it does
>>> not make sense to say that the statements in those books "are not
>>> asserted". All books have their own asserted statements from which
>>> you can draw conclusions. E.g., "Luke Skywalker is carrying a light
>>> saber" is asserted inside the book, and inside this book, one can
>>> entail that "Luke is carrying weight". This does not have an impact
>>> on what is asserted in a book of Physics. The book of physics has
>>> its own truth from which one can make other entailments. This is
>>> what an RDF Dataset is: a library of RDF graphs, each having their
>>> own assertions and each carrying implicitly their own conclusions.
>>> 
>>> In Sandro's view, there is this idea that:
>>> 
>>> <g>  { <some triples> }
>>> 
>>> is asserting something about the relationship of <g> with the
>>> triples. But in RDF Dataset, this is just a way to put the triples
>>> on a shelf, and the shelf happens to have an identifier. When we
>>> put something on a shelf in a library, we do not think that we are
>>> asserting a relationship between the shelf and what's on it!
>> 
>> But if you TELL someone that the book is on the shelf, or answer a
>> question about what book is on the shelf, or tell someone WHICH book
>> is on the shelf, then you do exactly assert such a relationship. You
>> did yourself in this very sentence: you said it was ON it. That is a
>> relationship.
> 
> A French expression says:
> 
> << with "ifs" you can put Paris inside a bottle. >>
> 
> Yes, IF you do this or that, things happen.
> A book in a library is not an assertion made by the library or the library owner. IF you query the library database, yes, you get back an assertion saying "the book is on shelf 521", but you only get this IF you ask. Otherwise, the book is just there; nobody is quoting it merely because it sits on a shelf.

The metaphor is getting out of hand, but my point is that when the graph "name" is used inside some RDF to refer to the graph, then we have done the thing that makes all this happen. And people are doing this: Andy and others have been doing it in our own WG discussions, without any special comment, as though it were obviously correct. 

> 
>> Seems to me that this analogy strongly supports Sandro's notion of
>> graph names as being, well, names of graphs.
>> 
>> But we can take your view, as I understand it. It is simply a
>> rejection of the very idea of datasets having any normative semantics
>> or meaning. They are just handy datastructures for doing various
>> things with pieces of RDF. Which is fine, and saves us a lot of WG
>> effort, but hasn't really advanced the state of the art very far, and
>> may not really be living up to our charter.
> 
> My view has always been that we define a normative semantics for RDF Datasets, and I proposed one more than a year ago. It's fairly simple: you just apply the RDF semantics to each graph separately and what you get is an entailed dataset. It's nothing special or strange

Well, it is very strange, by some lights. It is wildly out of line with the intuitions and assumptions underlying the 2004 specifications (what I called the 'globalist' perspective on IRI meanings.) And it raises an immediate puzzle, which is WHY an RDF graph should suddenly be allowed to change its meaning when it is embedded inside a dataset and given a name. That seemed extremely puzzling to me, I have to say.

> or hard to get accepted: it's already implemented in some triple stores.
> Yes, it may do little to advance the state of the art, but it gives good ground on which to define notions such as imports, temporal reasoning, trust-based reasoning and various other things.
> It's perfectly in line with what we have to do according to our charter.
> 

I agree it is quite precise and quite simple. However, it conspicuously fails to do what seems to me to be part of our charter here, which is to make the notion of named graph precise and give a semantics for it. I know it takes what SPARQL calls a "named graph" and gives a semantics for that, but it does so by refusing to treat the "name" as a name of the "graph". Again, even that is only a terminological matter, which we could treat as being unfortunate but not fatal; but if people also wish to use those graph "names" to refer to the actual graphs, as some people apparently do want to do, and I suspect many people outside the WG will assume that they can freely do, simply from the fact that they are called "name", then this lack of real naming becomes a genuine semantic problem. Which is why I like Sandro's suggested interpretation of datasets, which provides for the naming relationship, and why I suggested introducing your contextual-variation-of-meaning idea by a different mechanism built into RDF. If you or someone else can come up with an alternative way to attach names to graphs, I'd be delighted. So far, nobody has, AFAIK.
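
To make the per-graph semantics under discussion concrete, here is a minimal Python sketch of "apply the RDF semantics to each graph separately". It is an illustration under stated assumptions, not the proposal itself: the function names are hypothetical, the sample graph names and triples are invented, and a single RDFS rule (rdfs9, subclass inheritance of rdf:type) stands in for full RDF(S) entailment.

```python
def rdfs9_closure(graph):
    """Toy stand-in for RDF(S) entailment.

    Applies only rule rdfs9 to a fixpoint:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
    """
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(closed):
            if p != "rdf:type":
                continue
            for (c2, q, d) in list(closed):
                new = (x, "rdf:type", d)
                if q == "rdfs:subClassOf" and c2 == c and new not in closed:
                    closed.add(new)
                    changed = True
    return frozenset(closed)


def entail_dataset(ds):
    """Close the default graph and each named graph separately.

    No triple crosses a graph boundary, and the graph name is used
    only as a lookup key, never as something that denotes the graph.
    """
    return {
        "default": rdfs9_closure(ds["default"]),
        "named": {n: rdfs9_closure(g) for n, g in ds["named"].items()},
    }


# Assumed sample data, loosely echoing the library example.
dataset = {
    "default": frozenset(),
    "named": {
        "<fiction>": frozenset({
            (":lightSaber", "rdf:type", ":Weapon"),
            (":Weapon", "rdfs:subClassOf", ":PhysicalObject"),
        }),
        "<physics>": frozenset({(":lightSaber", "rdf:type", ":Weapon")}),
    },
}
entailed = entail_dataset(dataset)
```

In this sketch the conclusion drawn inside `<fiction>` does not appear in `<physics>`, and nothing in the code ever treats `"<fiction>"` as referring to the graph it indexes, which is the gap described above.
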

> The way things are going on in this WG tends to suggest that there will not be any formal semantics for RDF Datasets, as there is too much disagreement on what it should be. I have the impression that it is the only viable, if disappointing, alternative.

I don't think we should give up yet. So far, in my experience, this WG is no more internally fractious than other WGs I have been on. It took the first RDF WG nine months to decide how to write the number three, and the ISO group which made Common Logic went on for four years without agreeing whether the logic was typed or untyped.

>>> 
>>> In my opinion, if one just wants to quote a graph and talk about it,
>>> one just needs RDF triples.
>> 
>> No, that won't do. At the very least we need reification or some kind
>> of graph literal construction.
> 
> Not necessarily. RDF does not define a formal semantics for information about persons, yet it is perfectly possible to talk about people with RDF. 

Sigh. You keep saying this and it keeps missing the point. In the case of graph naming, unlike that of person naming, there are entailments that depend upon the name-graph naming relationship being rigid. For example, you really do want the metadata to apply to the actual graph (or graph container, whatever we decide) being named by the name. I don't think that a 'social consensus' is good enough here. But more to the point, with your dataset convention, there are clear use cases where the graph "name" most assuredly does not denote the graph (since it is being used to denote something else entirely), so no amount of social consensus is going to make that work and still be in conformity to the 2004 RDF specs. (Part of the idea behind the 'contexts' design is to keep the association of IRIs to contexts (or extensions) separate from what they denote, precisely in order to allow this kind of usage.)

> It just requires a social consensus such as FOAF. The same can happen for talking about graphs. Of course, if you need to do some stricter reasoning, you would need something more, like e.g. graph literals but I haven't yet found a convincing use case that would require it.
> 
>>> 
>>> <g>   a  :Graph ; dc:creator <me> ; :saysInTurtle  ":s  :p  :o" .
>> 
>> Is ":s :p :o" a string?
> 
> Yes.
> 
>> 
>>> 
>>> You can even have a "partial semantics" by separating the triples:
>>> 
>>> <g>   :saysInTurtle  ":s :p :o", ":a :b :c" .
>>> 
>>> Then it's just a matter of social consensus that :saysInTurtle is
>>> used to relate an RDF graph to a Turtle serialisation of that
>>> graph. You could also add something to the formal semantics, but on
>>> the one hand it would create headaches for all implementers (imposing
>>> something to be interpreted as an RDF Graph is much more
>>> troublesome than implementing rdf:XMLLiteral, for instance), and on
>>> the other hand, I can't think of any concrete real life situation
>>> where it's actually useful.
>> 
>> I can. If someone wants to get ambitious with their library and use
>> some OWL reasoning (as for example the BBC are doing, for one) then
>> you really do want to have some connection with the OWL content at
>> the level of model theory, if only to clarify what owl:sameAs is
>> supposed to mean.
> 
> This is not a concrete example. Can you show a real life problem that *requires* that a URI is interpreted as an RDF graph to be solved conveniently?

How about using owl:sameAs on IRIs intended to denote graphs? Or between an IRI and a blank node both intended to denote a graph, as in some of Sandro's examples. Or suppose you have classes of graphs, and want to define an OWL restriction class, for example the class of all graphs containing program information whose associated date of creation is earlier than 01012010. If graph "names" don't really refer, none of this really makes sense. 
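
A small Python sketch of why these cases need graph "names" to actually refer. Everything here is assumed for illustration: the `denotes` and `created` mappings, the IRIs and blank node label, and the creation dates are invented, and the functions only mimic what owl:sameAs and an OWL restriction class would mean if the terms denote graphs.

```python
from datetime import date

# Hypothetical interpretation: each IRI or blank node label denotes a graph.
g1 = frozenset({(":s", ":p", ":o")})
g2 = frozenset({(":a", ":b", ":c")})
denotes = {"<g1>": g1, "<g1-mirror>": g1, "_:b0": g1, "<g2>": g2}

# Assumed metadata, attached to the graphs themselves rather than to names.
created = {g1: date(2009, 6, 1), g2: date(2011, 3, 15)}


def same_as(a, b):
    """owl:sameAs between two terms holds when they denote the same graph."""
    return denotes[a] == denotes[b]


def created_before(cutoff):
    """Terms denoting graphs whose creation date is earlier than `cutoff`.

    This mimics membership in a restriction class over graphs.
    """
    return {term for term, g in denotes.items() if created[g] < cutoff}
```

Here `same_as("<g1>", "_:b0")` holds and `created_before(date(2010, 1, 1))` picks out every term that denotes `g1`. Both functions go through `denotes`; without that mapping there is nothing for them to compute over.
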

Pat


> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 83 36
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 30 April 2012 17:54:27 UTC