Re: Sandro's proposal VS RDF Datasets from Sandro Hawke on 2012-04-27 (public-rdf-wg@w3.org from April 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 27 Apr 2012 10:14:45 -0400
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-ID: <1335536085.9663.665.camel@waldron>

Thanks for this message.  On first reading, I think, yes, you
understand my view.   On closer reading, I see a few points to
clarify, and I've also responded with some motivations for my view.

On Fri, 2012-04-27 at 12:01 +0200, Antoine Zimmermann wrote:
> Hi all,
> 
> 
> Now I understand better what Sandro's aiming at. Maybe it was made clear 
> and explicit in previous emails but I have not followed all the 
> discussions on the graph designs. I'll try to make explicit here 
> something that I found unsaid.
> 
> I'll use the phrase "Sandro's view" to denote what *I* think is Sandro's 
> view, which may not be exactly *his* true view. Please forgive me if I 
> completely misunderstood your view, Sandro, and correct me.
> 
> In Sandro's view, TriG files are a way for people to assert things and 
> to include quotes of what other people assert. So a TriG file is always 
> the expression of the opinion/belief/knowledge of the author of the file 
> (note that the author may be any kind of agent, not necessarily a 
> person, let's call it the "implicit author"). 

Yes.

I'll go a step farther and say that I think about all computer
communications (not just TriG) this way.   Any time one system sends a
packet or message to another, I think about what the receiver "learned"
from that message.  If the sender and receiver are both following the
specs, and the specs are good, the sender can compose a message from
which the receiver will learn certain, intended things.  (In some cases,
the intent is to get the receiver to *do* something, but I think of that
as a second step, as the receiver acts on the knowledge it received.)
I've seen this kind of thinking formalized at the low end as protocol
state transition diagrams, and at the high end in logic-based
multi-agent systems. 

> So the questions in 
> Sandro's questionnaire really make sense to me now:
> "The default graph is asserted" which means, the implicit author asserts 
> these things that are said in the default graph. And it's also clear why 
> it says that the TriG file entails the Turtle file, as Turtle is another 
> way of asserting things.
> "Named graphs are not asserted" means that the implicit author is not 
> saying those things, just merely quoting them.
> And of course, if you quote something, you do not want to entail 
> anything from it as the quote is the quote. 

Yes, exactly.

> And of course too, if one 
> says "<g> says {:s :p :o . :a :b :c}" you can as well say that "<g> says 
> {:s :p :o}" as any subpart of the quote is also something quoted.

Not necessarily.   There are two reasonable and useful ways to
understand "says", if we think of says in the sense of a book saying
something, not a person saying something.

Option 1 ("partial-graph semantics")

From
   <g> saysAtLeast {:s :p :o . :a :b :c}
you could conclude
   <g> saysAtLeast {:s :p :o}
but you can't conclude 
   <g> doesNotSay {:d :e :f}

Option 2 ("complete-graph semantics")

From
   <g> saysOnly {:s :p :o . :a :b :c}
you could NOT conclude
   <g> saysOnly {:s :p :o}

You can, however, conclude
   <g> saysAtLeast {:s :p :o}
and (this is the interesting bit)
   <g> doesNotSay {:d :e :f}

So, for me it's kind of a coin toss whether we read TriG along the lines
of saysAtLeast or saysOnly.   Either one works; there are advantages and
disadvantages to each.   I like the way the second is more expressive,
but these days, I'm going with saysAtLeast, because that seems to be how
an overwhelming majority of the WG thinks.
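
To make the two readings concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not a proposed
semantics: graphs are modelled as frozensets of triples, the function
name entails() is hypothetical, and "doesNotSay X" is read as the
negation of "saysAtLeast X".

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Statement = Tuple[str, Graph]  # (kind, quoted triples)


def entails(premise: Statement, conclusion: Statement) -> bool:
    """Return True if the conclusion follows from the premise.

    Premise kinds modelled here: 'saysAtLeast' and 'saysOnly'.
    Conclusion kinds: 'saysAtLeast', 'saysOnly', 'doesNotSay'.
    """
    p_kind, p = premise
    c_kind, c = conclusion
    if p_kind not in ("saysAtLeast", "saysOnly"):
        raise ValueError("unsupported premise kind: " + p_kind)
    if c_kind == "saysAtLeast":
        # Any subset of the quoted triples is said, under either reading.
        return c <= p
    if c_kind == "saysOnly":
        # Only a complete quote pins down the whole graph.
        return p_kind == "saysOnly" and c == p
    if c_kind == "doesNotSay":
        # Absence can be concluded only when the quote is complete.
        return p_kind == "saysOnly" and not (c <= p)
    raise ValueError("unsupported conclusion kind: " + c_kind)


full = frozenset({(":s", ":p", ":o"), (":a", ":b", ":c")})
part = frozenset({(":s", ":p", ":o")})
other = frozenset({(":d", ":e", ":f")})

# Option 1: partial-graph semantics
print(entails(("saysAtLeast", full), ("saysAtLeast", part)))
print(entails(("saysAtLeast", full), ("doesNotSay", other)))

# Option 2: complete-graph semantics
print(entails(("saysOnly", full), ("saysOnly", part)))
print(entails(("saysOnly", full), ("doesNotSay", other)))
```

The only difference between the two readings in this sketch is whether
the quoted set is treated as a lower bound on the graph or as the whole
graph.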

> Now, there is a problem here. It is not the way RDF datasets are 
> supposed to work. 

Supposed by whom?  I think the SPARQL spec is silent on this.  It
essentially provides syntax [or structure] without semantics.  I think
this is because the WG couldn't agree on the semantics, but I think this
area was getting outside their charter.

> It is not the way people in the semantic community use 
> RDF datasets, not even TriG files, as far as I can see.

Agreed, but I think it's mostly compatible, and I think it will move us
more toward the Semantic Web.

> TriG documents are not published online. 

That might prove my point.  Without standard and useful semantics,
there's not much utility to publishing TriG documents.    People can
(and do, I think, sometimes) exchange them on the Web, but they need an
out-of-band agreement about the semantics, so it doesn't scale in the
way the Web usually does or the Semantic Web is supposed to.

> They are used either to 
> serialise an RDF dataset or as configuration files in various tools or 
> simply to partition the triples in a convenient way.
> 
> Let us make a comparison. In Sandro's view, I'd say that a TriG file 
> corresponds to a single book which could refer to many other books. It 
> could be a catalogue which cites, references, quotes, and reviews other 
> books. Of course, the "named books" inside this book are not "asserted".
> 
> But in SPARQL, an RDF Dataset is like a library. It contains many books 
> that do not necessarily reference or quote or cite the other books. It 
> probably has an index (the "default book"). But it does not make sense 
> to say that the statements in those books "are not asserted". All books 
> have their own asserted statements from which you can draw conclusions. 
> E.g., "Luke Skywalker is carrying a light saber" is asserted inside the 
> book, and inside this book, one can entail that "Luke is carrying 
> weight". This does not have an impact on what is asserted in a book of 
> Physics. The book of physics has its own truth from which one can make 
> other entailments. This is what an RDF Dataset is: a library of RDF 
> graphs, each having their own assertions and each carrying implicitly 
> their own conclusions.

My view doesn't exclude or contradict the library view, but, yes, it
scales down to also include the book which quotes other books.
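
The library picture quoted above, where each book carries its own
conclusions, can be sketched roughly as follows. This is an
illustration only: the dataset is modelled as a plain dict from graph
names to sets of triples, and the graph names, the rule, and the
function carrying_rule() are all made up for the example.

```python
def carrying_rule(graph):
    """Toy rule: whoever carries a light saber is carrying weight."""
    inferred = {
        (s, ":carries", ":weight")
        for (s, p, o) in graph
        if p == ":carries" and o == ":lightSaber"
    }
    return graph | inferred


# Hypothetical dataset: two "books" on the library shelves.
dataset = {
    "<starwars>": {(":Luke", ":carries", ":lightSaber")},
    "<physics>": {(":mass", ":measuredIn", ":kilogram")},
}

# Inference is applied to each named graph separately, never to the union,
# so conclusions drawn in one book do not leak into another.
closed = {name: carrying_rule(graph) for name, graph in dataset.items()}

for name in sorted(closed):
    print(name, sorted(closed[name]))
```

Nothing here says whether the enclosing document asserts the named
graphs; it only shows inference kept separate per graph.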

> In Sandro's view, there is this idea that:
> 
> <g> { <some triples> }
> 
> is asserting something about the relationship of <g> with the triples. 
> But in RDF Dataset, this is just a way to put the triples on a shelf, 

on_shelf(some_book, some_shelf) looks like a relationship to me.

> and the shelf happens to have an identifier. When we put something on a 
> shelf in a library, we do not think that we are asserting a relationship 
> between the shelf and what's on it!

It varies.   Sometimes the shelving relationship is noise, an artifact
of how many books we have and how big the shelves are; sometimes it's
important information that is derived from the book itself, like when
all the physics books are on the physics shelf; sometimes it's important
information that is external to the book, like when the unwanted books
are moved to the "For sale, $0.10" shelf.

I'm not saying the on_shelf relationship always has to be important, but
I think it's very useful to allow it to be seen, controlled, and
understood.

> In my opinion, if one just wants to quote a graph and talk about it, one 
> just needs RDF triples.
> 
> <g>  a  :Graph ;
>       dc:creator  <me> ;
>       :saysInTurtle  ":s  :p  :o" .
> 
> You can even have a "partial semantics" by separating the triples:
> 
> <g>  :saysInTurtle  ":s :p :o", ":a :b :c" .
> 
> Then it's just a matter of social consensus that :saysInTurtle is used 
> to relate an RDF graph to a Turtle serialisation of that graph. 

That is, in fact, one of the proposed designs.   Well, it's a slight
variation on
http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-Designs#Graph_Datatypes

It has a few problems though, including:
      * if you're working in RDFa or RDF/XML, it'll be weird to express
        the other graphs in Turtle
      * you'd have to repeat all your @prefix declarations in each of
        the graphs
      * there'd be no way to refer to the same blank node between
        graphs.  (Well, you could use .well-known/genid Skolemization,
        but that would be pretty painful for hand-authoring, and
        slightly changes the meaning.)
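
As a rough illustration of the Skolemization workaround in the last
bullet, the sketch below rewrites blank node labels into
.well-known/genid IRIs so that two separately quoted Turtle strings can
refer to the same node. The authority http://example.org and the
function skolemize() are assumptions for the example. The regex is
deliberately naive (it would also rewrite "_:x" inside a string
literal), and it reuses the original label rather than minting a fresh
unique identifier, which a real implementation would normally do.

```python
import re

# Assumed authority; any IRI base the author controls would do.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolemize(turtle_text, base=GENID_BASE):
    """Replace each blank node label _:name with <base + name>."""
    return re.sub(
        r"_:([A-Za-z0-9]+)",
        lambda m: "<" + base + m.group(1) + ">",
        turtle_text,
    )


g1 = "_:x :p :o ."
g2 = ":a :b _:x ."

print(skolemize(g1))
print(skolemize(g2))
```

After the rewrite both strings mention the same IRI, at the cost of the
readability and the change in meaning noted above.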

> You 
> could also add something to the formal semantics, but on the one hand it 
> would create headaches for all implementers (imposing something to be 
> interpreted as an RDF Graph is much more troublesome than implementing 
> rdf:XMLLiteral, for instance), and on the other hand, I can't think of 
> any concrete real life situation where it's actually useful.

My short answer is:
http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs#Separation_of_Inference


    -- Sandro
Received on Friday, 27 April 2012 14:14:58 UTC