Re: Sandro's proposal VS RDF Datasets

On Wed, 2012-05-02 at 15:29 +0200, Antoine Zimmermann wrote:
> PS: Ok, while writing this email some thoughts came to me, and I believe I now 
> see each party's opinions and goals better. Sorry if this re-asserts some 
> things that were made explicit in earlier discussions.
> There are parts mostly directed to Pat, but the end is certainly more 
> interesting to others, especially I think Sandro.

A partial reply - I don't think I'll have time to do more before the
meeting.

> 
> Le 30/04/2012 19:53, Pat Hayes a écrit :
> >
> >
> >>
> >>> Seems to me that this analogy strongly supports Sandro's notion
> >>> of graph names as being, well, names of graphs.
> >>>
> >>> But we can take your view, as I understand it. It is simply a
> >>> rejection of the very idea of datasets having any normative
> >>> semantics or meaning. They are just handy datastructures for
> >>> doing various things with pieces of RDF. Which is fine, and saves
> >>> us a lot of WG effort, but hasn't really advanced the state of the
> >>> art very far, and may not really be living up to our charter.
> >>
> >> My view has always been that we define a normative semantics for
> >> RDF Datasets, and I proposed one more than a year ago. It's fairly
> >> simple: you just apply the RDF semantics to each graph separately
> >> and what you get is an entailed dataset. It's nothing special or
> >> strange
> >
> > Well, it is very strange, by some lights. It is wildly out of line
> > with the intuitions and assumptions underlying the 2004
> > specifications (what I called the 'globalist' perspective on IRI
> > meanings.) And it raises an immediate puzzle, which is WHY an RDF
> > graph should suddenly be allowed to change its meaning when it is
> > embedded inside a dataset and given a name. That seemed extremely
> > puzzling to me, I have to say.
> 
> I don't see where the change of meaning happens. If I have the following 
> RDF graph:
> 
> :c  rdfs:subClassOf  :d .
> :x  rdf:type  :c .
> 
> it entails:
> 
> :x  rdf:type  :d .
> 
> If I put this graph in a dataset:
> 
> :d {
>    :c  rdfs:subClassOf  :d .
>    :x  rdf:type  :c .
> }
> 
> it entails:
> 
> :d {
>    :x  rdf:type  :d .
> }

This statement already shows how differently we are thinking of this.

I don't think putting a graph into a dataset in any way affects the
graph or changes its properties.   If G1 entails G2, it doesn't matter
what else we know or say about G1 or G2 -- G1 always entails G2.

When you write down a dataset, as you did twice there in TriG, you are
making a statement.    When you said:
        
        :d {
            :c  rdfs:subClassOf  :d .
            :x  rdf:type  :c .
        }

you were saying, in my proposed reading: ":d is something which contains
the triples {:c  rdfs:subClassOf :d. :x rdf:type :c.}".

When you said:

        :d {
            :x  rdf:type  :d .
        }
        
you were saying, in my proposed reading: ":d is something which contains
the triple {:x rdf:type :d}".

So, yes, of course the first set of triples entails the second set of
triples, but the statement ":d is something which contains {first bunch
of triples}" does not entail the statement ":d is something which
contains {second bunch of triples}".
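
To make the distinction concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions, not part of any proposal
text: triples are ground (no blank nodes), only one RDFS rule (rdfs9) is
modelled, and all function names are hypothetical. per_graph_entails
models the reading in which each named graph is closed under entailment;
contains_entails is a toy model of the "contains" reading, where a
dataset only states which triples a named thing contains.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs9_closure(graph):
    """Close a graph under rdfs9 only:
    (x rdf:type c), (c rdfs:subClassOf d)  =>  (x rdf:type d)."""
    closed = set(graph)
    while True:
        inferred = {
            (x, TYPE, d)
            for (x, p1, c) in closed if p1 == TYPE
            for (c2, p2, d) in closed if p2 == SUBCLASS and c2 == c
        }
        if inferred <= closed:
            return closed
        closed |= inferred

def graph_entails(g1, g2):
    """G1 entails G2 (under rdfs9 only) if G2 is within G1's closure."""
    return set(g2) <= rdfs9_closure(g1)

def per_graph_entails(d1, d2):
    """Per-graph reading: apply entailment inside each named graph."""
    return all(n in d1 and graph_entails(d1[n], g) for n, g in d2.items())

def contains_entails(d1, d2):
    """Toy 'contains' reading (an assumption of this sketch): a dataset
    says only that each name contains the listed triples, so no inference
    is applied inside the braces."""
    return all(n in d1 and set(g) <= set(d1[n]) for n, g in d2.items())

first = {(":c", SUBCLASS, ":d"), (":x", TYPE, ":c")}
second = {(":x", TYPE, ":d")}
dataset1 = {":d": first}
dataset2 = {":d": second}
```

With these definitions, graph_entails(first, second) and
per_graph_entails(dataset1, dataset2) both hold, while
contains_entails(dataset1, dataset2) does not.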

     -- Sandro

        


> And all other entailments are preserved. They are simply put "in 
> context", so to speak.
> 
> >> or hard to get accepted: it's already implemented in some triple
> >> stores. Yes, it may be little in advancing the state of the art,
> >> but it gives a good ground to define notions such as imports,
> >> temporal reasoning, trust-based reasoning and various other
> >> things. It's perfectly in line with what we have to do according to
> >> our charter.
> >>
> >
> > I agree it is quite precise and quite simple. However, it
> > conspicuously fails to do what seems to me to be part of our charter
> > here, which is to make the notion of named graph precise and give a
> > semantics for it.
> 
> Tell me what is imprecise and I'll fix it. I claim that it is 
> sufficiently precise to be implemented and tested against test cases, 
> and I even think that it is already implemented in some triple stores.
> What is missing in my proposal, IMO, is to clearly define the semantic 
> extensions that would allow one to constrain the graph "names" to denote 
> the graph, that would allow one to "import/inherit" another "named" 
> graph, and possibly other extensions.
> 
> > I know it takes what SPARQL calls a "named graph"
> > and gives a semantics for that, but it does so by refusing to treat
> > the "name" as a name of the "graph". Again, even that is only a
> > terminological matter, which we could treat as being unfortunate but
> > not fatal; but if people also wish to use those graph "names" to
> > refer to the actual graphs, as some people apparently do want to do,
> > and I suspect many people outside the WG will assume that they can
> > freely do, simply from the fact that they are called "name", then
> > this lack of real naming becomes a genuine semantic problem. Which
> > is why I like Sandro's suggested interpretation of datasets, which
> > provides for the naming relationship, and suggested introducing your
> > contextual-variation-of-meaning idea by a different mechanism built
> > into RDF. If you or someone else can come up with an alternative way
> > to attach names to graphs, I'd be delighted. So far, nobody has,
> > AFAIK.
> 
> If I understood Sandro's suggested interpretation correctly, he would prefer 
> that the following TriG file:
> 
> :d {
>    :c  rdfs:subClassOf  :d .
>    :x  rdf:type  :c .
> }
> 
> does *not* entail:
> 
> :d {
>    :x  rdf:type  :d .
> }
> 
> So, a graph in a "named" graph pair does not have the semantics of an 
> RDF graph outside it. If that is indeed what Sandro suggests, then I can 
> use your own argument against it: WHY an RDF graph should suddenly be 
> allowed to change its meaning when it is embedded inside a dataset and 
> given a name. *That* seemed extremely puzzling to me.
> 
> Now, concerning graph "names" denoting the graph itself, I'd propose the 
> following:
> 
> Call the Dataset semantics I proposed the "Simple Dataset semantics" 
> (name chosen to mirror Simple entailment in the RDF spec).
> In Simple entailment, predicates are not required to be instances of 
> rdf:Property. But there is a semantic constraint provided by the RDF 
> semantics which requires them to be.
> Similarly, there can be a semantic constraint in "RDF Dataset semantics" 
> (an extension of "Simple Dataset semantics") which says that graph 
> "names" must be interpreted as RDF graphs.
> This can be formalised in different ways depending on what we want to 
> do. For instance, we can impose that the graph IRI denote exactly the 
> graph between the curly brackets. Or that it denote a superset of the 
> graph. Or that the graph IRI denotes the graph only in the default 
> graph, but inside a named graph, it is not required to denote anything 
> in particular. But whatever the choice taken there, these can be simply 
> described as semantic extensions of the Simple Dataset semantics.
> 
> 
> >> The way things are going on in this WG tends to suggest that there
> >> will not be any formal semantics for RDF Datasets as there is too
> >> much disagreement on what it should be. I have the impression that
> >> it is the only viable, but disappointing alternative.
> >
> > I don't think we should give up yet. So far, in my experience, this WG
> > is no more internally fractious than other WGs I have been on. It
> > took the first RDF WG nine months to decide how to write the number
> > three, and the ISO group which made common logic went on for four
> > years without agreeing whether the logic was typed or untyped.
> 
> I'm rather confident that these discussions can lead eventually to 
> consensus, but I am a bit afraid of how much time this will take. There 
> is a strong risk that it will take more time than what was initially 
> allocated to the WG. I don't know what W3C's policy is on extending the 
> duration of WGs.
> 
> >>>>
> >>>> In my opinion, if one just want to quote a graph and talk about
> >>>> it, one just needs RDF triples.
> >>>
> >>> No, that won't do. At the very least we need reification or some
> >>> kind of graph literal construction.
> >>
> >> Not necessarily. RDF does not define a formal semantics for
> >> information about persons, yet it is perfectly possible to talk
> >> about people with RDF.
> >
> > Sigh. You keep saying this and it keeps missing the point. In the
> > case of graph naming, unlike that of person naming, there are
> > entailments that depend upon the name-graph naming relationship being
> > rigid. For example, you really do want the metadata to apply to the
> > actual graph (or graph container, whatever we decide) being named by
> > the name. I don't think that a 'social consensus' is good enough
> > here. But more to the point, with your dataset convention, there are
> > clear use cases where the graph "name" most assuredly does not denote
> > the graph (since it is being used to denote something else entirely),
> > so no amount of social consensus is going to make that work and still
> > be in conformity to the 2004 RDF specs. (Part of the idea behind the
> > 'contexts' design is to keep the association of IRIs to contexts (or
> > extensions) separate from what they denote, precisely in order to
> > allow this kind of usage.)
> 
> Clearly, if you want to do complex reasoning over graphs and check 
> consistency of metadata etc, you'll need some way to make clear how 
> names are related and so on. But it seems to me that the cost it adds, 
> in terms of expressiveness and constraints, is not worth the benefits 
> and commonly accepted best practices are able to solve a huge part of 
> the use cases.
> RDF has the advantage of being very much unconstrained so that it fits 
> many scenarios easily. But the unconstrainedness is a problem in many 
> cases too, that is why we have all these extensions like RDFS, OWL, 
> SWRL, etc. that add their own constraints to solve complex use cases.
> I think we can do the same for datasets. Have a very unconstrained base 
> and propose a few extensions that match the most common use cases.
> In addition to this, we could provide a mechanism to "announce" which 
> extensions are used (probably what you have in mind with your 
> "extension" proposal).
> 
> >
> >> It just requires a social consensus such as FOAF. The same can
> >> happen for talking about graphs. Of course, if you need to do some
> >> stricter reasoning, you would need something more, like e.g. graph
> >> literals but I haven't yet found a convincing use case that would
> >> require it.
> >>
> >>>>
> >>>> <g>    a  :Graph ; dc:creator  <me> ; :saysInTurtle  ":s  :p
> >>>> :o" .
> >>>
> >>> Is ":s :p :o" a string?
> >>
> >> Yes.
> >>
> >>>
> >>>>
> >>>> You can even have a "partial semantics" by separating the
> >>>> triples:
> >>>>
> >>>> <g>    :saysInTurtle  ":s :p :o", ":a :b :c" .
> >>>>
> >>>> Then it's just a matter of social consensus that :saysInTurtle
> >>>> is used to relate an RDF graph to a Turtle serialisation of
> >>>> that graph. You could also add something to the formal
> >>>> semantics, but on the one hand it would create headaches for all
> >>>> implementers (imposing something to be interpreted as an RDF
> >>>> Graph is much more troublesome than implementing
> >>>> rdf:XMLLiteral, for instance), and on the other hand, I can't
> >>>> think of any concrete real life situation where it's actually
> >>>> useful.
> >>>
> >>> I can. If someone wants to get ambitious with their library and
> >>> use some OWL reasoning (as for example the BBC are doing, for
> >>> one) then you really do want to have some connection with the OWL
> >>> content at the level of model theory, if only to clarify what
> >>> owl:sameAs is supposed to mean.
> >>
> >> This is not a concrete example. Can you show a real life problem
> >> that *requires* that a URI is interpreted as an RDF graph to be
> >> solved conveniently?
> >
> > How about using owl:sameAs on IRIs intended to denote graphs? Or
> > between an IRI and a blank node both intended to denote a graph, as
> > in some of Sandro's examples. Or suppose you have classes of graphs,
> > and want to define an OWL restriction class, for example the class of
> > all graphs containing program information whose associated date of
> > creation is earlier than 01012010. If graph "names" don't really
> > refer, none of this really makes sense.
> 
> But what's the real life problem you're trying to solve here?  What are 
> the data and what useful conclusions you would draw from the fact that 
> the name denotes the graph, which you would not be able to draw 
> otherwise? I'll try to extend your example to see if I can get something.
> 
> Consider the example:
> 
> <joe>  <says>  <g> .
> <g>  owl:sameAs  <h> .
> <g> {
>    <joe>  a  foaf:Person .
> }
> <h> {
>    foaf:Person  rdfs:subClassOf  foaf:Agent .
> }
> 
> what can we conclude? It all depends how we interpret the named graphs.
> 
> *Case 1.*
>   If <g> is interpreted exactly as the graph inside the curly brackets, 
> then we have an inconsistency. Can this be considered a useful 
> conclusion in such a scenario? I don't know but I find that enforcing 
> the graph IRI to denote exactly the graph is much too strong and would 
> not be convenient for many use cases (e.g., facts evolving with time).
> 
> *Case 2.*
>   If <g> is interpreted as a supergraph of what's in the brackets, then 
> we can conclude:
> 
> <joe>  <says>  <g> .
> <g>  owl:sameAs  <h> .
> <g> {
>    <joe>  a  foaf:Person .
>    foaf:Person  rdfs:subClassOf  foaf:Agent .
> }
> <h> {
>    <joe>  a  foaf:Person .
>    foaf:Person  rdfs:subClassOf  foaf:Agent .
> }
> 
> This already looks much more helpful. This probably fits Sandro's 
> endorsement use case as it looks to me it's his suggested semantics.
> 
> But I still find it unsatisfying when it comes to dealing with graphs 
> having different provenance, from which you would like to conclude 
> things such as:
> 
> *Case 3.*
> In this case, the datasets should be read "from source <g>, I know that 
> Joe is a person, from source <h>, I know that persons are agents, but I 
> also know that source <g> and <h> are actually one source. So I can 
> conclude that, according to source <g> (or <h>), Joe is an agent."
> 
> <joe>  <says>  <g> .
> <g>  owl:sameAs  <h> .
> <g> {
>    <joe>  a  foaf:Person .
>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>    <joe>  a  foaf:Agent .
> }
> <h> {
>    <joe>  a  foaf:Person .
>    foaf:Person  rdfs:subClassOf  foaf:Agent .
>    <joe>  a  foaf:Agent .
> }
> 
> So in the end, case 3 leads to my proposal.
> Hmmm, looking at this and remembering what Ivan said a couple of times 
> "we have to acknowledge that there is no fit-for-all semantics", maybe 
> we can have two competing semantics, but there should be a way to 
> declare which one is assumed when exchanging a TriG file.
> 
> > [skip]
> 
> 
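The three cases quoted above can be sketched in a few lines of Python.
This is a rough illustration under stated assumptions, not a definitive
formalisation: triples are ground, only rdfs9 is modelled, the class is
written foaf:Person in both graphs (which the conclusion in Case 3
requires), the default-graph triple <joe> <says> <g> is left out, and all
function names are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs9_closure(graph):
    """(x rdf:type c), (c rdfs:subClassOf d)  =>  (x rdf:type d)."""
    closed = set(graph)
    while True:
        inferred = {
            (x, TYPE, d)
            for (x, p1, c) in closed if p1 == TYPE
            for (c2, p2, d) in closed if p2 == SUBCLASS and c2 == c
        }
        if inferred <= closed:
            return closed
        closed |= inferred

def case1_consistent(dataset, same_as):
    """Case 1: each name denotes exactly the graph in its braces, so
    owl:sameAs between two names is consistent only if the graphs match."""
    return all(dataset[a] == dataset[b] for a, b in same_as)

def case2_merge(dataset, same_as):
    """Case 2: each name denotes a supergraph of what is in its braces, so
    names identified by owl:sameAs end up sharing the union of triples."""
    merged = {name: set(g) for name, g in dataset.items()}
    changed = True
    while changed:  # repeat so chains of owl:sameAs links propagate
        changed = False
        for a, b in same_as:
            union = merged[a] | merged[b]
            if merged[a] != union or merged[b] != union:
                merged[a], merged[b] = set(union), set(union)
                changed = True
    return merged

def case3_close(dataset, same_as):
    """Case 3: merge as in Case 2, then apply entailment inside each graph."""
    merged = case2_merge(dataset, same_as)
    return {name: rdfs9_closure(g) for name, g in merged.items()}

joe_person = ("<joe>", TYPE, "foaf:Person")
person_agent = ("foaf:Person", SUBCLASS, "foaf:Agent")
joe_agent = ("<joe>", TYPE, "foaf:Agent")

dataset = {"<g>": {joe_person}, "<h>": {person_agent}}
same_as = [("<g>", "<h>")]
```

On this data, Case 1 reports an inconsistency, Case 2 gives both names the
two asserted triples, and Case 3 additionally puts <joe> a foaf:Agent in
both graphs.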

Received on Wednesday, 2 May 2012 13:56:12 UTC