Re: Sandro's proposal VS RDF Datasets from Antoine Zimmermann on 2012-05-02 (public-rdf-wg@w3.org from May 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 02 May 2012 15:29:44 +0200
To: Pat Hayes <phayes@ihmc.us>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <4FA136C8.3050009@emse.fr>
PS: While writing this email, some thoughts came to me, and I believe I now 
see each party's opinions and goals better. Sorry if this restates some 
things that were made explicit in earlier discussions.
Some parts are mostly directed at Pat, but the end is certainly more 
interesting to others, especially Sandro, I think.


Le 30/04/2012 19:53, Pat Hayes a écrit :
>
>
>>
>>> Seems to me that this analogy strongly supports Sandro's notion
>>> of graph names as being, well, names of graphs.
>>>
>>> But we can take your view, as I understand it. It is simply a
>>> rejection of the very idea of datasets having any normative
>>> semantics or meaning. They are just handy datastructures for
>>> doing various things with pieces of RDF. Which is fine, and saves
>>> us a lot of WG effort, but hasn't really advanced the state of the
>>> art very far, and may not really be living up to our charter.
>>
>> My view has always been that we define a normative semantics for
>> RDF Datasets, and I proposed one more than a year ago. It's fairly
>> simple: you just apply the RDF semantics to each graph separately
>> and what you get is an entailed dataset. It's nothing special or
>> strange
>
> Well, it is very strange, by some lights. It is wildly out of line
> with the intuitions and assumptions underlying the 2004
> specifications (what I called the 'globalist' perspective on IRI
> meanings.) And it raises an immediate puzzle, which is WHY an RDF
> graph should suddenly be allowed to change its meaning when it is
> embedded inside a dataset and given a name. That seemed extremely
> puzzling to me, I have to say.

I don't see where the change of meaning happens. If I have the following 
RDF graph:

:c  rdfs:subClassOf  :d .
:x  rdf:type  :c .

it entails:

:x  rdf:type  :d .

If I put this graph in a dataset:

:d {
   :c  rdfs:subClassOf  :d .
   :x  rdf:type  :c .
}

it entails:

:d {
   :x  rdf:type  :d .
}

And all other entailments are preserved. They are simply put "in 
context", so to speak.
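
To make "apply the RDF semantics to each graph separately" concrete, here 
is a minimal Python sketch. It is only an illustration under strong 
assumptions: triples are plain tuples of strings, a dataset is a dict from 
graph name to a set of triples, and only one RDFS rule (rdfs9, type 
inheritance through rdfs:subClassOf) is implemented. The function names 
are made up for this example.

```python
# Sketch only: per-graph entailment restricted to RDFS rule rdfs9.
# The data model (tuples, dict of sets) is an assumption for illustration.

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"


def rdfs9_closure(graph):
    """Return the graph plus every triple derivable with rdfs9:
    (x rdf:type c), (c rdfs:subClassOf d)  =>  (x rdf:type d)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_pairs = {(s, o) for (s, p, o) in closed if p == RDFS_SUBCLASS}
        for (x, p, c) in list(closed):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                if sub == c and (x, RDF_TYPE, sup) not in closed:
                    closed.add((x, RDF_TYPE, sup))
                    changed = True
    return closed


def entail_dataset(dataset):
    """Close each named graph on its own; nothing leaks between graphs."""
    return {name: rdfs9_closure(triples) for name, triples in dataset.items()}


dataset = {
    ":d": {(":c", RDFS_SUBCLASS, ":d"), (":x", RDF_TYPE, ":c")},
}
entailed = entail_dataset(dataset)
print((":x", RDF_TYPE, ":d") in entailed[":d"])  # True
```

The point of the sketch is that the entailment inside the named graph is 
the same one the bare graph has; the graph name only says where the 
conclusion is recorded.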

>> or hard to get accepted: it's already implemented in some triple
>> stores. Yes, it may be little in advancing the state of the art,
>> but it gives a good ground to define notions such as imports,
>> temporal reasoning, trust-based reasoning and various other
>> things. It's perfectly in line with what we have to do according to
>> our charter.
>>
>
> I agree it is quite precise and quite simple. However, it
> conspicuously fails to do what seems to me to be part of our charter
> here, which is to make the notion of named graph precise and give a
> semantics for it.

Tell me what is imprecise and I'll fix it. I claim that it is 
sufficiently precise to be implemented and tested against test cases, 
and I even think that it is already implemented in some triple stores.
What is missing in my proposal, IMO, is to clearly define the semantic 
extensions that would allow one to constrain the graph "names" to denote 
the graph, that would allow one to "import/inherit" another "named" 
graph, and possibly other extensions.

> I know it takes what SPARQL calls a "named graph"
> and gives a semantics for that, but it does so by refusing to treat
> the "name" as a name of the "graph". Again, even that is only a
> terminological matter, which we could treat as being unfortunate but
> not fatal; but if people also wish to use those graph "names" to
> refer to the actual graphs, as some people apparently do want to do,
> and I suspect many people outside the WG will assume that they can
> freely do, simply from the fact that they are called "name", then
> this lack of real naming becomes a genuine semantic problem. Which
> is why I like Sandro's suggested interpretation of datasets, which
> provides for the naming relationship, and suggested introducing your
> contextual-variation-of-meaning idea by a different mechanism built
> into RDF. If you or someone else can come up with an alternative way
> to attach names to graphs, I'd be delighted. So far, nobody has,
> AFAIK.

If I understood Sandro's suggested interpretation correctly, he would prefer 
that the following TriG file:

:d {
   :c  rdfs:subClassOf  :d .
   :x  rdf:type  :c .
}

does *not* entail:

:d {
   :x  rdf:type  :d .
}

So, a graph in a "named" graph pair does not have the semantics of the same 
RDF graph outside it. If that is indeed what Sandro suggests, then I can 
use your own argument against it: WHY should an RDF graph suddenly be 
allowed to change its meaning when it is embedded inside a dataset and 
given a name? *That* seems extremely puzzling to me.

Now, concerning graph "names" denoting the graph itself, I'd propose the 
following:

Call the Dataset semantics I proposed the "Simple Dataset semantics" 
(name chosen to mirror Simple entailment in the RDF spec).
In Simple entailment, predicates are not required to be instances of 
rdf:Property. But the RDF semantics adds a semantic constraint that 
requires them to be.
Similarly, there can be a semantic constraint in "RDF Dataset semantics" 
(an extension of "Simple Dataset semantics") which says that graph 
"names" must be interpreted as RDF graphs.
This can be formalised in different ways depending on what we want to 
do. For instance, we can require that the graph IRI denotes exactly the 
graph between the curly brackets. Or that it denotes a superset of the 
graph. Or that the graph IRI denotes the graph only in the default 
graph, but inside a named graph, it is not required to denote anything 
in particular. But whatever choice is made there, these can simply be 
described as semantic extensions of the Simple Dataset semantics.
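
As a rough illustration of how two of these variants differ, the 
following Python sketch checks whether a candidate interpretation 
satisfies the "exactly" or the "superset" constraint. Everything here is 
an assumption made for the example: `denotation` is a hypothetical dict 
from graph IRIs to the set of triples the interpretation assigns to them, 
and the function names are invented.

```python
# Sketch only: two candidate constraints on what a graph IRI denotes.
# `denotation` is a hypothetical stand-in for an interpretation.

def satisfies_exact(dataset, denotation):
    """Each graph IRI denotes exactly the graph between the brackets."""
    return all(denotation.get(name) == triples
               for name, triples in dataset.items())


def satisfies_superset(dataset, denotation):
    """Each graph IRI denotes some superset of the bracketed graph."""
    return all(name in denotation and triples <= denotation[name]
               for name, triples in dataset.items())


dataset = {":g": {(":x", "rdf:type", ":c")}}
bigger = {":g": {(":x", "rdf:type", ":c"), (":x", "rdf:type", ":d")}}

print(satisfies_exact(dataset, bigger))     # False
print(satisfies_superset(dataset, bigger))  # True
```

An interpretation that assigns more triples to :g than the dataset lists 
is rejected by the first constraint and accepted by the second, which is 
why the second is friendlier to facts that evolve over time.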


>> The way things are going on in this WG tends to suggest that there
>> will not be any formal semantics for RDF Datasets as there is too
>> much disagreement on what it should be. I have the impression that
>> it is the only viable, but disappointing alternative.
>
> I don't think we should give up yet. So far, in my experience, this WG
> is no more internally fractious than other WGs I have been on. It
> took the first RDF WG nine months to decide how to write the number
> three, and the ISO group which made common logic went on for four
> years without agreeing whether the logic was typed or untyped.

I'm rather confident that these discussions can eventually lead to 
consensus, but I am a bit afraid of how much time this will take. There 
is a strong risk that it will take more time than was initially 
allocated to the WG. I don't know what W3C's policy is on extending the 
duration of WGs.

>>>>
>>>> In my opinion, if one just wants to quote a graph and talk about
>>>> it, one just needs RDF triples.
>>>
>>> No, that won't do. At the very least we need reification or some
>>> kind of graph literal construction.
>>
>> Not necessarily. RDF does not define a formal semantics for
>> information about persons, yet it is perfectly possible to talk
>> about people with RDF.
>
> Sigh. You keep saying this and it keeps missing the point. In the
> case of graph naming, unlike that of person naming, there are
> entailments that depend upon the name-graph naming relationship being
> rigid. For example, you really do want the metadata to apply to the
> actual graph (or graph container, whatever we decide) being named by
> the name. I don't think that a 'social consensus' is good enough
> here. But more to the point, with your dataset convention, there are
> clear use cases where the graph "name" most assuredly does not denote
> the graph (since it is being used to denote something else entirely),
> so no amount of social consensus is going to make that work and still
> be in conformity to the 2004 RDF specs. (Part of the idea behind the
> 'contexts' design is to keep the association of IRIs to contexts (or
> extensions) separate from what they denote, precisely in order to
> allow this kind of usage.)

Clearly, if you want to do complex reasoning over graphs, check the 
consistency of metadata, etc., you'll need some way to make clear how 
names are related and so on. But it seems to me that the cost this adds, 
in terms of expressiveness and constraints, outweighs the benefits, 
and commonly accepted best practices are able to cover a large part of 
the use cases.
RDF has the advantage of being very much unconstrained, so that it fits 
many scenarios easily. But the unconstrainedness is a problem in many 
cases too, which is why we have all these extensions like RDFS, OWL, 
SWRL, etc. that add their own constraints to solve complex use cases.
I think we can do the same for datasets: have a very unconstrained base 
and propose a few extensions that match the most common use cases.
In addition to this, we could provide a mechanism to "announce" which 
extensions are used (probably what you have in mind with your 
"extension" proposal).

>
>> It just requires a social consensus such as FOAF. The same can
>> happen for talking about graphs. Of course, if you need to do some
>> stricter reasoning, you would need something more, like e.g. graph
>> literals but I haven't yet found a convincing use case that would
>> require it.
>>
>>>>
>>>> <g>    a  :Graph ; dc:creator  <me> ; :saysInTurtle  ":s  :p  :o" .
>>>
>>> Is ":s :p :o" a string?
>>
>> Yes.
>>
>>>
>>>>
>>>> You can even have a "partial semantics" by separating the
>>>> triples:
>>>>
>>>> <g>    :saysInTurtle  ":s :p :o", ":a :b :c" .
>>>>
>>>> Then it's just a matter of social consensus that :saysInTurtle
>>>> is used to relate an RDF graph to a Turtle serialisation of
>>>> that graph. You could also add something to the formal
>>>> semantics, but on the one hand it would create headaches for all
>>>> implementers (imposing something to be interpreted as an RDF
>>>> Graph is much more troublesome than implementing
>>>> rdf:XMLLiteral, for instance), and on the other hand, I can't
>>>> think of any concrete real life situation where it's actually
>>>> useful.
>>>
>>> I can. If someone wants to get ambitious with their library and
>>> use some OWL reasoning (as for example the BBC are doing, for
>>> one) then you really do want to have some connection with the OWL
>>> content at the level of model theory, if only to clarify what
>>> owl:sameAs is supposed to mean.
>>
>> This is not a concrete example. Can you show a real life problem
>> that *requires* that a URI is interpreted as an RDF graph to be
>> solved conveniently?
>
> How about using owl:sameAs on IRIs intended to denote graphs? Or
> between an IRI and a blank node both intended to denote a graph, as
> in some of Sandro's examples. Or suppose you have classes of graphs,
> and want to define an OWL restriction class, for example the class of
> all graphs containing program information whose associated date of
> creation is earlier than 01012010. If graph "names" don't really
> refer, none of this really makes sense.

But what's the real life problem you're trying to solve here?  What are 
the data, and what useful conclusions would you draw from the fact that 
the name denotes the graph, which you would not be able to draw 
otherwise? I'll try to extend your example to see if I can get something.

Consider the example:

<joe>  <says>  <g> .
<g>  owl:sameAs  <h> .
<g> {
   <joe>  a  foaf:Person .
}
<h> {
   foaf:Person  rdfs:subClassOf  foaf:Agent .
}

What can we conclude? It all depends on how we interpret the named graphs.

*Case 1.*
  If <g> is interpreted exactly as the graph inside the curly brackets, 
then we have an inconsistency, since <g> and <h> are declared to be the 
same while the bracketed graphs differ. Can this be considered a useful 
conclusion in such a scenario? I don't know, but I find that forcing 
the graph IRI to denote exactly the graph is much too strong and would 
not be convenient for many use cases (e.g., facts evolving with time).

*Case 2.*
  If <g> is interpreted as a supergraph of what's in the brackets, then 
we can conclude:

<joe>  <says>  <g> .
<g>  owl:sameAs  <h> .
<g> {
   <joe>  a  foaf:Person .
   foaf:Person  rdfs:subClassOf  foaf:Agent .
}
<h> {
   <joe>  a  foaf:Person .
   foaf:Person  rdfs:subClassOf  foaf:Agent .
}

This already looks much more helpful. It probably fits Sandro's 
endorsement use case, as it looks to me like his suggested semantics.

But I still find it unsatisfying when it comes to dealing with graphs 
of different provenance, from which you would like to conclude 
things such as the following:

*Case 3.*
In this case, the dataset should be read as "from source <g>, I know that 
Joe is a person; from source <h>, I know that persons are agents; but I 
also know that sources <g> and <h> are actually one source. So I can 
conclude that, according to source <g> (or <h>), Joe is an agent."

<joe>  <says>  <g> .
<g>  owl:sameAs  <h> .
<g> {
   <joe>  a  foaf:Person .
   foaf:Person  rdfs:subClassOf  foaf:Agent .
   <joe>  a  foaf:Agent .
}
<h> {
   <joe>  a  foaf:Person .
   foaf:Person  rdfs:subClassOf  foaf:Agent .
   <joe>  a  foaf:Agent .
}

So in the end, case 3 leads to my proposal.
Hmmm, looking at this and remembering what Ivan said a couple of times 
"we have to acknowledge that there is no fit-for-all semantics", maybe 
we can have two competing semantics, but there should be a way to 
declare which one is assumed when exchanging a TriG file.
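
Going back to the three cases, the difference between the readings can be 
sketched in a few lines of Python. This is a toy model, not a proposal 
for an implementation: triples are tuples of strings, the owl:sameAs 
statements of the default graph are passed as a list of pairs, only 
directly stated pairs are handled (no transitive closure over sameAs), 
and only rule rdfs9 is applied. All function names are hypothetical.

```python
# Sketch only: the three readings of the <g>/<h> example as toy functions.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

dataset = {
    "<g>": {("<joe>", RDF_TYPE, "foaf:Person")},
    "<h>": {("foaf:Person", SUBCLASS, "foaf:Agent")},
}
same_as = [("<g>", "<h>")]  # from: <g> owl:sameAs <h> .


def case1_consistent(dataset, same_as):
    """Exact reading: names declared the same must label identical graphs."""
    return all(dataset[a] == dataset[b] for a, b in same_as)


def case2_merge(dataset, same_as):
    """Superset reading: names declared the same share the union of triples."""
    merged = {name: set(triples) for name, triples in dataset.items()}
    for a, b in same_as:
        union = merged[a] | merged[b]
        merged[a], merged[b] = set(union), set(union)
    return merged


def rdfs9(graph):
    """Close a single graph under type inheritance via rdfs:subClassOf."""
    closed = set(graph)
    while True:
        new = {(x, RDF_TYPE, d)
               for (x, p, c) in closed if p == RDF_TYPE
               for (c2, q, d) in closed if q == SUBCLASS and c2 == c} - closed
        if not new:
            return closed
        closed |= new


def case3(dataset, same_as):
    """Case 2, followed by entailment applied inside each named graph."""
    merged = case2_merge(dataset, same_as)
    return {name: rdfs9(triples) for name, triples in merged.items()}


print(case1_consistent(dataset, same_as))                            # False
print(len(case2_merge(dataset, same_as)["<g>"]))                     # 2
print(("<joe>", RDF_TYPE, "foaf:Agent") in case3(dataset, same_as)["<g>"])  # True
```

A declaration of the assumed semantics in a TriG file would, in this toy 
model, amount to choosing which of these functions a consumer applies.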

> [skip]


-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 2 May 2012 13:31:06 UTC