Re: RDF dataset semantics again from Antoine Zimmermann on 2012-08-22 (public-rdf-wg@w3.org from August 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 22 Aug 2012 15:10:02 +0200
To: public-rdf-wg@w3.org, Pat Hayes <phayes@ihmc.us>
Message-ID: <5034DA2A.2090207@emse.fr>
What's interesting here is that, by adding a constraint to the notion of 
interpretation of [1], there seem to be fewer entailments than without 
the constraint.
I imagine it is because the constraint is imposing a relation between 
something semantic and something that is in the syntax (it imposes that 
a URI be interpreted as a component of a dataset).
It could also be because the dataset interpretations of [1] are relying 
on multiple RDF interpretations.

This is weird and I'd be interested to hear Pat on the subject.

This may be a reason why we do not want to have the RDF graphs (which 
are syntactic things) themselves in the universe of interpretation (the 
semantic things).

...hmm. I don't know if I like it that much anymore.


AZ

On 22/08/2012 14:58, Antoine Zimmermann wrote:
> What I do not like in the arguments is the hypothetical "if". Yes, of
> course, if we can extend a minimal semantics to any other form of
> semantics by mere additional semantic conditions, then yes why not?
>
> But I claim that you are not going to be able to do this from the
> quote-semantics to the dataset semantics of [1].
>
> Would it be ok if we could define the quote-semantics as a semantic
> extension of the semantics of [1]?
>
> Anyway, there is no need for a hypothetical "if": I just did it:
>
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Dataset-semantics
>
> This semantic extension of [1] gives the same entailments as what's in
> the RDF Graph Identification proposal. If you don't trust me, I'll
> provide a formal proof. (Or someone can provide a counterexample.)
>
>
> So, to summarise, the proposal in [1]:
> - is extensible with proper semantic conditions to all kinds of other
> semantics;
> - with little semantic extension, can cover all the use cases of the
> quote-semantics;
> - covers in addition all the use cases related to reasoning with multiple
> graphs (temporal, multi-source, etc);
> - is very much in line with the SPARQL model, based on entailment
> regimes at the graph level, just like SPARQL.
>
>
> Then I'd like to know what's wrong with this proposal?
>
>
> --AZ
>
>
> On 22/08/2012 12:06, Ivan Herman wrote:
>> Antoine,
>>
>> let me try to understand what you propose, because there are
>> different ways to interpret your mail. Is it:
>>
>> 1. RDF 1.1 should be completely silent on any semantics w.r.t.
>> datasets, or
>>
>> 2. RDF 1.1 should adopt [1] as the semantics w.r.t. datasets instead
>> of the 'quoting' semantics as the kind of 'base-line' semantics
>>
>>
>> As for #2: I do not have any fundamental issue with it, technically.
>> However, the proposal was first announced in March '11
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Mar/0277.html
>>
>> followed by a discussion thread; then it continued in a further
>> discussion in a thread started by
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Apr/0116.html
>>
>> finally, there was some revival in
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Aug/0105.html
>>
>> I am probably missing some other threads, but the fact remains that
>> the WG could never get a consensus around [1]. _I am not interested
>> to know why_, by the way; let us say it is part of a collective
>> failure of the group.
>>
>> *If* the WG can get to a consensus around that semantics as a base
>> line now, I am personally fine with it (I do understand the arguments
>> against the quote semantics). The feeling among ourselves, when we
>> put together the document, was that the quote semantics is pretty
>> much the bare minimum that the WG may get a consensus on and, if we
>> define some sort of an extension mechanism, others like the one in
>> [1] can also be expressed.
>>
>> Of course, we can go the #1 line. I would prefer not, and find a
>> minimum, but I will not lie down in the road if that is what we will end
>> up with...
>>
>> Ivan
>>
>> [1]
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>
>>
>>
>>
>>
>> On Aug 22, 2012, at 10:28 , Antoine Zimmermann wrote:
>>
>>> Sandro, all,
>>>
>>>
>>> Sorry again to write very very long emails. I've put a tremendous
>>> amount of thinking into this email, so it's really hard to make it
>>> short and summarise all of it. I'm very sorry to say that I'm
>>> leaning very much towards *not* adopting a formal semantics along
>>> the lines that the RDF Graph Identification proposal suggests. I
>>> can try a summary:
>>>
>>> - what conclusion can we draw from a <name,graph> pair? In the G.I.
>>> proposal, essentially none;
>>> - we do not need quote-semantics if we want a faithful
>>> transcription of an existing graph (e.g., the crawl use case);
>>> - the quote-semantics, as proposed, does not match the notion of
>>> quoting in natural language;
>>> - all of SPARQL is based on applying an entailment regime to all
>>> the graphs in a target dataset, be they named or default;
>>> - SPARQL ASK on basic graph patterns and GRAPH graph patterns
>>> matches very precisely the semantics of datasets that I proposed.
>>>
>>> Please read on for detailed explanations on these items.
>>>
>>>
>>> First, let me summarise the things on which we seem to agree:
>>>
>>> 1. considering all the discussions on use cases, existing
>>> implementations, SPARQL specs, etc., we agree that imposing that
>>> the graph IRI denotes the graph itself is too strong;
>>> 2. we want a minimal semantics, as little constrained as possible,
>>> such that alternative semantics can be defined (by this group or
>>> another) as extensions of it by adding more constraints;
>>> 3. a dataset with no named graphs "behaves" as if it was a normal
>>> RDF graph (in mathematical terms, we can say that there is an
>>> injective morphism from RDF Graphs to RDF Datasets, which means we
>>> can identify an RDF Graph with a corresponding RDF Dataset with no
>>> named graphs).
>>>
>>>
>>> Let us imagine we only do that, proposing a minimal semantics that
>>> fulfills the 3 items. Formally, one possible proposal could be the
>>> following:
>>>
>>> A simple-dataset-interpretation (or an
>>> rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a
>>> simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt
>>> vocabulary V \union {rdf:hasGraph} such that:
>>>
>>> - if a dataset D includes a default graph G, then I(G) = false
>>> implies I(D) = false;
>>> - if a dataset D includes a named graph <n,G>, then G is in IR
>>> (i.e., in the set of resources of interpretation I), n is in
>>> vocabulary V, and <I(n),G> belongs to IEXT(I(rdf:hasGraph));
>>> - in any other case, I(D) is false for a dataset D.
>>>
>>>
>>> The problem is, without further restrictions, this leads to a
>>> semantics of "no-semantics" for named graphs. We are not allowed to
>>> draw any conclusion from a <name,graph> pair. We end up
>>> formalising, as a model theoretic semantics, the notion of "no
>>> semantics".
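The named-graph condition above can be sketched in Python, under loudly stated assumptions: graphs are modelled as frozensets of triples, an interpretation is a plain dict with keys IR, IS and IEXT, and the function name satisfies_named_graphs is hypothetical. The sketch reads the condition as a satisfaction check and only illustrates why it licenses no conclusions: an interpretation can satisfy <g,G> without satisfying <g,G'> for a weaker graph G'.

```python
# Toy model (an assumption, not part of any proposal text):
# a graph is a frozenset of (subject, predicate, object) triples.
HAS_GRAPH = "rdf:hasGraph"  # term taken from the definition above


def satisfies_named_graphs(interp, named_graphs):
    """Check only the named-graph condition of the definition.

    interp: dict with keys
      IR   - set of resources,
      IS   - mapping from names in the vocabulary to resources,
      IEXT - mapping from property resources to sets of pairs.
    named_graphs: mapping from name n to graph G.
    """
    IR, IS, IEXT = interp["IR"], interp["IS"], interp["IEXT"]
    has_graph_ext = IEXT.get(IS.get(HAS_GRAPH), set())
    for n, G in named_graphs.items():
        if n not in IS:                      # n must be in the vocabulary
            return False
        if G not in IR:                      # the graph itself must be a resource
            return False
        if (IS[n], G) not in has_graph_ext:  # <I(n),G> in IEXT(I(rdf:hasGraph))
            return False
    return True


G = frozenset({("s", "p", "o")})
weaker = frozenset({("s", "p", "_:b")})  # what simple entailment would give

interp = {
    "IR": {"r_g", "r_hasGraph", G},
    "IS": {"g": "r_g", HAS_GRAPH: "r_hasGraph"},
    "IEXT": {"r_hasGraph": {("r_g", G)}},
}

# interp satisfies <g,G> but not <g,weaker>: it is a counter-model,
# so <g,G> does not entail <g,weaker> under this reading.
print(satisfies_named_graphs(interp, {"g": G}))
print(satisfies_named_graphs(interp, {"g": weaker}))
```

Because the graph enters the condition only as an opaque element of IR, nothing about its triples is ever inspected.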
>>>
>>> Let me explain this by reducing the case to the RDF semantics. We
>>> all agree that RDF talks about resources, that literals are a
>>> special case of resources, that URIs denote resources and there
>>> exist relationships between resources. But we are not all agreeing
>>> to make entailments on RDF data because there are times when we
>>> want to faithfully transmit an RDF graph exactly as it was
>>> produced.
>>>
>>> So we formalise the "semantics of no-semantics" of RDF like this: a
>>> no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
>>>
>>> - IR is a set of resources,
>>> - IP is ..., etc... (see RDF Semantics)
>>>
>>> denotation of graphs:
>>>
>>> - for an RDF graph G, I(G) is true iff G is in IR.
>>>
>>> This is a semantics where graphs do not entail anything, except
>>> themselves. All the semantics in RDF Semantics 2004 can be derived
>>> from this by adding more constraints. So we are happy as we have
>>> the core semantics from which everything else derives.
>>>
>>>
>>> BUT this is absurd! You don't need to define a semantics of
>>> no-semantics. If you need to keep the original triples, you simply
>>> do not apply the semantics, or at least not to the data you must
>>> share. If you want to transmit a faithful representation of a graph,
>>> just do it! It's legal. It's done all the time. It does not prevent
>>> anyone, including the one who shares a faithful copy of an existing
>>> graph, from drawing conclusions from the graph.
>>>
>>> That is what a crawler does: it meets normal RDF graphs in the wild
>>> and faithfully transcribes them into named graphs, even though, as
>>> they are RDF Graphs, they have a normative semantics. The semantics
>>> does not have any effect on graphs. A formal semantics does
>>> *nothing*. It does not put conclusions in people's mouth.
>>>
>>> A semantics tells you what you are *allowed* to conclude. It tells
>>> you neither what to do with these conclusions, nor what you are
>>> *forced* to conclude. And frankly, I would really like to be
>>> allowed to conclude, even without further information, that
>>> <g> {<s> <p> [] } holds whenever <g> {<s> <p> <o> } holds. I
>>> think, after all, that there is hardly any use case, if any at all,
>>> which requires that drawing this conclusion not be allowed.
>>>
>>>
>>> Take this other angle: assume we have a Web crawler or application
>>> that fetches RDF documents online. It looks up
>>> http://example.com/stuff.rdf and gets an RDF graph. Distinguish 2
>>> possibilities:
>>>
>>> 1. It puts the RDF graph into a <name,graph> pair. It ends up with,
>>> for instance:
>>>
>>> ex:stuff.rdf {<s> <p> <o> .}
>>>
>>> Given the quote-semantics, it is not allowed to draw the following
>>> conclusion, unless some extra information comes:
>>>
>>> ex:stuff.rdf {<s> <p> <o> . <p> a rdf:Property .}
>>>
>>> 2. It applies operations on the RDF graph to build the RDF-closure
>>> of the RDF graph, that is, it simply draws conclusions from the
>>> graph. It then injects the closure into a <name,graph> pair and
>>> ends up with:
>>>
>>> ex:stuff.rdf {<s> <p> <o> . <p> a rdf:Property .}
>>>
>>> These are all legal, semantically valid operations. The final named
>>> graph is obtained from the two elements "ex:stuff.rdf" and "{<s>
>>> <p> <o>}" by drawing conclusions in RDF and keeping the IRI to
>>> index it.
>>>
>>> So, the construction would be valid and would follow logically
>>> and directly from the given graph and its IRI, but the <name,graph>
>>> pair would not carry the conclusion nonetheless. What kind of
>>> semantics is that?
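The second crawler scenario can be sketched in Python, with several assumptions made for illustration: triples are plain tuples of strings, the helper name apply_rdf1 is hypothetical, and only a single pass of entailment rule rdf1 from RDF Semantics 2004 (every predicate in use is an rdf:Property) is applied. This is not the full RDF closure, which would also add the axiomatic triples.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"


def apply_rdf1(graph):
    """One pass of rule rdf1: for each triple (s, p, o), add (p rdf:type rdf:Property).

    A single pass only; a real closure would iterate to a fixpoint
    and include the RDF axiomatic triples.
    """
    closed = set(graph)
    for (_, p, _) in graph:
        closed.add((p, RDF_TYPE, RDF_PROPERTY))
    return frozenset(closed)


# The graph fetched from http://example.com/stuff.rdf in the example above.
fetched = frozenset({("s", "p", "o")})

# Possibility 1: store the graph as fetched.
dataset_as_fetched = {"ex:stuff.rdf": fetched}

# Possibility 2: draw conclusions first, then store under the same IRI.
dataset_closed = {"ex:stuff.rdf": apply_rdf1(fetched)}

for triple in sorted(dataset_closed["ex:stuff.rdf"]):
    print(triple)
```

Both datasets are built by legal operations on the same fetched graph; the question raised above is whether a dataset semantics should relate them.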
>>>
>>>
>>>
>>> Another point is that SPARQL relies on an entailment regime (simple
>>> entailment only for SPARQL 1.0), which it uses on all of the graphs
>>> interrogated in a dataset. There is no special treatment of graphs
>>> inside <name,graph> pairs.
>>>
>>> So:
>>>
>>> ASK WHERE { GRAPH <g> {<s> <p> [] } }
>>>
>>> answers yes iff the dataset:
>>>
>>> <g> {<s> <p> [] }
>>>
>>> is entailed by the target dataset according to the semantics of [1]
>>> (which is (c) in my previous email). However, this answer has no
>>> relationship with the quoting semantics, except if, by chance, the
>>> graph named <g> happens to be exactly the triple "<s> <p> []".
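The contrast between the two readings of the ASK query can be sketched in Python. Everything here is an assumption made for illustration: triples are tuples of strings, blank nodes are strings starting with "_:", and the function names simply_entails, ask_graph and quoted_match are hypothetical. The entailment check is a brute-force search for an instance of the pattern inside the named graph, which is how simple entailment can be decided for small graphs.

```python
def is_bnode(term):
    return term.startswith("_:")


def simply_entails(graph, pattern, binding=None):
    """True if some assignment of the pattern's blank nodes maps
    every pattern triple onto a triple of graph."""
    binding = binding or {}
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new_binding = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_bnode(p_term):
                # A blank node must be bound consistently across triples.
                if new_binding.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok and simply_entails(graph, rest, new_binding):
            return True
    return False


def ask_graph(dataset, name, pattern):
    """Per-graph entailment reading of ASK WHERE { GRAPH <name> { pattern } }."""
    return name in dataset and simply_entails(dataset[name], pattern)


def quoted_match(dataset, name, pattern):
    """Quoting reading: the named graph must be exactly the given triples."""
    return name in dataset and dataset[name] == set(pattern)


dataset = {"g": {("s", "p", "o")}}
pattern = [("s", "p", "_:x")]

print(ask_graph(dataset, "g", pattern))
print(quoted_match(dataset, "g", pattern))
```

The two functions agree only when the stored graph happens to coincide with the pattern, which is the "by chance" case mentioned above.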
>>>
>>>
>>> [1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>>
>>>
>>>
>>>
>>>
>>> On 20/08/2012 19:11, Sandro Hawke wrote:
>>>> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
>>>>
>>>> I believe it's possible to handle the use cases that want (a) and
>>>> (c) by standardizing on (b) and then defining additional RDF
>>>> vocabulary terms (either now or later).
>>>
>>> I don't know how you can go from (b) to (c) or from (b) to (a). I
>>> have not yet seen a fully stabilised version of (b), but the ones
>>> that have been sketched do not make it easy to do so. However,
>>> there is a stable and complete version of (c) and I can tell you
>>> here how you can go from (c) to (a). It suffices to add the
>>> following semantic condition to the proposal of [1]:
>>>
>>> - for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).
>>>
>>> [1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>>
>>>
>>>
>>>
>>> And if one wants to quote graphs, maybe they should use double quotes:
>>>
>>> <g> ex:hasGraph "<s> <p> <o>"^^ex:Graph .
>>>
>>> which is valid and consistent RDF. This has exactly the semantics
>>> of "no-semantics" described above.
>>>
>>> BTW, the action of quoting in natural language does not reduce the
>>> possible inferences, it increases them. Compare:
>>>
>>> - Joe said the war is over.
>>> - Joe said "the war is over".
>>>
>>> In both cases, I can infer that Joe said that the war has come to
>>> an end. But in the second case, I know in addition that Joe used
>>> the word "over". So, if we really want to simulate quotes, then it
>>> should be a more expressive semantics rather than a weaker one. So
>>> maybe we can define (b) in terms of (c) rather than the
>>> opposite.
>>>
>>>
>>>> (As an aside: I don't think the priorities have any formal
>>>> weight. The WG has never resolved to accept or reject or
>>>> prioritize any uses as more important than any other.)
>>>
>>> Yep, no formal weight but the priorities are showing which use
>>> cases are more important than others, in the view of people from
>>> this working group. That's enough to take a serious look at the
>>> highest priority.
>>>
>>>
>>>>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it
>>>>> seems to be natural to say that the graph IRI RDF-denotes the
>>>>> graph. But:
>>>>>
>>>>> http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
>>>>>
>>>>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs
>>>>> *but* they do not necessarily "name" graphs in the strict
>>>>> model-theoretic sense. A SPARQL Dataset does not establish
>>>>> graphs as referents of IRIs (relevant to ISSUE-30)".
>>>>>
>>>>> I know this resolution is about SPARQL datasets, and it does not
>>>>> necessarily apply to whatever structure we come up with in
>>>>> RDF, but one of the Priority A use cases is to be able to dump
>>>>> a SPARQL store. With this resolution, there is apparently a
>>>>> clash between the use case requirement and the semantic
>>>>> condition.
>>>>>
>>>>
>>>> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the
>>>> time, in practice, Ui denotes a g-box, not a g-snap. (Or,
>>>> sometimes, it's something else associated with a g-box, like the
>>>> primary subject.) I don't see how SPARQL 1.1 UPDATE with the
>>>> GRAPH keyword makes any sense if Ui denotes Gi.
>>>
>>> The GRAPH keyword has its own semantics defined by SPARQL. It does
>>> not relate to the RDF semantics. The GRAPH keyword is just an
>>> indication that we want to work with the RDF graph inside a
>>> certain <name,graph> pair. It is totally independent of what the
>>> URI denotes in RDF semantics.
>>>
>>>
>>>>>
>>>>> My proposal is to define several recommended semantics and
>>>>> allow the concrete syntax to declare in a document what
>>>>> semantics is assumed when exchanging a dataset.
>>>>>
>>>>> I find this idea appealing because it is in line with the fact
>>>>> that information carried by HTTP is accompanied by a self
>>>>> description of how it should be understood. For instance, we
>>>>> have MIME types, we have <!DOCTYPE> declarations, etc. Since
>>>>> RDF is not a purely syntactical data structure, it makes sense
>>>>> that it carries with it a reference to the semantics it uses.
>>>>> Such practices of referencing the MIME type, charset, doctype,
>>>>> schema, etc have been a key enabler of interoperability on the
>>>>> Web. Why not extend the pattern to the formal semantics? BTW,
>>>>> SPARQL services have a way to tell what inference regime they
>>>>> support, and SPARQL queries have a way to ask for a particular
>>>>> regime. I claim that my proposal is simply in agreement with
>>>>> already accepted notions in the SPARQL world.
>>>>>
>>>>
>>>> I see the appeal -- solving each kind of problem with an
>>>> approach crafted directly for it -- but my sense is this would
>>>> cause too much confusion in the market and result in a lack of
>>>> interoperability. I think we're better off standardizing (b) now,
>>>> as long as I'm right that we can address the (a) and (c) use
>>>> cases using just additional vocabulary.
>>>
>>> I'm pretty sure you cannot get from (b) to (c) with merely
>>> additional vocabulary. Not in the way the semantics of (b) have been
>>> tentatively defined so far. You'd really need extra stuff in the
>>> structure of an interpretation.
>>>
>>>
>>>>
>>>> -- Sandro
>>>>
>>>>>
>>>>> Best,
>>>>
>>>>
>>>>
>>>
>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>>
>>
>>
>> ---- Ivan Herman, W3C Semantic Web Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 FOAF:
>> http://www.ivan-herman.net/foaf.rdf
>>
>>
>>
>>
>>
>>
>>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 22 August 2012 13:10:32 UTC