Re: RDF dataset semantics again from Antoine Zimmermann on 2012-08-22 (public-rdf-wg@w3.org from August 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 22 Aug 2012 14:58:50 +0200
To: public-rdf-wg@w3.org
Message-ID: <5034D78A.7030208@emse.fr>
What I do not like in the arguments is the hypothetical "if". Yes, of 
course, if we can extend a minimal semantics to any other form of 
semantics by mere additional semantic conditions, then yes why not?

But I claim that you are not going to be able to do this from the 
quote-semantics to the dataset semantics of [1].

Would it be ok if we could define the quote-semantics as a semantic 
extension of the semantics of [1]?

Anyway, there is no need for a hypothetical "if": I just did it:

http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Dataset-semantics

This semantic extension of [1] gives the same entailments as what's in 
the RDF Graph Identification proposal. If you don't trust me, I'll 
provide a formal proof. (Or someone can provide a counterexample.)


So, to summarise, the proposal in [1]:
  - is extensible with proper semantic conditions to all kinds of other 
semantics;
  - with little semantic extension, can cover all the use cases of the 
quote-semantics;
  - covers in addition all the use cases related to reasoning with 
multiple graphs (temporal, multi-source, etc);
  - is very much in line with the SPARQL model, based on entailment 
regimes at the graph level, just like SPARQL.


Then I'd like to know what's wrong with this proposal?


--AZ


On 22/08/2012 12:06, Ivan Herman wrote:
> Antoine,
>
> let me try to understand what you propose, because there are
> different ways to interpret your mail. Is it:
>
> 1. RDF 1.1 should be completely silent on any semantics w.r.t.
> datasets, or
>
> 2. RDF 1.1 should adopt [1] as the semantics w.r.t. datasets instead
> of the 'quoting' semantics as the kind of 'base-line' semantics
>
>
> As for #2: I do not have any fundamental issue with it, technically.
> However, the proposal was first announced in March '11
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Mar/0277.html
>
> followed by a discussion thread; then it continued in a further
> discussion in a thread started by
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Apr/0116.html
>
> finally, there was some revival in
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Aug/0105.html
>
> I am probably missing some other threads, but the fact remains that
> the WG could never get a consensus around [1]. _I am not interested
> to know why_, by the way; let us say it is part of a collective
> failure of the group.
>
> *If* the WG can get to a consensus around that semantics as a base
> line now, I am personally fine with it (I do understand the arguments
> against the quote semantics). The feeling among ourselves, when we
> put together the document, was that the quote semantics is pretty
> much the bare minimum that the WG may get a consensus on and, if we
> define some sort of an extension mechanism, others like the one in
> [1] can also be expressed.
>
> Of course, we can go the #1 line. I would prefer not, and find a
> minimum, but I will not lie down the road if that is what we will end
> up with...
>
> Ivan
>
> [1]
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>
>
>
>
> On Aug 22, 2012, at 10:28 , Antoine Zimmermann wrote:
>
>> Sandro, all,
>>
>>
>> Sorry again to write very, very long emails. I've put a tremendous
>> amount of thought into this email, so it's really hard to make it
>> short and summarise all of it. I'm very sorry to say that I'm
>> leaning very much towards *not* adopting a formal semantics along
>> the lines that the RDF Graph Identification proposal suggests. I can
>> try a summary:
>>
>> - what conclusion can we draw from a <name,graph> pair? In the G.I.
>>   proposal, essentially none;
>> - we do not need quote-semantics if we want a faithful
>>   retranscription of an existing graph (e.g., the crawl use case);
>> - the quote-semantics, as proposed, does not match the notion of
>>   quoting in natural language;
>> - all of SPARQL is based on applying an entailment regime to all the
>>   graphs in a target dataset, be they named or default;
>> - SPARQL ASK on basic graph patterns and GRAPH graph patterns
>>   matches very precisely the semantics of datasets that I proposed.
>>
>> Please read on for detailed explanations of these items.
>>
>>
>> First, let me summarise the things on which we seem to agree:
>>
>> 1. considering all the discussions on use cases, existing
>>    implementations, SPARQL specs, etc., we agree that imposing that
>>    the graph IRI denotes the graph itself is too strong;
>> 2. we want a minimal semantics, as little constrained as possible,
>>    such that alternative semantics can be defined (by this group or
>>    another) as extensions of it by adding more constraints;
>> 3. a dataset with no named graphs "behaves" as if it was a normal
>>    RDF graph (in mathematical terms, we can say that there is an
>>    injective morphism from RDF Graphs to RDF Datasets, which means
>>    we can assimilate an RDF Graph to a corresponding RDF Dataset
>>    with no named graphs).
>>
>>
>> Let us imagine we only do that, proposing a minimal semantics that
>> fulfils the 3 items. Formally, one possible proposal could be the
>> following:
>>
>> A simple-dataset-interpretation (or an
>> rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a
>> simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt
>> vocabulary V \union {rdf:hasGraph} such that:
>>
>> - if a dataset D includes a default graph G, then I(G) = false
>>   implies I(D) = false;
>> - if a dataset D includes a named graph <n,G>, then G is in IR
>>   (i.e., in the set of resources of interpretation I), n is in
>>   vocabulary V, and <I(n),G> belongs to IEXT(I(rdf:hasGraph));
>> - in any other case, I(D) is false for a dataset D.
>>
>>
>> The problem is, without further restrictions, this leads to a
>> semantics of "no-semantics" for named graphs. We are not allowed to
>> draw any conclusion from a <name,graph> pair. We end up
>> formalising, as a model theoretic semantics, the notion of "no
>> semantics".
>>
>> Let me explain this by reducing the case to the RDF semantics. We
>> all agree that RDF talks about resources, that literals are a
>> special case of resources, that URIs denote resources and there
>> exist relationships between resources. But we are not all agreeing
>> to make entailments on RDF data because there are times when we
>> want to faithfully transmit an RDF graph exactly as it was
>> produced.
>>
>> So we formalise the "semantics of no-semantics" of RDF like this: a
>> no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
>>
>> - IR is a set of resources,
>> - IP is ..., etc... (see RDF Semantics)
>>
>> Denotation of graphs:
>>
>> - for an RDF graph G, I(G) is true iff G is in IR.
>>
>> This is a semantics where graphs do not entail anything, except
>> themselves. All the semantics in RDF Semantics 2004 can be derived
>> from this by adding more constraints. So we are happy as we have
>> the core semantics from which everything else derives.
>>
>>
>> BUT this is absurd!  You don't need to define a semantics of
>> no-semantics. If you need to keep the original triples, you simply
>> do not apply the semantics, or at least not to the data you must
>> share. If you want to transmit a faithful representation of a graph,
>> just do it! It's legal. It's done all the time. It does not prevent
>> anyone, including the one who shares a faithful copy of an existing
>> graph, from drawing conclusions from the graph.
>>
>> That is what a crawler does: it meets normal RDF graphs in the wild
>> and faithfully transcribes them into named graphs, even though, as
>> they are RDF Graphs, they have a normative semantics. The semantics
>> does not have any effect on graphs. A formal semantics does
>> *nothing*. It does not put conclusions in people's mouth.
>>
>> A semantics tells you what you are *allowed* to conclude. It tells
>> you neither what to do with these conclusions nor what you are
>> *forced* to conclude. And frankly, I would really like to be
>> allowed to conclude, even without further information, that
>> <g> { <s> <p> [] } holds whenever <g> { <s> <p> <o> } holds. I
>> think, after all, that there's hardly one, if any at all, use case
>> which requires that it is not allowed to draw this conclusion.
>>
>>
>> Take this other angle: assume we have a Web crawler or application
>> that fetches RDF documents online. It looks up
>> http://example.com/stuff.rdf and gets an RDF graph. Distinguish 2
>> possibilities:
>>
>> 1.  It puts the RDF graph into a <name,graph> pair. It ends up
>> with, for instance:
>>
>> ex:stuff.rdf { <s> <p> <o> . }
>>
>> Given the quote-semantics, it is not allowed to draw the following
>> conclusion, unless some extra information is supplied:
>>
>> ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property . }
>>
>> 2.  It applies operations on the RDF graph to build the RDF-closure
>> of the RDF graph, that is, it simply draws conclusions from the
>> graph. It then injects the closure into a <name,graph> pair and
>> ends up with:
>>
>> ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property . }
>>
>> These are all legal, semantically valid operations. The final named
>> graph is obtained from the two elements "ex:stuff.rdf" and
>> "{ <s> <p> <o> }" by drawing conclusions in RDF and keeping the IRI
>> to index it.
>>
>> So, the construction would be valid and would follow logically and
>> directly from the given graph and its IRI, but the <name,graph>
>> pair would not carry the conclusion nonetheless. What kind of
>> semantics is that?
>>
>>
>>
>> Another point is that SPARQL relies on an entailment regime (simple
>> entailment only for SPARQL 1.0), which it uses on all of the graphs
>> interrogated in a dataset. There is no special treatment of graphs
>> inside <name,graph> pairs.
>>
>> So:
>>
>> ASK WHERE { GRAPH <g> { <s> <p> [] } }
>>
>> answers yes iff the dataset:
>>
>> <g> { <s> <p> [] }
>>
>> is entailed by the target dataset according to the semantics of [1]
>> (which is (c) in my previous email). However, this answer has no
>> relationship with the quoting semantics, except if, by chance, the
>> graph named <g> happens to be exactly the triple "<s> <p> []".
>>
>>
>> [1]  Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>
>>
>>
>>
>> On 20/08/2012 19:11, Sandro Hawke wrote:
>>> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
>>>
>>> I believe it's possible to handle the use cases that want (a) and
>>> (c) by standardizing on (b) and then defining additional RDF
>>> vocabulary terms (either now or later).
>>
>> I don't know how you can go from (b) to (c) or from (b) to (a). I
>> have not yet seen a fully stabilised version of (b), but the ones
>> that have been sketched do not make it easy to do so. However,
>> there is a stable and complete version of (c) and I can tell you
>> here how you can go from (c) to (a). It suffices to add the
>> following semantic condition to the proposal of [1]:
>>
>> - for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).
>>
>> [1]  Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>
>>
>>
>> And if one wants to quote graphs, maybe they should use double quotes:
>>
>> <g> ex:hasGraph "<s> <p> <o>"^^ex:Graph .
>>
>> which is valid and consistent RDF. This has exactly the semantics
>> of "no-semantics" described above.
>>
>> BTW, the action of quoting in natural language does not reduce the
>> possible inferences, it increases them. Compare:
>>
>> - Joe said the war is over.
>> - Joe said "the war is over".
>>
>> In both cases, I can infer that Joe said that the war has come to
>> an end. But in the second case, I know in addition that Joe used
>> the word "over". So, if we really want to simulate quotes, then it
>> should be a more expressive semantics rather than a weaker one. So
>> maybe we can define (b) in terms of (c) rather than the
>> opposite.
>>
>>
>>> (As an aside: I don't think the priorities have any formal
>>> weight. The WG has never resolved to accept or reject or
>>> prioritize any uses as more important than any other.)
>>
>> Yep, no formal weight but the priorities are showing which use
>> cases are more important than others, in the view of people from
>> this working group. That's enough to take a serious look at the
>> highest priority.
>>
>>
>>>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it
>>>> seems to be natural to say that the graph IRI RDF-denotes the
>>>> graph. But:
>>>>
>>>> http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
>>>>
>>>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs
>>>> *but* they do not necessarily "name" graphs in the strict
>>>> model-theoretic sense. A SPARQL Dataset does not establish
>>>> graphs as referents of IRIs (relevant to ISSUE-30)".
>>>>
>>>> I know this resolution is about SPARQL datasets, and it's not
>>>> necessarily applying to whatever structure we come up with in
>>>> RDF, but one of the Priority A use cases is to be able to dump
>>>> a SPARQL store. With this resolution, there is apparently a
>>>> clash between the use case requirement and the semantic
>>>> condition.
>>>>
>>>
>>> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the
>>> time, in practice, Ui denotes a g-box, not a g-snap. (Or,
>>> sometimes, it's something else associated with a g-box, like the
>>> primary subject.) I don't see how SPARQL 1.1 UPDATE with the
>>> GRAPH keyword makes any sense if Ui denotes Gi.
>>
>> The GRAPH keyword has its own semantics defined by SPARQL. It does
>> not relate to the RDF semantics. The GRAPH keyword is just an
>> indication that we want to work with the RDF graph inside a
>> certain <name,graph> pair. It is totally independent of what the
>> URI denotes in RDF semantics.
>>
>>
>>>>
>>>> My proposal is to define several recommended semantics and
>>>> allow the concrete syntax to declare in a document what
>>>> semantics is assumed when exchanging a dataset.
>>>>
>>>> I find this idea appealing because it is in line with the fact
>>>> that information carried by HTTP is accompanied by a self
>>>> description of how it should be understood. For instance, we
>>>> have MIME types, we have <!DOCTYPE> declarations, etc. Since
>>>> RDF is not a purely syntactical datastructure, it makes sense
>>>> that it carries with it a reference to the semantics it uses.
>>>> Such practices of referencing the MIME type, charset, doctype,
>>>> schema, etc have been a key enabler of interoperability on the
>>>> Web. Why not extend the pattern to the formal semantics? BTW,
>>>> SPARQL services have a way to tell what inference regime they
>>>> support, and SPARQL queries have a way to ask for a particular
>>>> regime. I claim that my proposal is simply in agreement with
>>>> already accepted notions in the SPARQL world.
>>>>
>>>
>>> I see the appeal -- solving each kind of problem with an
>>> approach crafted directly for it -- but my sense is this would
>>> cause too much confusion in the market and result in a lack of
>>> interoperability. I think we're better off standardizing (b) now,
>>> as long as I'm right that we can address the (a) and (c) use
>>> cases using just additional vocabulary.
>>
>> I'm pretty sure you cannot get from (b) to (c) with merely
>> additional vocabulary. Not in the way the semantics of (b) have been
>> tentatively defined so far. You'd really need extra stuff in the
>> structure of an interpretation.
>>
>>
>>>
>>> -- Sandro
>>>
>>>>
>>>> Best,
>>>
>>>
>>>
>>
>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>
>
>
> ---- Ivan Herman, W3C Semantic Web Activity Lead Home:
> http://www.w3.org/People/Ivan/ mobile: +31-641044153 FOAF:
> http://www.ivan-herman.net/foaf.rdf
>
>
>
>
>
>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 22 August 2012 12:59:19 UTC