
Re: RDF dataset semantics again

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 22 Aug 2012 16:09:33 +0200
Message-ID: <5034E81D.3020803@emse.fr>
To: public-rdf-wg@w3.org


On 22/08/2012 14:58, Antoine Zimmermann wrote:
> What I do not like in the arguments is the hypothetical "if". Yes, of
> course, if we can extend a minimal semantics to any other form of
> semantics by mere additional semantic conditions, then yes why not?
>
> But I claim that you are not going to be able to do this from the
> quote-semantics to the dataset semantics of [1].
>
> Would it be ok if we could define the quote-semantics as a semantic
> extension of the semantics of [1]?
>
> Anyway, there is no need for a hypothetical "if": I just did it:
>
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Dataset-semantics
>
> This semantic extension of [1] gives the same entailments as what's in
> the RDF Graph Identification proposal. If you don't trust me, I'll
> provide a formal proof (or someone can provide a counterexample).

I overstated: in fact, it is more powerful than the semantics of RDF
Graph Identification, since it allows one to select which graphs are
quoted and which are not. It becomes the semantics of RDF Graph
Identification only if all the graph IRIs are made rdf:QuotedGraphs.


But there is a fundamental problem, which is that the extension is
non-monotonic. This causes trouble when merging.
However, this is not specific to the semantics of [1]: there is
necessarily a problem when merging datasets that bind the same name to
different graphs, no matter how the triples inside named graphs are
interpreted.
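
To make the merging problem concrete, here is a minimal Python sketch. It
assumes a dataset is modelled as a mapping from graph names to sets of
triples, and it picks "union of same-named graphs" as the merge policy.
Both the model and the names conflicting_names and merge_by_union are
illustrative assumptions, not part of any proposal.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Dataset = Dict[str, Graph]  # graph name -> graph (default graph left out)


def conflicting_names(d1: Dataset, d2: Dataset) -> Set[str]:
    """Names that both datasets bind, but to different graphs."""
    return {n for n in d1.keys() & d2.keys() if d1[n] != d2[n]}


def merge_by_union(d1: Dataset, d2: Dataset) -> Dataset:
    """One possible (assumed) merge policy: union graphs sharing a name."""
    merged = dict(d1)
    for name, graph in d2.items():
        merged[name] = merged.get(name, frozenset()) | graph
    return merged


d1 = {"ex:g": frozenset({("ex:s", "ex:p", "ex:o1")})}
d2 = {"ex:g": frozenset({("ex:s", "ex:p", "ex:o2")})}
merged = merge_by_union(d1, d2)
```

With this policy, the graph bound to ex:g after the merge is neither of
the original graphs, so a conclusion that depended on ex:g being bound
to exactly one of them does not survive the merge.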


AZ.

>
>
> So, to summarise, the proposal in [1]:
> - is extensible with proper semantic conditions to all kinds of other
> semantics;
> - with little semantic extension, can cover all the use cases of the
> quote-semantics;
> - covers in addition all the use case related to reasoning with multiple
> graphs (temporal, multi-source, etc);
> - is very much in line with the SPARQL model, based on entailment
> regimes at the graph level, just like SPARQL.
>
>
> Then I'd like to know what's wrong with this proposal?
>
>
> --AZ
>
>
> On 22/08/2012 12:06, Ivan Herman wrote:
>> Antoine,
>>
>> let me try to understand what you propose, because there are
>> different ways to interpret your mail. Is it:
>>
>> 1. RDF 1.1 should be completely silent on any semantics w.r.t.
>> datasets, or
>>
>> 2. RDF 1.1 should adopt [1] as the semantics w.r.t. datasets instead
>> of the 'quoting' semantics as the kind of 'base-line' semantics
>>
>>
>> As for #2: I do not have any fundamental issue with it, technically.
>> However, the proposal was first announced in March '11
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Mar/0277.html
>>
>> followed by a discussion thread; then it continued in a further
>> discussion in a thread started by
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Apr/0116.html
>>
>> finally, there was some revival in
>>
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Aug/0105.html
>>
>> I am probably missing some other threads, but the fact remains that
>> the WG could never get a consensus around [1]. _I am not interested
>> to know why_, by the way; let us say it is part of a collective
>> failure of the group.
>>
>> *If* the WG can get to a consensus around that semantics as a base
>> line now, I am personally fine with it (I do understand the arguments
>> against the quote semantics). The feeling among ourselves, when we
>> put together the document, was that the quote semantics is pretty
>> much the bare minimum that the WG may get a consensus on and, if we
>> define some sort of an extension mechanism, others like the one in
>> [1] can also be expressed.
>>
>> Of course, we can go the #1 line. I would prefer not to, and would
>> rather find a minimum, but I will not lie down in the road if that is
>> what we end up with...
>>
>> Ivan
>>
>> [1]
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>
>>
>>
>>
>>
>> On Aug 22, 2012, at 10:28 , Antoine Zimmermann wrote:
>>
>>> Sandro, all,
>>>
>>>
>>> Sorry again for writing very, very long emails. I've put a tremendous
>>> amount of thinking into this email, so it's really hard to make it
>>> short and summarise all of it. I'm very sorry to say that I'm
>>> leaning very much towards *not* adopting a formal semantics along the
>>> lines that the RDF Graph Identification proposal suggests. I can try a
>>> summary:
>>>
>>> - what conclusion can we draw from a <name,graph> pair? In the G.I.
>>>   proposal, essentially none;
>>> - we do not need quote-semantics if we want a faithful transcription
>>>   of an existing graph (e.g., the crawl use case);
>>> - the quote-semantics, as proposed, does not match the notion of
>>>   quoting in natural language;
>>> - all of SPARQL is based on applying an entailment regime to all the
>>>   graphs in a target dataset, be they named or default;
>>> - SPARQL ASK on basic graph patterns and GRAPH graph patterns matches
>>>   very precisely the semantics of datasets that I proposed.
>>>
>>> Please read on for detailed explanations of these items.
>>>
>>>
>>> First, let me summarise the things on which we seem to agree:
>>>
>>> 1. considering all the discussions on use cases, existing
>>>    implementations, SPARQL specs, etc., we agree that requiring that
>>>    the graph IRI denote the graph itself is too strong;
>>> 2. we want a minimal semantics, as little constrained as possible,
>>>    such that alternative semantics can be defined (by this group or
>>>    another) as extensions of it by adding more constraints;
>>> 3. a dataset with no named graphs "behaves" as if it were a normal RDF
>>>    graph (in mathematical terms, we can say that there is an injective
>>>    morphism from RDF Graphs to RDF Datasets, which means we can
>>>    identify an RDF Graph with a corresponding RDF Dataset with no
>>>    named graphs).
>>>
>>>
>>> Let us imagine we only do that, proposing a minimal semantics that
>>> fulfils the three items. Formally, one possible proposal could be the
>>> following:
>>>
>>> A simple-dataset-interpretation (or an
>>> rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a
>>> simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt
>>> vocabulary V \union {rdf:hasGraph} such that:
>>>
>>> - if a dataset D includes a default graph G, then I(G) = false
>>>   implies I(D) = false;
>>> - if a dataset D includes a named graph <n,G>, then G is in IR (i.e.,
>>>   in the set of resources of interpretation I), n is in vocabulary V,
>>>   and <I(n),G> belongs to IEXT(I(rdf:hasGraph));
>>> - in any other case, I(D) is false for a dataset D.
>>>
>>>
>>> The problem is, without further restrictions, this leads to a
>>> semantics of "no-semantics" for named graphs. We are not allowed to
>>> draw any conclusion from a <name,graph> pair. We end up
>>> formalising, as a model theoretic semantics, the notion of "no
>>> semantics".
>>>
>>> Let me explain this by reducing the case to the RDF semantics. We
>>> all agree that RDF talks about resources, that literals are a
>>> special case of resources, that URIs denote resources and there
>>> exist relationships between resources. But we do not all agree
>>> to draw entailments from RDF data, because there are times when we
>>> want to faithfully transmit an RDF graph exactly as it was
>>> produced.
>>>
>>> So we formalise the "semantics of no-semantics" of RDF like this: a
>>> no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
>>>
>>> - IR is a set of resources,
>>> - IP is ..., etc... (see RDF Semantics)
>>>
>>> denotation of graphs:
>>>
>>> - for an RDF graph G, I(G) is true iff G is in IR.
>>>
>>> This is a semantics where graphs do not entail anything, except
>>> themselves. All the semantics in RDF Semantics 2004 can be derived
>>> from this by adding more constraints. So we are happy as we have
>>> the core semantics from which everything else derives.
>>>
>>>
>>> BUT this is absurd! You don't need to define a semantics of
>>> no-semantics. If you need to keep the original triples, you simply
>>> do not apply the semantics, or at least not to the data you must
>>> share. If you want to transmit a faithful representation of a graph,
>>> just do it! It's legal. It's done all the time. It does not prevent
>>> anyone, including the one who shares a faithful copy of an existing
>>> graph, from drawing conclusions from the graph.
>>>
>>> That is what a crawler does: it meets normal RDF graphs in the wild
>>> and faithfully transcribes them into named graphs, even though, as
>>> they are RDF Graphs, they have a normative semantics. The semantics
>>> does not have any effect on graphs. A formal semantics does
>>> *nothing*. It does not put conclusions in people's mouth.
>>>
>>> A semantics tells you what you are *allowed* to conclude. It tells
>>> you neither what to do with these conclusions nor what you
>>> are *forced* to conclude. And frankly, I would really like to be
>>> allowed to conclude, even without further information, that <g>
>>> { <s> <p> [] } holds whenever <g> { <s> <p> <o> } holds. I
>>> think, after all, that there is hardly any use case, if any at all,
>>> that requires this conclusion to be disallowed.
>>>
>>>
>>> Take this other angle: assume we have a Web crawler or application
>>> that fetches RDF documents online. It looks up
>>> http://example.com/stuff.rdf and gets an RDF graph. Distinguish two
>>> possibilities:
>>>
>>> 1. It puts the RDF graph into a <name,graph> pair. It ends up with,
>>> for instance:
>>>
>>> ex:stuff.rdf {<s> <p> <o> .}
>>>
>>> Given the quote-semantics, it is not allowed to draw the following
>>> conclusion, unless some extra information is provided:
>>>
>>> ex:stuff.rdf {<s> <p> <o> . <p> a rdf:Property .}
>>>
>>> 2. It applies operations on the RDF graph to build the RDF-closure
>>> of the RDF graph, that is, it simply draws conclusions from the
>>> graph. It then injects the closure into a <name,graph> pair and
>>> ends up with:
>>>
>>> ex:stuff.rdf {<s> <p> <o> . <p> a rdf:Property .}
>>>
>>> These are all legal, semantically valid operations. The final named
>>> graph is obtained from the two elements "ex:stuff.rdf" and "{<s>
>>> <p> <o>}" by drawing conclusions in RDF and keeping the IRI to
>>> index it.
>>>
>>> So, the construction would be valid and would follow
>>> logically from the given graph and its IRI, but the <name,graph>
>>> pair would nonetheless not carry the conclusion. What kind of
>>> semantics is that?
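
The two crawler behaviours can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: triples are
plain tuples of strings, and the "closure" applies a single RDF
entailment rule (every term used as a predicate is an rdf:Property),
whereas a full RDF closure also involves axiomatic triples and further
rules. The function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"


def rdf_property_closure(graph):
    """Apply the rule "s p o entails p rdf:type rdf:Property" to a fixpoint."""
    closed = set(graph)
    while True:
        inferred = {(p, RDF_TYPE, RDF_PROPERTY) for (_, p, _) in closed}
        if inferred <= closed:
            return frozenset(closed)
        closed |= inferred


def crawl_as_named_graph(name, graph, close=False):
    """Return a (name, graph) pair, optionally storing the closure instead."""
    stored = rdf_property_closure(graph) if close else frozenset(graph)
    return (name, stored)


fetched = {("s", "p", "o")}
case1 = crawl_as_named_graph("ex:stuff.rdf", fetched)              # verbatim copy
case2 = crawl_as_named_graph("ex:stuff.rdf", fetched, close=True)  # closure
```

Both pairs are built from the same fetched graph and the same IRI by
legal operations; the only difference is at which point the conclusions
were drawn.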
>>>
>>>
>>>
>>> Another point is that SPARQL relies on an entailment regime (simple
>>> entailment only for SPARQL 1.0), which it uses on all of the graphs
>>> queried in a dataset. There is no special treatment of graphs
>>> inside <name,graph> pairs.
>>>
>>> So:
>>>
>>> ASK WHERE { GRAPH <g> { <s> <p> [] } }
>>>
>>> answers yes iff the dataset:
>>>
>>> <g> {<s> <p> [] }
>>>
>>> is entailed by the target dataset according to the semantics of [1]
>>> (which is (c) in my previous email). However, this answer has no
>>> relationship with the quoting semantics, except if, by chance, the
>>> graph named <g> happens to be exactly the triple "<s> <p> []".
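
As a rough illustration of how such an ASK behaves under simple
entailment, here is a small Python sketch. It assumes a dataset is a
mapping from graph names to sets of triples, that terms are strings, and
that pattern terms starting with "_:" are blank nodes acting as
existential placeholders. The helper names are hypothetical, and this is
not how any SPARQL engine is implemented.

```python
def is_bnode(term):
    """Assumed convention: blank nodes are written with a "_:" prefix."""
    return term.startswith("_:")


def _match(todo, graph, binding):
    """Backtracking search for a blank node mapping that embeds the pattern."""
    if not todo:
        return True
    first, rest = todo[0], todo[1:]
    for triple in graph:
        new_binding = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if is_bnode(pat):
                # A blank node must map to the same term everywhere.
                if new_binding.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok and _match(rest, graph, new_binding):
            return True
    return False


def ask_graph(dataset, name, pattern):
    """True iff the graph stored under `name` simply entails `pattern`."""
    graph = dataset.get(name, frozenset())
    return _match(list(pattern), graph, {})


dataset = {"g": frozenset({("s", "p", "o")})}
answer = ask_graph(dataset, "g", [("s", "p", "_:b")])
```

Here the answer is yes although the stored graph is not literally the
triple "<s> <p> []", which is the point made above about the quoting
semantics.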
>>>
>>>
>>> [1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>>
>>>
>>>
>>>
>>>
>>> On 20/08/2012 19:11, Sandro Hawke wrote:
>>>> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
>>>>
>>>> I believe it's possible to handle the use cases that want (a) and
>>>> (c) by standardizing on (b) and then defining additional RDF
>>>> vocabulary terms (either now or later).
>>>
>>> I don't know how you can go from (b) to (c) or from (b) to (a). I
>>> have not yet seen a fully stabilised version of (b), but the ones
>>> that have been sketched do not make it easy to do so. However,
>>> there is a stable and complete version of (c) and I can tell you
>>> here how you can go from (c) to (a). It suffices to add the
>>> following semantic condition to the proposal of [1]:
>>>
>>> - for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).
>>>
>>> [1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
>>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
>>>
>>>
>>>
>>>
>>> And if one wants to quote graphs, maybe they should use double quotes:
>>>
>>> <g> ex:hasGraph "<s> <p> <o>"^^ex:Graph .
>>>
>>> which is valid and consistent RDF. This has exactly the semantics
>>> of "no-semantics" described above.
>>>
>>> BTW, the action of quoting in natural language does not reduce the
>>> possible inferences, it increases them. Compare:
>>>
>>> - Joe said the war is over.
>>> - Joe said "the war is over".
>>>
>>> In both cases, I can infer that Joe said that the war has come to
>>> an end. But in the second case, I know in addition that Joe used
>>> the word "over". So, if we really want to simulate quotes, then it
>>> should be a more expressive semantics rather than a weaker one. So
>>> maybe we can define (b) in terms of (c) rather than the
>>> opposite.
>>>
>>>
>>>> (As an aside: I don't think the priorities have any formal
>>>> weight. The WG has never resolved to accept or reject or
>>>> prioritize any uses as more important than any other.)
>>>
>>> Yep, no formal weight but the priorities are showing which use
>>> cases are more important than others, in the view of people from
>>> this working group. That's enough to take a serious look at the
>>> highest priority.
>>>
>>>
>>>>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it
>>>>> seems to be natural to say that the graph IRI RDF-denotes the
>>>>> graph. But:
>>>>>
>>>>> http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
>>>>>
>>>>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs
>>>>> *but* they do not necessarily "name" graphs in the strict
>>>>> model-theoretic sense. A SPARQL Dataset does not establish
>>>>> graphs as referents of IRIs (relevant to ISSUE-30)".
>>>>>
>>>>> I know this resolution is about SPARQL datasets, and it's not
>>>>> necessarily applying to whatever structure we come up with in
>>>>> RDF, but one of the Priority A use cases is to be able to dump
>>>>> a SPARQL store. With this resolution, there is apparently a
>>>>> clash between the use case requirement and the semantic
>>>>> condition.
>>>>>
>>>>
>>>> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the
>>>> time, in practice, Ui denotes a g-box, not a g-snap. (Or,
>>>> sometimes, it's something else associated with a g-box, like the
>>>> primary subject.) I don't see how SPARQL 1.1 UPDATE with the
>>>> GRAPH keyword makes any sense if Ui denotes Gi.
>>>
>>> The GRAPH keyword has its own semantics defined by SPARQL. It does
>>> not relate to the RDF semantics. The GRAPH keyword is just an
>>> indication that we want to work with the RDF graph inside a
>>> certain <name,graph> pair. It is totally independent of what the
>>> URI denotes in RDF semantics.
>>>
>>>
>>>>>
>>>>> My proposal is to define several recommended semantics and
>>>>> allow the concrete syntax to declare in a document what
>>>>> semantics is assumed when exchanging a dataset.
>>>>>
>>>>> I find this idea appealing because it is in line with the fact
>>>>> that information carried by HTTP is accompanied by a self
>>>>> description of how it should be understood. For instance, we
>>>>> have MIME types, we have <!DOCTYPE> declarations, etc. Since
>>>>> RDF is not a purely syntactical datastructure, it makes sense
>>>>> that it carries with it a reference to the semantics it uses.
>>>>> Such practices of referencing the MIME type, charset, doctype,
>>>>> schema, etc have been a key enabler of interoperability on the
>>>>> Web. Why not extend the pattern to the formal semantics? BTW,
>>>>> SPARQL services have a way to tell what inference regime they
>>>>> support, and SPARQL queries have a way to ask for a particular
>>>>> regime. I claim that my proposal is simply in agreement with
>>>>> already accepted notions in the SPARQL world.
>>>>>
>>>>
>>>> I see the appeal -- solving each kind of problem with an
>>>> approach crafted directly for it -- but my sense is this would
>>>> cause too much confusion in the market and result in a lack of
>>>> interoperability. I think we're better off standardizing (b) now,
>>>> as long as I'm right that we can address the (a) and (c) use
>>>> cases using just additional vocabulary.
>>>
>>> I'm pretty sure you cannot get from (b) to (c) with merely
>>> additional vocabulary. Not in the way the semantics of (b) have been
>>> tentatively defined so far. You'd really need extra stuff in the
>>> structure of an interpretation.
>>>
>>>
>>>>
>>>> -- Sandro
>>>>
>>>>>
>>>>> Best,
>>>>
>>>>
>>>>
>>>
>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>>
>>
>>
>> ---- Ivan Herman, W3C Semantic Web Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 FOAF:
>> http://www.ivan-herman.net/foaf.rdf
>>
>>
>>
>>
>>
>>
>>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 22 August 2012 14:10:01 UTC
