Re: RDF dataset semantics again

Antoine,

let me try to understand what you propose, because there are different ways to interpret your mail. Is it:

1. RDF 1.1 should be completely silent on any semantics w.r.t. datasets, or

2. RDF 1.1 should adopt [1], instead of the 'quoting' semantics, as the kind of 'base-line' semantics w.r.t. datasets?


As for #2: I do not have any fundamental issue with it, technically. However, the proposal was first announced in March '11

http://lists.w3.org/Archives/Public/public-rdf-wg/2011Mar/0277.html

followed by a discussion thread; the discussion then continued in a thread started by

http://lists.w3.org/Archives/Public/public-rdf-wg/2011Apr/0116.html

and, finally, there was some revival in

http://lists.w3.org/Archives/Public/public-rdf-wg/2011Aug/0105.html

I am probably missing some other threads, but the fact remains that the WG could never reach consensus around [1]. _I am not interested in knowing why_, by the way; let us say it is part of a collective failure of the group.

*If* the WG can reach consensus around that semantics as a baseline now, I am personally fine with it (I do understand the arguments against the quote semantics). Our feeling, when we put the document together, was that the quote semantics is pretty much the bare minimum that the WG may reach consensus on and that, if we define some sort of extension mechanism, other semantics, like the one in [1], can also be expressed.

Of course, we can go with #1. I would prefer not to, and would rather find a minimum, but I will not lie down in the road if that is what we end up with...

Ivan

[1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics



On Aug 22, 2012, at 10:28, Antoine Zimmermann wrote:

> Sandro, all,
> 
> 
> Sorry again for writing very, very long emails. I've put a tremendous amount of thinking into this email, so it's really hard to make it short and summarise all of it.
> I'm very sorry to say that I'm leaning very much towards *not* adopting a formal semantics along the lines of what the RDF Graph Identification proposal suggests. I can try a summary:
> - what conclusion can we draw from a <name,graph> pair? In the G.I. proposal, essentially none;
> - we do not need quote-semantics if we want a faithful retranscription of an existing graph (e.g., the crawl use case);
> - the quote-semantics, as proposed, does not match the notion of quoting in natural language;
> - all of SPARQL is based on applying an entailment regime to all the graphs in a target dataset, be they named or default;
> - SPARQL ASK on basic graph patterns and GRAPH graph patterns matches very precisely the dataset semantics that I proposed.
> Please read on for detailed explanations on these items.
> 
> 
> First, let me summarise the things on which we seem to agree:
> 
> 1. considering all the discussions on use cases, existing implementations, SPARQL specs, etc., we agree that imposing that the graph IRI denotes the graph itself is too strong;
> 2. we want a minimal semantics, as little constrained as possible, such that alternative semantics can be defined (by this group or another) as extensions of it by adding more constraints.
> 3. a dataset with no named graphs "behaves" as if it was a normal RDF graph (in mathematical terms, we can say that there is an injective morphism from RDF Graphs to RDF Datasets, which means we can assimilate an RDF Graph to a corresponding RDF Dataset with no named graphs).
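Item 3 can be made concrete with a minimal Python sketch. It assumes a toy representation that is not part of any proposal: terms are plain strings, a graph is a frozenset of triples, and a dataset is a default graph plus a frozenset of (name, graph) pairs. The helper names `as_dataset` and `to_graph` are hypothetical. Because `to_graph` undoes `as_dataset`, the embedding has a left inverse, and that is what makes it injective.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Toy representation (assumption): terms are strings,
# a graph is a frozenset of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass(frozen=True)
class Dataset:
    """A default graph plus a set of (name, graph) pairs."""
    default: Graph
    named: FrozenSet[Tuple[str, Graph]] = frozenset()


def as_dataset(g: Graph) -> Dataset:
    """Embed an RDF graph as the dataset with g as default graph and no named graphs."""
    return Dataset(default=g)


def to_graph(d: Dataset) -> Graph:
    """Left inverse of as_dataset, defined only on datasets without named graphs."""
    if d.named:
        raise ValueError("dataset has named graphs, so it is not the image of a plain graph")
    return d.default
```

Round-tripping any graph through `as_dataset` and `to_graph` gives the same graph back, so two distinct graphs can never map to the same dataset.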
> 
> 
> Let us imagine we only do that, proposing a minimal semantics that fulfils the 3 items. Formally, one possible proposal could be the following:
> 
> A simple-dataset-interpretation (or an rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt vocabulary V ∪ {rdf:hasGraph} such that:
> 
> - if a dataset D includes a default graph G, then I(G) = false implies I(D) = false;
> - if a dataset D includes a named graph <n,G>, then G is in IR (i.e., in the set of resources of interpretation I), n is in vocabulary V, and <I(n),G> belongs to IEXT(I(rdf:hasGraph))
> - in any other case, I(D) is false for a dataset D.
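The definition above can be sketched as a truth check in Python. This is a toy model under stated assumptions: the interpretation is given as plain Python sets and dicts, `graph_true` stands in for the ordinary RDF truth value of a graph, and I(D) is read as true exactly when every listed condition holds. All names in the sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Toy representation (assumption): terms are strings, graphs are frozensets of triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
HAS_GRAPH = "rdf:hasGraph"  # the extra vocabulary term added to V


@dataclass
class DatasetInterpretation:
    vocabulary: Set[str]                            # V, assumed to contain rdf:hasGraph
    IR: Set[object]                                 # resources; graphs themselves may be members
    I: Dict[str, object]                            # denotation of the names in V
    IEXT: Dict[object, Set[Tuple[object, object]]]  # extensions of properties
    graph_true: Callable[[Graph], bool]             # stand-in for I(G) on plain RDF graphs


def dataset_true(default: Graph, named: Dict[str, Graph], interp: DatasetInterpretation) -> bool:
    """I(D) is true exactly when every condition of the definition holds (assumed reading)."""
    # Default graph: I(G) = false implies I(D) = false.
    if not interp.graph_true(default):
        return False
    has_graph_ext = interp.IEXT.get(interp.I.get(HAS_GRAPH), set())
    for n, g in named.items():
        # Named graph <n, G>: G in IR, n in V, and <I(n), G> in IEXT(I(rdf:hasGraph)).
        if g not in interp.IR or n not in interp.vocabulary:
            return False
        if (interp.I.get(n), g) not in has_graph_ext:
            return False
    return True
```

The sketch shows the "no-semantics" problem. Suppose an interpretation's IR contains only the graph { <s> <p> <o> }, and its rdf:hasGraph extension pairs I(<g>) with that graph. Then `dataset_true` accepts <g> { <s> <p> <o> } but rejects <g> { <s> <p> _:b }, because nothing forces the weaker graph into IR.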
> 
> 
> The problem is, without further restrictions, this leads to a semantics of "no-semantics" for named graphs. We are not allowed to draw any conclusion from a <name,graph> pair. We end up formalising, as a model theoretic semantics, the notion of "no semantics".
> 
> Let me explain this by reducing the case to the RDF semantics. We all agree that RDF talks about resources, that literals are a special case of resources, that URIs denote resources and that there exist relationships between resources. But not all of us agree to draw entailments from RDF data, because there are times when we want to faithfully transmit an RDF graph exactly as it was produced.
> 
> So we formalise the "semantics of no-semantics" of RDF like this:
> a no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
> - IR is a set of resources,
> - IP is ..., etc... (see RDF Semantics)
> 
> denotation of graphs:
> - for an RDF graph G, I(G) is true iff G is in IR.
> 
> this is a semantics where graphs do not entail anything, except themselves. All the semantics in RDF Semantics 2004 can be derived from this by adding more constraints. So we are happy as we have the core semantics from which everything else derives.
> 
> 
> BUT this is absurd! You don't need to define a semantics of no-semantics. If you need to keep the original triples, you simply do not apply the semantics, or at least not to the data you must share. If you want to transmit a faithful representation of a graph, just do it! It's legal. It's done all the time. It does not prevent anyone, including the one who shares a faithful copy of an existing graph, from drawing conclusions from the graph.
> 
> That is what a crawler does: it meets normal RDF graphs in the wild and faithfully transcribes them into named graphs, even though, as they are RDF graphs, they have a normative semantics. The semantics does not have any effect on graphs. A formal semantics does *nothing*. It does not put conclusions in people's mouths.
> 
> A semantics tells you what you are *allowed* to conclude. It tells you neither what to do with these conclusions nor what you are *forced* to conclude. And frankly, I would really like to be allowed to conclude, even without further information, that <g> { <s> <p> [] } holds whenever <g> { <s> <p> <o> } holds. After all, I think there is hardly a single use case, if any, that requires forbidding this conclusion.
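The conclusion described above is ordinary simple entailment applied inside the pair. Below is a minimal Python sketch, assuming a toy representation: terms are strings, blank nodes carry a `_:` prefix, and graphs are frozensets of triples. It relies on the interpolation lemma: G simply entails H iff some instance of H, obtained by mapping H's blank nodes to terms, is a subgraph of G. `named_graph_entails` is a hypothetical helper that applies this per name. It illustrates the per-graph reading argued for here rather than reproducing the text of [1].

```python
from typing import Dict, FrozenSet, Tuple

# Toy representation (assumption): IRIs as "<...>", blank nodes as "_:..." strings.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def simple_entails(g: Graph, h: Graph) -> bool:
    """True iff some instance of h (blank nodes of h mapped to terms) is a subgraph of g."""
    h_triples = sorted(h)

    def extend(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(h_triples):
            return True  # every triple of h has been matched consistently
        for candidate in g:
            new_map = dict(mapping)
            ok = True
            for pattern_term, term in zip(h_triples[i], candidate):
                if is_blank(pattern_term):
                    # A blank node of h must map to the same term everywhere.
                    if new_map.setdefault(pattern_term, term) != term:
                        ok = False
                        break
                elif pattern_term != term:
                    ok = False
                    break
            if ok and extend(i + 1, new_map):
                return True
        return False

    return extend(0, {})


def named_graph_entails(pair_g: Tuple[str, Graph], pair_h: Tuple[str, Graph]) -> bool:
    """Hypothetical per-name reading: <n, G> entails <n, H> iff G simply entails H."""
    (n1, g), (n2, h) = pair_g, pair_h
    return n1 == n2 and simple_entails(g, h)
```

Take G = { <s> <p> <o> } and H = { <s> <p> _:x }. Then `simple_entails(G, H)` holds and `simple_entails(H, G)` does not, so ("<g>", G) entails ("<g>", H) but not the reverse.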
> 
> 
> Take another angle: assume we have a Web crawler or application that fetches RDF documents online. It looks up http://example.com/stuff.rdf and gets an RDF graph. Consider 2 possibilities:
> 1.  It puts the RDF graph into a <name,graph> pair. It ends up with, for instance:
> 
> ex:stuff.rdf { <s> <p> <o> .}
> 
> Given the quote-semantics, it is not allowed to draw the following conclusion, unless some extra information is provided:
> 
> ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}
> 
> 2.  It applies operations on the RDF graph to build the RDF-closure of the RDF graph, that is, it simply draws conclusions from the graph. It then injects the closure into a <name,graph> pair and ends up with:
> 
> ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}
> 
> These are all legal, semantically valid operations. The final named graph is obtained from the two elements "ex:stuff.rdf" and "{<s> <p> <o>}" by drawing conclusions in RDF and keeping the IRI to index it.
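Option 2 can be sketched in Python, assuming the same toy triple representation. The sketch applies only the RDF entailment rule rdf1 (every predicate is an rdf:Property), up to a fixpoint. It does not compute the full RDF closure with all axiomatic triples. `crawl` is a hypothetical helper, not part of any crawler API. The fixpoint also adds `rdf:type rdf:type rdf:Property`, which the example above does not show.

```python
from typing import FrozenSet, Tuple

# Toy representation (assumption): terms are strings, graphs are frozensets of triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"


def rdf1_closure(g: Graph) -> Graph:
    """Close g under rule rdf1 only: for each triple (s, p, o), add (p, rdf:type, rdf:Property)."""
    closed = set(g)
    while True:
        new = {(p, RDF_TYPE, RDF_PROPERTY) for (_, p, _) in closed} - closed
        if not new:
            return frozenset(closed)
        closed |= new


def crawl(name: str, fetched: Graph, close: bool) -> Tuple[str, Graph]:
    """Hypothetical crawler step: store the fetched graph (option 1) or its rdf1 closure (option 2)."""
    return (name, rdf1_closure(fetched) if close else fetched)
```

Both calls keep the same name. Only the second pair carries `<p> rdf:type rdf:Property`, even though that triple follows from the first pair's graph.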
> 
> So the construction would be valid and would follow logically from the given graph and its IRI, yet the <name,graph> pair would not carry the conclusion. What kind of semantics is that?
> 
> 
> 
> Another point is that SPARQL relies on an entailment regime (simple entailment only for SPARQL 1.0), which it uses on all of the graphs interrogated in a dataset. There is no special treatment of graphs inside <name,graph> pairs.
> 
> So:
> 
> ASK WHERE {
>  GRAPH <g> { <s> <p> [] }
> }
> 
> answers yes iff the dataset:
> 
> <g> { <s> <p> [] }
> 
> is entailed by the target dataset according to the semantics of [1] (which is (c) in my previous email). However, this answer has no relationship with the quoting semantics, unless, by chance, the graph named <g> happens to be exactly the triple "<s> <p> []".
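The two readings can be contrasted in a minimal Python sketch under the same toy representation. `ask_graph` evaluates ASK WHERE { GRAPH <g> { pattern } } by basic graph pattern matching, with blank nodes in the pattern treated as variables. `quote_holds` models the quoting reading as equality with the stored graph, using label-sensitive set equality as a simplification of graph isomorphism. Both names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Toy representation (assumption): blank nodes are strings starting with "_:".
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def matches(pattern: Graph, graph: Graph, binding: Optional[Dict[str, str]] = None) -> bool:
    """Basic graph pattern matching; blank nodes in the pattern act as variables."""
    binding = binding or {}
    if not pattern:
        return True
    first, *rest = sorted(pattern)
    for triple in graph:
        b = dict(binding)  # fresh copy so a failed attempt leaves no bindings behind
        if all(
            (b.setdefault(x, y) == y) if x.startswith("_:") else x == y
            for x, y in zip(first, triple)
        ) and matches(frozenset(rest), graph, b):
            return True
    return False


def ask_graph(dataset: Dict[str, Graph], name: str, pattern: Graph) -> bool:
    """ASK WHERE { GRAPH <name> { pattern } } under simple entailment (sketch)."""
    graph = dataset.get(name)
    return graph is not None and matches(pattern, graph)


def quote_holds(dataset: Dict[str, Graph], name: str, graph: Graph) -> bool:
    """Quoting reading: <name> { graph } holds only if it is the stored graph itself."""
    return dataset.get(name) == graph
```

Consider a dataset whose graph <g> holds { <s> <p> <o> . <s> <q> <o2> }. For the pattern { <s> <p> _:x }, `ask_graph` answers yes while `quote_holds` answers no. The two agree only when the stored graph is exactly the pattern.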
> 
> 
> [1]  Semantics, in TF-Graphs/RDF-Datasets-Proposal. http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
> 
> 
> On 20/08/2012 19:11, Sandro Hawke wrote:
>> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
>> 
>> I believe it's possible to handle the use cases that want (a) and (c) by
>> standardizing on (b) and then defining additional RDF vocabulary terms
>> (either now or later).
> 
> I don't know how you can go from (b) to (c) or from (b) to (a). I have not yet seen a fully stabilised version of (b), but the ones that have been sketched do not make it easy to do so. However, there is a stable and complete version of (c), and I can tell you here how you can go from (c) to (a). It suffices to add the following semantic condition to the proposal of [1]:
> 
> - for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).
> 
> [1]  Semantics, in TF-Graphs/RDF-Datasets-Proposal. http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
> 
> And if one wants to quote graphs, maybe they should use double quotes:
> 
> <g>  ex:hasGraph  "<s> <p> <o>"^^ex:Graph .
> 
> which is valid and consistent RDF. This has exactly the semantics of "no-semantics" described above.
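The literal encoding can also be sketched in Python. The lexical form used for `ex:Graph` here is an assumption: the triples are sorted and joined on one line, and they are assumed to contain no literals that would need escaping. Equality of literals is taken as identity of lexical forms, which assumes ex:Graph defines no coarser value equality. Under that reading a quoted graph entails only itself, which is the "no-semantics" behaviour.

```python
from typing import FrozenSet, Tuple

# Toy representation (assumption): terms are strings, graphs are frozensets of triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def quote(g: Graph) -> str:
    """Hypothetical ex:Graph literal: triples sorted and joined on one line (no escaping handled)."""
    lexical = " ".join(" ".join(t) + " ." for t in sorted(g))
    return f'"{lexical}"^^ex:Graph'


def statement(name: str, g: Graph) -> str:
    """The triple <name> ex:hasGraph "..."^^ex:Graph, as Turtle-like text."""
    return f"{name} ex:hasGraph {quote(g)} ."


def quoted_entails(lit1: str, lit2: str) -> bool:
    """'No-semantics': a quoted graph entails only itself (identity of lexical forms)."""
    return lit1 == lit2
```

The lexical form depends on blank node labels. As a result, two isomorphic graphs can even yield different literals, which is stricter than graph identity.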
> 
> BTW, the action of quoting in natural language does not reduce the possible inferences, it increases them. Compare:
> 
> - Joe said the war is over.
> - Joe said "the war is over".
> 
> In both cases, I can infer that Joe said that the war has come to an end. But in the second case, I know in addition that Joe used the word "over". So, if we really want to simulate quotes, it should be with a more expressive semantics rather than a weaker one. So maybe we can define (b) in terms of (c) rather than the other way round.
> 
> 
>> (As an aside: I don't think the priorities have any formal weight. The
>> WG has never resolved to accept or reject or prioritize any uses as more
>> important than any other.)
> 
> Yes, no formal weight, but the priorities do show which use cases the people in this working group consider more important than others. That's enough to justify a serious look at the highest-priority ones.
> 
> 
>>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it seems
>>> to be natural to say that the graph IRI RDF-denotes the graph. But:
>>> 
>>> http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
>>> 
>>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs *but* they
>>> do not necessarily "name" graphs in the strict model-theoretic sense.
>>> A SPARQL Dataset does not establish graphs as referents of IRIs
>>> (relevant to ISSUE-30)".
>>> 
>>> I know this resolution is about SPARQL datasets, and it's not
>>> necessarily applying to whatever structure we come up with in RDF, but
>>> one of the Priority A use cases is to be able to dump a SPARQL store.
>>> With this resolution, there is apparently a clash between the use case
>>> requirement and the semantic condition.
>>> 
>> 
>> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the time, in
>> practice, Ui denotes a g-box, not a g-snap. (Or, sometimes, it's
>> something else associated with a g-box, like the primary subject.) I
>> don't see how SPARQL 1.1 UPDATE with the GRAPH keyword makes any sense
>> if Ui denotes Gi.
> 
> The GRAPH keyword has its own semantics defined by SPARQL. It does not relate to the RDF semantics. The GRAPH keyword is just an indication that we want to work with the RDF graph inside a certain <name,graph> pair. It is totally independent of what the URI denotes in RDF semantics.
> 
> 
>>> 
>>> My proposal is to define several recommended semantics and allow the
>>> concrete syntax to declare in a document what semantics is assumed
>>> when exchanging a dataset.
>>> 
>>> I find this idea appealing because it is in line with the fact that
>>> information carried by HTTP is accompanied by a self description of
>>> how it should be understood. For instance, we have MIME types, we have
>>> <!DOCTYPE> declarations, etc. Since RDF is not a purely syntactical
>>> data structure, it makes sense that it carries with it a reference to
>>> the semantics it uses.
>>> Such practices of referencing the MIME type, charset, doctype, schema,
>>> etc have been a key enabler of interoperability on the Web. Why not
>>> extend the pattern to the formal semantics?
>>> BTW, SPARQL services have a way to tell what inference regime they
>>> support, and SPARQL queries have a way to ask for a particular regime.
>>> I claim that my proposal is simply in agreement with already
>>> accepted notions in the SPARQL world.
>>> 
>> 
>> I see the appeal -- solving each kind of problem with an approach
>> crafted directly for it -- but my sense is this would cause too much
>> confusion in the market and result in a lack of interoperability. I think
>> we're better off standardizing (b) now, as long as I'm right that we can
>> address the (a) and (c) use cases using just additional vocabulary.
> 
> I'm pretty sure you cannot get from (b) to (c) with merely additional vocabulary. Not in the way the semantics of (b) has been tentatively defined so far. You'd really need extra stuff in the structure of an interpretation.
> 
> 
>> 
>> -- Sandro
>> 
>>> 
>>> Best,
>> 
>> 
>> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 22 August 2012 10:07:25 UTC