Re: RDF dataset semantics again

Sandro, all,

Sorry again for writing such long emails. I've put a tremendous amount 
of thought into this one, so it's really hard to make it short and 
summarise all of it.
I'm very sorry to say that I'm leaning very much towards *not* adopting 
a formal semantics along the lines of what the RDF Graph Identification 
proposal suggests. I can try a summary:
  - what conclusion can we draw from a <name,graph> pair? In the G.I. 
proposal, essentially none;
  - we do not need quote-semantics if we want a faithful transcription 
of an existing graph (e.g., the crawl use case);
  - the quote-semantics, as proposed, does not match the notion of 
quoting in natural language;
  - all of SPARQL is based on applying an entailment regime to all the 
graphs in a target dataset, be they named or default;
  - SPARQL ASK on basic graph patterns and GRAPH graph patterns matches 
very precisely the semantics of datasets that I proposed.
  Please read on for detailed explanations on these items.

First, let me summarise the things on which we seem to agree:

  1. considering all the discussions on use cases, existing 
implementations, SPARQL specs, etc., we agree that imposing that the 
graph IRI denotes the graph itself is too strong;
  2. we want a minimal semantics, as little constrained as possible, 
such that alternative semantics can be defined (by this group or 
another) as extensions of it by adding more constraints;
  3. a dataset with no named graphs "behaves" as if it was a normal RDF 
graph (in mathematical terms, we can say that there is an injective 
morphism from RDF Graphs to RDF Datasets, which means we can assimilate 
an RDF Graph to a corresponding RDF Dataset with no named graphs).

Let us imagine we only do that, proposing a minimal semantics that 
fulfills the 3 items. Formally, one possible proposal could be the following:

A simple-dataset-interpretation (or an 
rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a 
simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt 
vocabulary V \union {rdf:hasGraph} such that:

  - if a dataset D includes a default graph G, then I(G) = false implies 
I(D) = false;
  - if a dataset D includes a named graph <n,G>, then G is in IR (i.e., 
in the set of resources of interpretation I), n is in vocabulary V, and 
<I(n),G> belongs to IEXT(I(rdf:hasGraph));
  - in any other case, I(D) is false for a dataset D.
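
As a rough illustration, the conditions above can be sketched in Python. This is only a sketch under stated assumptions: it is restricted to ground graphs and simple interpretations, it reads the three conditions as necessary and sufficient, and the names Interp, graph_true, dataset_true and the "r_..." resources are invented for the example.

```python
from dataclasses import dataclass

HAS_GRAPH = "rdf:hasGraph"  # the extra vocabulary term from the definition

@dataclass
class Interp:
    V: set      # vocabulary (names), without rdf:hasGraph
    IR: set     # resources; graphs (frozensets of triples) may be members
    IS: dict    # maps each name in V and rdf:hasGraph to a resource
    IEXT: dict  # maps a property resource to a set of (subject, object) pairs

def graph_true(I, G):
    """Truth of a ground graph G under a simple interpretation."""
    for s, p, o in G:
        if any(t not in I.IS for t in (s, p, o)):
            return False
        if (I.IS[s], I.IS[o]) not in I.IEXT.get(I.IS[p], set()):
            return False
    return True

def dataset_true(I, default, named):
    """I(D), reading the three conditions as necessary and sufficient (assumption)."""
    if not graph_true(I, default):
        return False
    has_graph = I.IEXT.get(I.IS[HAS_GRAPH], set())
    return all(
        n in I.V and G in I.IR and (I.IS[n], G) in has_graph
        for n, G in named.items()
    )

# Toy interpretation: <g> is paired with G, and nothing else is asserted.
G = frozenset({("s", "p", "o"), ("s", "p", "o2")})
G_sub = frozenset({("s", "p", "o")})
I = Interp(
    V={"g", "s", "p", "o", "o2"},
    IR={"r_g", "r_s", "r_p", "r_o", "r_o2", "r_hasGraph", G},
    IS={"g": "r_g", "s": "r_s", "p": "r_p", "o": "r_o", "o2": "r_o2",
        HAS_GRAPH: "r_hasGraph"},
    IEXT={"r_hasGraph": {("r_g", G)}},
)

print(dataset_true(I, frozenset(), {"g": G}))      # dataset with <g, G>
print(dataset_true(I, frozenset(), {"g": G_sub}))  # same name, subgraph of G
print(graph_true(I, G))                            # G read as a plain RDF graph
```

In this toy interpretation the pair <g,G> is satisfied, the same name paired with a subgraph of G is not, and G itself is false when read as a plain graph.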

The problem is, without further restrictions, this leads to a semantics 
of "no-semantics" for named graphs. We are not allowed to draw any 
conclusion from a <name,graph> pair. We end up formalising, as a model 
theoretic semantics, the notion of "no semantics".

Let me explain this by reducing the case to the RDF semantics. We all 
agree that RDF talks about resources, that literals are a special case 
of resources, that URIs denote resources and there exist relationships 
between resources. But we do not all agree on drawing entailments from 
RDF data, because there are times when we want to faithfully transmit an 
RDF graph exactly as it was produced.

So we formalise the "semantics of no-semantics" of RDF like this:
a no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
  - IR is a set of resources,
  - IP is ..., etc... (see RDF Semantics)

denotation of graphs:
  - for an RDF graph G, I(G) is true iff G is in IR.

This is a semantics where graphs do not entail anything except 
themselves. All the semantics in RDF Semantics 2004 can be derived from 
this by adding more constraints. So we are happy, as we have the core 
semantics from which everything else derives.
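
To see why graphs entail only themselves here, a tiny Python sketch may help. It is an illustration only: graphs are modelled as frozensets of triples, the function names are invented, and the other components of the tuple (IP, IEXT, etc.) are left out because they play no role in the truth of a graph.

```python
def no_true(IR, G):
    """Denotation of a graph in a no-interpretation: I(G) is true iff G is in IR."""
    return G in IR

def no_entails(G, H):
    """G entails H iff every no-interpretation satisfying G also satisfies H.

    The interpretation with IR = {G} satisfies G and no other graph, so it is
    the only counter-model candidate that needs checking.
    """
    return no_true({G}, H)

G = frozenset({("s", "p", "o")})
H = frozenset({("s", "p", "_:b")})  # same triple, object replaced by a blank node

print(no_entails(G, G))
print(no_entails(G, H))
```

The first check succeeds and the second fails: even the blank-node generalisation of a triple is not entailed.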

BUT this is absurd!  You don't need to define a semantics of 
no-semantics. If you need to keep the original triples, you simply do 
not apply the semantics, or at least not to the data you must share. If 
you want to transmit a faithful representation of a graph, just do it! 
It's legal. It's done all the time. It does not prevent anyone, 
including the one who shares a faithful copy of an existing graph, from 
drawing conclusions from the graph.

That is what a crawler does: it meets normal RDF graphs in the wild and 
faithfully transcribes them into named graphs, even though, as they are 
RDF Graphs, they have a normative semantics. The semantics does not have 
any effect on graphs. A formal semantics does *nothing*. It does not put 
conclusions in people's mouths.

A semantics tells you what you are *allowed* to conclude. It tells you 
neither what to do with these conclusions nor what you are *forced* to 
conclude. And frankly, I would really like to be allowed to conclude, 
even without further information, that <g> { <s> <p> [] } holds 
whenever <g> { <s> <p> <o> } holds. After all, I think there is hardly 
any use case, if any at all, that requires this conclusion to be 
disallowed.
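
The conclusion I would like to be allowed is just simple entailment applied inside the named graph. A minimal Python sketch follows, assuming terms are plain strings, blank nodes start with "_:", and the reading of <name> { H } as "the graph paired with name entails H" (this is my preferred reading, not the quote-semantics). The function names are invented for the example.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def simply_entails(G, H):
    """Simple entailment via the interpolation lemma: some assignment of
    H's blank nodes to terms of G must turn H into a subgraph of G."""
    bnodes = sorted({t for triple in H for t in triple if is_bnode(t)})
    terms = sorted({t for triple in G for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        assignment = dict(zip(bnodes, values))
        if all(tuple(assignment.get(t, t) for t in triple) in G for triple in H):
            return True
    return False

def named_graph_entails(dataset, name, H):
    """Assumed reading: <name> { H } holds if the graph paired with name entails H."""
    return name in dataset and simply_entails(dataset[name], H)

dataset = {"g": {("s", "p", "o")}}
print(named_graph_entails(dataset, "g", {("s", "p", "_:b")}))
print(named_graph_entails(dataset, "g", {("s", "q", "_:b")}))
```

The first check holds and the second does not, since no triple with predicate q is in the graph. The brute-force search is exponential in the number of blank nodes, which is fine for an illustration only.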

Take this other angle: assume we have a Web crawler or application that 
fetches RDF documents online. It looks up 
and gets an RDF graph. Distinguish 2 possibilities:
  1.  It puts the RDF graph into a <name,graph> pair. It ends up with, 
for instance:

  ex:stuff.rdf { <s> <p> <o> .}

Given the quote-semantics, it is not allowed to draw the following 
conclusion, unless some extra information is provided:

  ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}

  2.  It applies operations on the RDF graph to build the RDF-closure of 
the RDF graph, that is, it simply draws conclusions from the graph. It 
then injects the closure into a <name,graph> pair and ends up with:

  ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}

These are all legal, semantically valid operations. The final named 
graph is obtained from the two elements "ex:stuff.rdf" and 
"{<s> <p> <o>}" by drawing conclusions in RDF and keeping the IRI to 
index it.

So, the construction would be valid and would follow logically and 
directly from the given graph and its IRI, but the <name,graph> pair 
would nonetheless not carry the conclusion. What kind of semantics is that?
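
The two possibilities can be sketched in Python as follows. This is a simplified illustration: the closure applies only rule rdf1 of RDF Semantics 2004 (every predicate is an rdf:Property), not the full RDF closure with its axiomatic triples, and the crawl function and its close flag are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDF_PROPERTY = "rdf:Property"

def rdf1_closure(graph):
    """Apply rule rdf1 (every predicate is an rdf:Property) up to a fixpoint.
    This is only a fragment of the full RDF closure."""
    closed = set(graph)
    while True:
        new = {(p, RDF_TYPE, RDF_PROPERTY) for (_, p, _) in closed} - closed
        if not new:
            return frozenset(closed)
        closed |= new

def crawl(name, fetched_graph, close=False):
    """Index a fetched graph under its IRI, optionally after closing it."""
    graph = rdf1_closure(fetched_graph) if close else frozenset(fetched_graph)
    return {name: graph}

fetched = {("s", "p", "o")}
faithful = crawl("ex:stuff.rdf", fetched)            # possibility 1
closed = crawl("ex:stuff.rdf", fetched, close=True)  # possibility 2

print(("p", RDF_TYPE, RDF_PROPERTY) in faithful["ex:stuff.rdf"])
print(("p", RDF_TYPE, RDF_PROPERTY) in closed["ex:stuff.rdf"])
```

Because the rule is applied to a fixpoint, the closed graph also contains the triple stating that rdf:type is an rdf:Property, in addition to the two triples shown above. Both dictionaries are perfectly ordinary <name,graph> pairs; the only difference is whether the crawler chose to draw the conclusion before indexing.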

Another point is that SPARQL relies on an entailment regime (simple 
entailment only for SPARQL 1.0), which it uses on all of the graphs 
queried in a dataset. There is no special treatment of graphs 
inside <name,graph> pairs.


   GRAPH <g> { <s> <p> [] }

answers yes iff the dataset:

<g> { <s> <p> [] }

is entailed by the target dataset according to the semantics of [1] 
(which is (c) in my previous email). However, this answer has no 
relationship with the quoting semantics, except if, by chance, the graph 
named <g> happens to be exactly the triple "<s> <p> []".
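
To make the contrast concrete, here is a small Python sketch of how such a GRAPH pattern is answered under simple entailment, next to a crude stand-in for the quoting reading. Assumptions: terms are plain strings, blank nodes and variables in the pattern act as wildcards, and the quoting reading is modelled (very roughly) as exact equality of triple sets. The function names are invented for the example and this is not how any SPARQL engine is implemented.

```python
def is_open(term):
    # Blank nodes ("_:x") and variables ("?x") both act as wildcards in a pattern.
    return term.startswith(("_:", "?"))

def match(pattern, graph, binding=None):
    """Backtracking search: can every triple pattern be matched in graph?"""
    binding = binding or {}
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        trial = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_open(p_term):
                # Bind the wildcard, or check it against its earlier binding.
                if trial.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok and match(rest, graph, trial):
            return True
    return False

def ask_graph(dataset, name, pattern):
    """ASK { GRAPH <name> { pattern } } under simple entailment."""
    return name in dataset and match(list(pattern), dataset[name])

def quoted_equal(dataset, name, pattern):
    """Crude model of the quoting reading: the named graph is exactly the pattern."""
    return dataset.get(name) == set(pattern)

pattern = [("s", "p", "_:b")]
big = {"g": {("s", "p", "o"), ("s", "q", "o2")}}
exact = {"g": {("s", "p", "_:b")}}

print(ask_graph(big, "g", pattern), quoted_equal(big, "g", pattern))
print(ask_graph(exact, "g", pattern), quoted_equal(exact, "g", pattern))
```

On the first dataset the query answers yes while the equality check fails; the two agree only on the second dataset, where the graph named g is exactly the pattern.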

[1]  Semantics, in TF-Graphs/RDF-Datasets-Proposal.

On 20/08/2012 19:11, Sandro Hawke wrote:
> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
> I believe it's possible to handle the use cases that want (a) and (c) by
> standardizing on (b) and then defining additional RDF vocabulary terms
> (either now or later).

I don't know how you can go from (b) to (c) or from (b) to (a). I have 
not yet seen a fully stabilised version of (b), but the ones that have 
been sketched do not make it easy to do so. However, there is a stable 
and complete version of (c) and I can tell you here how you can go from 
(c) to (a). It suffices to add the following semantic condition to the 
proposal of [1]:

  - for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).

And if one wants to quote graphs, maybe they should use double quotes:

  <g>  ex:hasGraph  "<s> <p> <o>"^^ex:Graph .

which is valid and consistent RDF. This has exactly the semantics of 
"no-semantics" described above.

BTW, the action of quoting in natural language does not reduce the 
possible inferences, it increases them. Compare:

  - Joe said the war is over.
  - Joe said "the war is over".

In both cases, I can infer that Joe said that the war has come to an 
end. But in the second case, I know in addition that Joe used the word 
"over". So, if we really want to simulate quotes, then it should be a 
more expressive semantics rather than a weaker one. So maybe we can 
define (b) in terms of (c) rather than the opposite.

> (As an aside: I don't think the priorities have any formal weight. The
> WG has never resolved to accept or reject or prioritize any uses as more
> important than any other.)

Yep, no formal weight, but the priorities show which use cases are 
more important than others in the view of people from this working 
group. That's enough to take a serious look at the highest priority.

>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it seems
>> to be natural to say that the graph IRI RDF-denotes the graph. But:
>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs *but* they
>> do not necessarily "name" graphs in the strict model-theoretic sense.
>> A SPARQL Dataset does not establish graphs as referents of IRIs
>> (relevant to ISSUE-30)".
>> I know this resolution is about SPARQL datasets, and it's not
>> necessarily applying to whatever structure we come up with in RDF, but
>> one of the Priority A use cases is to be able to dump a SPARQL store.
>> With this resolution, there is apparently a clash between the use case
>> requirement and the semantic condition.
> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the time, in
> practice, Ui denotes a g-box, not a g-snap. (Or, sometimes, it's
> something else associated with a g-box, like the primary subject.) I
> don't see how SPARQL 1.1 UPDATE with the GRAPH keyword makes any sense
> if Ui denotes Gi.

The GRAPH keyword has its own semantics defined by SPARQL. It does not 
relate to the RDF semantics. The GRAPH keyword is just an indication 
that we want to work with the RDF graph inside a certain <name,graph> 
pair. It is totally independent of what the URI denotes in RDF semantics.

>> My proposal is to define several recommended semantics and allow the
>> concrete syntax to declare in a document what semantics is assumed
>> when exchanging a dataset.
>> I find this idea appealing because it is in line with the fact that
>> information carried by HTTP is accompanied by a self description of
>> how it should be understood. For instance, we have MIME types, we have
>> <!DOCTYPE> declarations, etc. Since RDF is not a purely syntactical
>> datastructure, it makes sense that it carries with it a reference to
>> the semantics it uses.
>> Such practices of referencing the MIME type, charset, doctype, schema,
>> etc have been a key enabler of interoperability on the Web. Why not
>> extend the pattern to the formal semantics?
>> BTW, SPARQL services have a way to tell what inference regime they
>> support, and SPARQL queries have a way to ask for a particular regime.
>> I claim that my proposal is simply in agreement with already
>> accepted notions in the SPARQL world.
> I see the appeal -- solving each kind of problem with an approach
> crafted directly for it -- but my sense is this would cause too much
> confusion in the market and result in a lack of interoperability. I think
> we're better off standardizing (b) now, as long as I'm right that we can
> address the (a) and (c) use cases using just additional vocabulary.

I'm pretty sure you cannot get from (b) to (c) with merely additional 
vocabulary. Not in the way the semantics of (b) has been tentatively 
defined so far. You'd really need extra stuff in the structure of an 

> -- Sandro
>> Best,

Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66

Received on Wednesday, 22 August 2012 08:28:38 UTC