- From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
- Date: Wed, 22 Aug 2012 10:28:09 +0200
- To: public-rdf-wg@w3.org
Sandro, all,
Sorry again for writing such long emails. I've put a tremendous amount
of thought into this one, so it's really hard to keep it short and
summarise all of it.
I'm very sorry to say that I'm leaning very much towards *not* adopting
a formal semantics along the lines of what the RDF Graph Identification
proposal suggests. I can try a summary:
- what conclusion can we draw from a <name,graph> pair? In the G.I.
proposal, essentially none;
- we do not need quote-semantics if we want a faithful transcription
of an existing graph (e.g., the crawl use case);
- the quote-semantics, as proposed, does not match the notion of
quoting in natural language;
- all of SPARQL is based on applying an entailment regime to all the
graphs in a target dataset, be they named or default;
- SPARQL ASK on basic graph patterns and GRAPH graph patterns matches
very precisely the semantics of datasets that I proposed.
Please read on for detailed explanations on these items.
First, let me summarise the things on which we seem to agree:
1. considering all the discussions on use cases, existing
implementations, SPARQL specs, etc., we agree that requiring that the graph
IRI denotes the graph itself is too strong;
2. we want a minimal semantics, as little constrained as possible,
such that alternative semantics can be defined (by this group or
another) as extensions of it by adding more constraints;
3. a dataset with no named graphs "behaves" as if it were a normal RDF
graph (in mathematical terms, we can say that there is an injective
morphism from RDF Graphs to RDF Datasets, which means we can identify
an RDF Graph with a corresponding RDF Dataset with no named graphs).
Let us imagine we only do that, proposing a minimal semantics that
fulfils the 3 items. Formally, one possible proposal could be the following:
A simple-dataset-interpretation (or an
rdf/rdfs/d/owl-dataset-interpretation) wrt vocabulary V is a
simple-interpretation (or an rdf/rdfs/d/owl-interpretation) wrt
vocabulary V \union {rdf:hasGraph} such that:
- if a dataset D includes a default graph G, then I(G) = false implies
I(D) = false;
- if a dataset D includes a named graph <n,G>, then G in IR (i.e., in
the set of resources of interpretation I), n is in vocabulary V, and
<I(n),G> belongs to IEXT(I(rdf:hasGraph))
- in any other case, I(D) is false for a dataset D.
The problem is, without further restrictions, this leads to a semantics
of "no-semantics" for named graphs. We are not allowed to draw any
conclusion from a <name,graph> pair. We end up formalising, as a model
theoretic semantics, the notion of "no semantics".
Let me explain this by reducing the case to the RDF semantics. We all
agree that RDF talks about resources, that literals are a special case
of resources, that URIs denote resources and there exist relationships
between resources. But we do not all agree on drawing entailments from
RDF data, because there are times when we want to faithfully transmit an
RDF graph exactly as it was produced.
So we formalise the "semantics of no-semantics" of RDF like this:
a no-interpretation is a tuple (IR,IP,LV,IS,IL,IEXT) such that:
- IR is a set of resources,
- IP is ..., etc... (see RDF Semantics)
denotation of graphs:
- for an RDF graph G, I(G) is true iff G is in IR.
This is a semantics where graphs do not entail anything, except
themselves. All the semantics in RDF Semantics 2004 can be derived from
this by adding more constraints. So we are happy as we have the core
semantics from which everything else derives.
BUT this is absurd! You don't need to define a semantics of
no-semantics. If you need to keep the original triples, you simply do
not apply the semantics, or at least not to the data you must share. If
you want to transmit a faithful representation of a graph, just do it!
It's legal. It's done all the time. It does not prevent anyone,
including the one who shares a faithful copy of an existing graph, from
drawing conclusions from the graph.
That is what a crawler does: it meets normal RDF graphs in the wild and
faithfully transcribes them into named graphs, even though, as they are
RDF Graphs, they have a normative semantics. The semantics does not have
any effect on graphs. A formal semantics does *nothing*. It does not put
conclusions in people's mouths.
A semantics tells you what you are *allowed* to conclude. It tells you
neither what to do with these conclusions nor what you are
*forced* to conclude. And frankly, I would really like to be allowed to
conclude, even without further information, that <g> { <s> <p> [] }
holds whenever <g> { <s> <p> <o> } holds. I think, after all, that
there is hardly any use case, if any at all, that requires this
conclusion to be disallowed.
Take this other angle: assume we have a Web crawler or application that
fetches RDF documents online. It looks up http://example.com/stuff.rdf
and gets an RDF graph. Distinguish 2 possibilities:
1. It puts the RDF graph into a <name,graph> pair. It ends up with,
for instance:
ex:stuff.rdf { <s> <p> <o> .}
Given the quote-semantics, it is not allowed to draw the following
conclusion, unless some extra information comes:
ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}
2. It applies operations on the RDF graph to build the RDF-closure of
the RDF graph, that is, it simply draws conclusions from the graph. It
then injects the closure into a <name,graph> pair and ends up with:
ex:stuff.rdf { <s> <p> <o> . <p> a rdf:Property .}
These are all legal, semantically valid operations. The final named graph
is obtained from the two elements "ex:stuff.rdf" and "{<s> <p> <o>}" by
drawing conclusions in RDF and keeping the IRI to index it.
So, the construction would be valid and would follow logically and
directly from the given graph and its IRI, yet the <name,graph> pair
would still not carry the conclusion. What kind of semantics is that?
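To make the two possibilities concrete, here is a minimal Python sketch. It is an illustration only, not a real crawler: triples are plain tuples, the function names rdf_closure_step and name_graph are made up for this example, and only one application of the single RDF rule used above (every predicate is an rdf:Property) is performed, not the full RDF closure.

```python
RDF_TYPE = "rdf:type"          # written "a" in the examples above
RDF_PROPERTY = "rdf:Property"

def rdf_closure_step(graph):
    """Add <p> rdf:type rdf:Property for every predicate p used in the graph.

    This is a single pass of one rule, which is enough to reproduce the
    example; it is not a complete RDF closure.
    """
    inferred = {(p, RDF_TYPE, RDF_PROPERTY) for (_s, p, _o) in graph}
    return frozenset(graph) | inferred

def name_graph(name, graph):
    """Pair an IRI with a graph, as the crawler does."""
    return (name, frozenset(graph))

fetched = {("<s>", "<p>", "<o>")}

# Possibility 1: store the graph exactly as fetched.
verbatim = name_graph("ex:stuff.rdf", fetched)

# Possibility 2: draw conclusions first, then store under the same IRI.
closed = name_graph("ex:stuff.rdf", rdf_closure_step(fetched))
```

Both pairs carry the same IRI, and the second graph contains the first plus the inferred triple. The question raised above is why a semantics would refuse to relate them.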
Another point is that SPARQL relies on an entailment regime (simple
entailment only for SPARQL 1.0), which it uses on all of the graphs
queried in a dataset. There is no special treatment of graphs
inside <name,graph> pairs.
So:
ASK WHERE {
GRAPH <g> { <s> <p> [] }
}
answers yes iff the dataset:
<g> { <s> <p> [] }
is entailed by the target dataset according to the semantics of [1]
(which is (c) in my previous email). However, this answer has no
relationship with the quoting semantics, except if, by chance, the graph
named <g> happens to be exactly the triple "<s> <p> []".
[1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
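The ASK behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a dataset is a dict from graph names to sets of triples, blank nodes are strings starting with "_:", and the helper names (matches, simple_entails, ask_graph) are invented for this illustration. It only covers simple entailment on a single named graph, not SPARQL as a whole.

```python
def is_blank(term):
    return term.startswith("_:")

def matches(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` maps onto `triple`.

    Returns the extended binding, or None if the triple does not match.
    """
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_blank(p_term):
            # A blank node in the pattern acts as an existential variable.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def simple_entails(graph, pattern_graph, binding=None):
    """True if one blank-node mapping sends every pattern triple into graph."""
    binding = {} if binding is None else binding
    pattern_graph = list(pattern_graph)
    if not pattern_graph:
        return True
    first, rest = pattern_graph[0], pattern_graph[1:]
    for triple in graph:
        extended = matches(first, triple, binding)
        if extended is not None and simple_entails(graph, rest, extended):
            return True
    return False

def ask_graph(dataset, name, pattern_graph):
    """Model of: ASK WHERE { GRAPH <name> { pattern_graph } }"""
    return name in dataset and simple_entails(dataset[name], pattern_graph)

dataset = {"<g>": {("<s>", "<p>", "<o>")}}
print(ask_graph(dataset, "<g>", [("<s>", "<p>", "_:b")]))  # True
print(ask_graph(dataset, "<h>", [("<s>", "<p>", "_:b")]))  # False
```

The first query succeeds even though the graph named <g> is not literally the triple "<s> <p> []", which is the mismatch with the quoting semantics pointed out above.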
On 20/08/2012 19:11, Sandro Hawke wrote:
> On 08/20/2012 10:02 AM, Antoine Zimmermann wrote:
>
> I believe it's possible to handle the use cases that want (a) and (c) by
> standardizing on (b) and then defining additional RDF vocabulary terms
> (either now or later).
I don't know how you can go from (b) to (c) or from (b) to (a). I have
not yet seen a fully stabilised version of (b), but the ones that have
been sketched do not make it easy to do so. However, there is a stable
and complete version of (c) and I can tell you here how you can go from
(c) to (a). It suffices to add the following semantic condition to the
proposal of [1]:
- for all names n1, n2 in the vocabulary V, Con(n1) = Con(n2).
[1] Semantics, in TF-Graphs/RDF-Datasets-Proposal.
http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal#Semantics
And if one wants to quote graphs, maybe they should use double quotes:
<g> ex:hasGraph "<s> <p> <o>"^^ex:Graph .
which is valid and consistent RDF. This has exactly the semantics of
"no-semantics" described above.
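A small Python sketch of why the literal form gives "no-semantics" for the quoted triples. It assumes, as in the example, a made-up datatype ex:Graph that no entailment regime recognises, so two such literals are the same term only when their lexical forms are identical. The function quote_graph and its serialisation format are hypothetical.

```python
def quote_graph(triples):
    """Serialise triples into one opaque string, standing in for the
    lexical form of a hypothetical "..."^^ex:Graph literal."""
    return " . ".join(" ".join(t) for t in triples)

original = quote_graph([("<s>", "<p>", "<o>")])
with_inference = quote_graph([("<s>", "<p>", "<o>"),
                              ("<p>", "a", "rdf:Property")])

# With an unrecognised datatype, the two literals are different terms
# because their lexical forms differ, so neither quoted graph
# yields the other.
print(original == with_inference)  # False
```

Inside the quotes the triples are just characters, so nothing can be concluded from them except the string itself.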
BTW, the action of quoting in natural language does not reduce the
possible inferences, it increases them. Compare:
- Joe said the war is over.
- Joe said "the war is over".
In both cases, I can infer that Joe said that the war has come to an
end. But in the second case, I know in addition that Joe used the word
"over". So, if we really want to simulate quotes, then it should be a
more expressive semantics rather than a weaker one. So maybe we can define
(b) in terms of (c) rather than the opposite.
> (As an aside: I don't think the priorities have any formal weight. The
> WG has never resolved to accept or reject or prioritize any uses as more
> important than any other.)
Yep, no formal weight, but the priorities show which use cases are
more important than others in the view of the people in this working
group. That's enough to take a serious look at the highest priority.
>> Also, the condition ∀i: I(ui) = Gi is problematic. At first, it seems
>> to be natural to say that the graph IRI RDF-denotes the graph. But:
>>
>> http://www.w3.org/2011/rdf-wg/meeting/2011-04-14#resolution_1
>>
>> "RESOLVED: Named Graphs in SPARQL associate IRIs and graphs *but* they
>> do not necessarily "name" graphs in the strict model-theoretic sense.
>> A SPARQL Dataset does not establish graphs as referents of IRIs
>> (relevant to ISSUE-30)".
>>
>> I know this resolution is about SPARQL datasets, and it's not
>> necessarily applying to whatever structure we come up with in RDF, but
>> one of the Priority A use cases is to be able to dump a SPARQL store.
>> With this resolution, there is apparently a clash between the use case
>> requirement and the semantic condition.
>>
>
> I agree. I'm pretty sure ∀i: I(ui) = Gi is wrong. Most of the time, in
> practice, Ui denotes a g-box, not a g-snap. (Or, sometimes, it's
> something else associated with a g-box, like the primary subject.) I
> don't see how SPARQL 1.1 UPDATE with the GRAPH keyword makes any sense
> if Ui denotes Gi.
The GRAPH keyword has its own semantics defined by SPARQL. It does not
relate to the RDF semantics. The GRAPH keyword is just an indication
that we want to work with the RDF graph inside a certain <name,graph>
pair. It is totally independent of what the URI denotes in RDF semantics.
>>
>> My proposal is to define several recommended semantics and allow the
>> concrete syntax to declare in a document what semantics is assumed
>> when exchanging a dataset.
>>
>> I find this idea appealing because it is in line with the fact that
>> information carried by HTTP is accompanied by a self description of
>> how it should be understood. For instance, we have MIME types, we have
>> <!DOCTYPE> declarations, etc. Since RDF is not a purely syntactical
>> datastructure, it makes sense that it carries with it a reference to
>> the semantics it uses.
>> Such practices of referencing the MIME type, charset, doctype, schema,
>> etc have been a key enabler of interoperability on the Web. Why not
>> extend the pattern to the formal semantics?
>> BTW, SPARQL services have a way to tell what inference regime they
>> support, and SPARQL queries have a way to ask for a particular regime.
>> I claim that my proposal is simply in agreement with already
>> accepted notions in the SPARQL world.
>>
>
> I see the appeal -- solving each kind of problem with an approach
> crafted directly for it -- but my sense is this would cause too much
> confusion in the market and result in a lack of interoperability. I think
> we're better off standardizing (b) now, as long as I'm right that we can
> address the (a) and (c) use cases using just additional vocabulary.
I'm pretty sure you cannot get from (b) to (c) with merely additional
vocabulary. Not in the way the semantics of (b) has been tentatively
defined so far. You'd really need extra stuff in the structure of an
interpretation.
>
> -- Sandro
>
>>
>> Best,
>
>
>
--
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 22 August 2012 08:28:38 UTC