Re: Draft for a "minimal dataset semantics" from Richard Cyganiak on 2012-09-06 (public-rdf-wg@w3.org from September 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 6 Sep 2012 22:01:07 +0100
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <E814D4DA-A354-403E-AAC8-DC2760BAEF91@cyganiak.de>

Antoine,

You're a star. See inline for my opinions on the various issues, some requests for clarification, and some proposed additional issues.

On 5 Sep 2012, at 15:56, Antoine Zimmermann wrote:
> Based on the recent discussions on dataset semantics, which seemed to be rather fruitful, I made a first attempt to write down the latest ideas, as David suggested, in order to have a basis for discussion in our telecon.
> 
> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Minimal-dataset-semantics
> 
> I've put a short informal introduction as well as the model-theoretic formalisation.
> 
> I also recorded issues that may have to be solved and can affect the semantics.

> Issue 1: can the entailment regime of the default graph be different from the one of the <name,graph> pairs?

I can't think of a good reason to have them separate. +1 to “are the same”, -0 to “can be different”.

> Issue 2: do we want to allow an entailment regime that is "weaker" than Simple Entailment? Something like the "no-semantics" in one of our previous proposals.

I don't see the point of a weaker semantics. +1 to “don't define a weaker semantics”.

I might be getting this wrong, but the “no-semantics” proposal linked in the document strikes me as a *stronger* rather than *weaker* semantics. The “weakest” semantics (let's call it 0-entailment) is the one where a graph A entails any graph isomorphic to A, and nothing else. There are no contradictions. The “strongest” semantics (let's call it thats-not-what-I-said-entailment) is the one where a graph A entails any graph isomorphic to A, and contradicts any other graph. It seems to me that the “no-semantics” proposal linked there is equivalent to using thats-not-what-I-said-entailment for the named graphs.
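
To make the two extremes concrete, here is a minimal sketch in Python. It assumes ground graphs (no blank nodes), so that "isomorphic" collapses to plain set equality; the function names and the `ex:` terms are made up for illustration and come from no spec or proposal.

```python
from typing import FrozenSet, Tuple

# A triple is (subject, predicate, object); a graph is a set of triples.
# Assumption: graphs are ground (no blank nodes), so graph isomorphism
# reduces to plain set equality.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def zero_entails(a: Graph, b: Graph) -> bool:
    """0-entailment: A entails only graphs isomorphic to A."""
    return a == b


def zero_contradicts(a: Graph, b: Graph) -> bool:
    """0-entailment never yields a contradiction."""
    return False


def tnwis_entails(a: Graph, b: Graph) -> bool:
    """thats-not-what-I-said-entailment: same entailments as 0-entailment."""
    return a == b


def tnwis_contradicts(a: Graph, b: Graph) -> bool:
    """...but every graph that is not isomorphic to A is contradicted."""
    return a != b


# Hypothetical example graphs.
g1: Graph = frozenset({("ex:a", "ex:p", "ex:b")})
g2: Graph = frozenset({("ex:a", "ex:p", "ex:c")})
```

Both regimes entail exactly the same graphs; they differ only in what they rule out, which is why the second one reads as stronger rather than weaker.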

> Issue 3: can a dataset declare what semantics it assumes, instead of letting the application decide in all cases? This was proposed as a possible extension in a previous proposal.

I don't see the point. It's one of those things that people will inevitably either not set, or set wrongly, so those who actually have a reasoner at their disposal would usually ignore the “semantics marker”.

> Issue 4: should the relationship be between "name" and graph or between resource denoted by "name" and graph? The latter can be made a proper semantic extension of the former, not the opposite.

I'm not sure I understand. Is this about the distinction between the IRI-IGEXT and RES-IGEXT designs? In that case, +0.5 for IRI-IGEXT and +1 for RES-IGEXT. Or is this about the distinction between the design where n in <n,G> denotes a graph, versus the design where n could denote any resource? In that case, +1 to denoting any resource, and -1 to forcing the denotation to be a graph.

> Issue 5: this semantics does not completely cover the "graph quote" use case where one wants to explicitly say that the graph is quoted, that is, the terms used, in addition to their meaning, are important.

I think graph quoting can be done as a proper semantic extension by using thats-not-what-I-said-entailment on the named graphs. Can someone who is good at maths please check that?

Issue 6: Is IGEXT formalized as a function from IRIs to graphs, or from resources to graphs?

(I've re-phrased the issue.) +1 for resources, +0.5 to IRIs. Both work well enough, I think. I'm not sure that I fully appreciate the consequences of choosing one over the other. I tried to think through how one could frame the various proposals in the httpRange-14 debate with either formalism, but this just gave me a headache.

I guess I'd add a couple more issues:

Issue 7: Should it be sufficient for the truth of I(<n,g>) that IGEXT(n) E-entails g, or do we require that IGEXT(n) is equivalent to g under E-entailment? This is open-graph versus closed-graph semantics.

It seems to me that the open-graph version (entailment, not equivalence) meshes better with the open-world assumption of RDF. I believe it is also the weaker semantics, and closed-graph can be done as a proper semantic extension.
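
A rough sketch of the two truth conditions, in Python. As a stand-in for E-entailment it uses simple entailment restricted to ground graphs, where G entails H exactly when H is a subset of G. IGEXT is modelled as a plain dict from names to graphs. All names here are hypothetical and only meant to illustrate the distinction.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def entails(g: Graph, h: Graph) -> bool:
    """Stand-in for E-entailment.

    Assumption: ground graphs under simple entailment, so G entails H
    exactly when every triple of H is in G.
    """
    return h <= g


def open_graph_true(igext: Dict[str, Graph], n: str, g: Graph) -> bool:
    """Open-graph reading: <n,g> is true if IGEXT(n) entails g."""
    return n in igext and entails(igext[n], g)


def closed_graph_true(igext: Dict[str, Graph], n: str, g: Graph) -> bool:
    """Closed-graph reading: IGEXT(n) and g must entail each other."""
    return n in igext and entails(igext[n], g) and entails(g, igext[n])


# Hypothetical interpretation: the graph associated with ex:n holds two triples.
igext = {
    "ex:n": frozenset({("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")}),
}
partial: Graph = frozenset({("ex:a", "ex:p", "ex:b")})
full: Graph = igext["ex:n"]
```

With this interpretation, the pair `<ex:n, partial>` is true under the open-graph reading but false under the closed-graph one, while `<ex:n, full>` is true under both. Whenever the closed-graph condition holds, the open-graph one holds too, which matches the intuition that open-graph is the weaker semantics.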

Issue 8: Should the truth of a named graph require that the named graph satisfies the default graph?

It seems to me that this could be useful, because I could dump some “global truths” like vocabulary definitions into the default graph, and actually have them take effect in all the named graphs. But I haven't thought hard about this, and it may have undesirable side effects that I'm missing. I could live with the simpler “no” answer.

Issue 9: Should we allow different entailment regimes in different named graphs based on statements made in the dataset?

This seems to be coming up again and again, so it's worth flagging as an open issue. I'd prefer “no” as an answer.

Issue 10: In <n,G>, does n denote G, or may n denote any resource?

This may or may not be the same as your Issue 4 above, and perhaps no one is arguing for “n denotes G” any more. At any rate I think it would be worth getting a formal resolution. For my vote, see Issue 4 above.

And, perhaps:

Issue 0: Should we say anything about the semantics of RDF datasets at all? (I know Peter's vote on that one.)

Best,
Richard
Received on Thursday, 6 September 2012 21:01:37 UTC