Re: Really minimal dataset semantics from Antoine Zimmermann on 2012-09-20 (public-rdf-wg@w3.org from September 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 20 Sep 2012 11:49:44 +0200
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
CC: Pat Hayes <phayes@ihmc.us>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <505AE6B8.7080804@emse.fr>
On 19/09/2012 22:12, Peter F. Patel-Schneider wrote:
> Having the default graph simply be the merge of the named graphs is only
> one possibility.  There are many others, including the one raised by Pat
> in the TC today, where the default graph is extracted from the named
> graphs and excludes questionable stuff in them.   I thus don't see that
> an argument based solely on this use of datasets holds water.
>
> I still have problems with inconsistency in the named graph making the
> dataset inconsistent.  For example, I may just want to record multiple
> graphs and the default graph may just be the first one, sort of a first
> among equals.

Of course there are all sorts of imaginable uses of the default graph, 
but they are only hypothetical and not necessarily "right". Likewise, 
there are imaginable cases where the possibility of an inconsistent RDF 
graph is undesirable, just as an inconsistent FOL theory can be 
undesirable. There are workarounds, e.g., introducing paraconsistent 
logics or other such mechanisms, but at the core, all of them ultimately 
rely on a well-behaved, monotonic, standard logic that serves as the 
basis for the workarounds.
Reasoners that deal with Web data are usually incomplete, and that's 
perfectly fine, I think, as far as the specs are concerned. Many other 
cases exist where the semantics is applied neither completely nor 
correctly.

But that's fine as long as it is an internal practice, and as long as 
one does not assume that data coming from other sources follows the 
same practices. The semantics just documents what can be assumed to be 
correct in all cases; you are then free to develop a provably correct 
reasoner, a parser that does not care about the semantics at all, a 
reasoner that only makes inferences based on the statistical 
distribution of certain patterns, or whatever else.

In other words, build your applications the way you want, but when 
you're communicating datasets, do not assume that a dataset has the 
semantics given by your application.
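To make the distinction concrete, here is a minimal sketch in plain Python of the two application-level policies Peter mentions for obtaining a default graph: the union of the named graphs, or an extraction that drops triples the application considers questionable. The function names and the filter are hypothetical; the triples are ground (no blank nodes), so union and RDF merge coincide.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# A triple is (subject, predicate, object); terms are plain strings here.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def default_as_union(named: Dict[str, Graph]) -> Graph:
    """Hypothetical policy A: the default graph is the union of all named graphs."""
    result = set()
    for graph in named.values():
        result |= graph
    return frozenset(result)


def default_as_filtered(named: Dict[str, Graph],
                        keep: Callable[[Triple], bool]) -> Graph:
    """Hypothetical policy B: the union minus triples the application rejects."""
    return frozenset(t for t in default_as_union(named) if keep(t))


named = {
    ":g1": frozenset({(":s", "owl:differentFrom", ":s"), (":a", ":p", ":b")}),
    ":g2": frozenset({(":c", ":p", ":d")}),
}

union_default = default_as_union(named)
# Example filter (an assumption): drop triples saying something differs from itself.
filtered_default = default_as_filtered(
    named, lambda t: not (t[1] == "owl:differentFrom" and t[0] == t[2]))

print(len(union_default), len(filtered_default))  # 3 2
```

Either policy is a choice a particular store can make internally; a consumer receiving the dataset cannot tell which one, if any, was used.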


> I also have problems with any dataset semantics that isn't based on the
> actual form of the named graphs.

I don't understand this.

>   Isn't a major use of datasets
> supposed to be for associating a graph with its source?

Sure. This type of use case is compatible with the semantics.


>  if so, neither
> of these two semantics seems to be correct, as the meaning of a named
> graph in a dataset is not a graph but is instead something like an
> equivalence class of graphs.

I don't understand. Where do you see equivalence classes?


>  The semantics then destroys the
> relationship between the name and the actual graph.

It does not destroy anything, because a semantics does not *do* anything. 
The actual relationship between the name and the actual graph is written 
in the dataset.
By the same reasoning, you could argue that the RDF semantics destroys 
the relationship between property names and the (subject, object) pairs 
that actually occur with them in the graph.
If you want to know which subjects or objects occur with a predicate 
inside a graph, just look at the graph. There are APIs for this.
The same goes for datasets.
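As an illustration of "just look at the graph", here is a minimal sketch that treats a dataset purely as a data structure (a default graph plus a name-to-graph mapping) and reads predicate usage and provenance directly off it. The `subject_objects` helper, the `:source` predicate and the feed IRI are hypothetical; RDF libraries offer comparable accessors.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def subject_objects(graph: Graph, predicate: str) -> List[Tuple[str, str]]:
    """Return the sorted (subject, object) pairs that occur with `predicate`."""
    return sorted((s, o) for (s, p, o) in graph if p == predicate)


# A dataset as a data structure: a default graph plus a name -> graph mapping.
# The default graph records where :g1 came from (hypothetical provenance triple).
default_graph: Graph = frozenset({(":g1", ":source", "<http://example.org/feed>")})
named_graphs: Dict[str, Graph] = {
    ":g1": frozenset({(":alice", ":knows", ":bob"), (":bob", ":knows", ":carol")}),
}

# The name-to-graph relationship is read directly off the structure,
# whatever the dataset semantics says the name denotes.
print(subject_objects(named_graphs[":g1"], ":knows"))
# [(':alice', ':bob'), (':bob', ':carol')]
print(subject_objects(default_graph, ":source"))
# [(':g1', '<http://example.org/feed>')]
```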


> (Of course, you
> could always just ignore the semantics and directly use the graph from
> the dataset, but then what is the point of having the named graph there?)

The data structure is also very important. Just as with RDF graphs, the 
data structure is already a nice way of organising the data, linking 
data together, etc. Semantics does not have to come into play where it 
has no role.
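Where the semantics does come into play is entailment between datasets. Below is a minimal sketch of one possible reading of the more-minimal proposal in the earlier message quoted below: the default graph and each named graph are checked independently, and only an inconsistent default graph makes the dataset entail everything. Assumptions: ground triples only (no blank nodes), so simple entailment reduces to triple containment; `toy_consistent` is a hypothetical stand-in for a real consistency check; all names are illustrative and not taken from the wiki proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass
class Dataset:
    default: Graph
    named: Dict[str, Graph]


def simple_entails(g: Graph, h: Graph) -> bool:
    # Assumption: ground graphs, so g simply entails h iff h is a subset of g.
    return h <= g


def dataset_entails(d: Dataset, e: Dataset,
                    consistent: Callable[[Graph], bool],
                    entails: Callable[[Graph, Graph], bool] = simple_entails) -> bool:
    """Hypothetical reading of the 'more-minimal' dataset semantics."""
    # Only an inconsistent default graph makes the whole dataset entail everything.
    if not consistent(d.default):
        return True
    if not entails(d.default, e.default):
        return False
    # Each named graph of e is compared only with the same-named graph of d;
    # any per-graph consistency handling is left to the `entails` callback.
    for name, graph in e.named.items():
        if name not in d.named or not entails(d.named[name], graph):
            return False
    return True


def toy_consistent(g: Graph) -> bool:
    # Toy check (assumption): inconsistent only if something differs from itself.
    return not any(p == "owl:differentFrom" and s == o for (s, p, o) in g)


bad = frozenset({(":s", "owl:differentFrom", ":s")})
d_bad_default = Dataset(default=bad, named={":g1": bad})
d_bad_named = Dataset(default=frozenset(), named={":g1": bad})
query = Dataset(default=frozenset(), named={":g2": frozenset({(":x", ":p", ":y")})})

print(dataset_entails(d_bad_default, query, toy_consistent))  # True
print(dataset_entails(d_bad_named, query, toy_consistent))    # False
```

In this reading, the inconsistent named graph `:g1` in `d_bad_named` does not let the dataset entail unrelated named graphs; only the inconsistent default graph in `d_bad_default` does.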


--AZ

>
> peter
>
> PS: Sorry for not explicitly bringing up this last point before.
>
>
> On 09/19/2012 12:58 PM, Antoine Zimmermann wrote:
>> Pat and Peter,
>>
>>
>> The so-called minimal dataset semantics proposed in the wiki is not
>> minimal, and this non-minimality causes the issues mentioned by Peter.
>> The tricky entailment of
>> http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Minimal-dataset-semantics#Brain_twisters
>>
>> is an example of the trouble caused by the constraint that IRIs
>> denoting the same resource must map to a single graph.
>>
>> It is not the proposal I initially wrote, and I support another
>> proposal. My initial, more minimal proposal was:
>> http://www.w3.org/2011/rdf-wg/wiki/index.php?title=TF-Graphs/Minimal-dataset-semantics&oldid=2438
>>
>>
>> Entailment is independent in the default graph and in the named graphs
>> as long as the default graph is consistent.
>>
>> This addresses one of Peter's comments.
>>
>> Now, let us examine the situation with respect to the consistency of
>> the default graph. The argument comes from an implementation practice
>> in SPARQL stores. If we approve the proposal made by Sandro, then such
>> a practice is irrelevant to us for the definition of dataset semantics.
>>
>> When you want to exchange a dataset between systems, I doubt it would
>> be a good idea to serialise the merge of all the graphs in addition
>> to serialising all the named graphs themselves. That would be insanely
>> redundant. So, in the end, what gets exposed, and what has to be
>> interoperable, does not (or should not, if that's what people do)
>> contain an unrestricted, unclean dump of triples inside the default
>> graph.
>>
>> I doubt that people are going to write TriG files where all the
>> triples get duplicated because the implementation internally used a
>> "default as union" policy.
>>
>> Isn't this TriG file ridiculous? I wish it were inconsistent.
>>
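>> @prefix : <http://example.org/> .  # placeholder namespace
>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>
>> { :s owl:differentFrom :s .
>>   # 10000 other triples from :g1
>>   rdf:type owl:sameAs owl:sameAs .
>>   # 10000 other triples from :g2
>> }
>> :g1 { :s owl:differentFrom :s .
>>   # 10000 other triples from :g1
>> }
>> :g2 { rdf:type owl:sameAs owl:sameAs .
>>   # 10000 other triples from :g2
>> }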
>>
>>
>> The more-minimal dataset semantics does not clash with SPARQL. Before
>> SPARQL 1.1 Entailment Regimes, there was no way to query an
>> inconsistent graph with the default regime. With SPARQL 1.1, it is
>> possible to submit a query to an inconsistent graph, which may
>> generate an error. But at every step of query resolution, only the
>> semantics of graphs is used. The query engine has to resolve Basic
>> Graph Pattern matching according to the graph entailment regime, and
>> build the complete answers to the query according to the SPARQL
>> algebra. Dataset semantics is never involved.
>>
>> So, given this view, I'd like to know whether Pat or Peter still
>> have counterarguments.
>> If so, would the objection be a firm "-1" or something else?
>>
>>
>> Best,
>
>


-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 20 September 2012 09:49:57 UTC