Re: dataset semantics from Antoine Zimmermann on 2011-12-21 (public-rdf-wg@w3.org from December 2011)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 21 Dec 2011 11:26:27 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: Richard Cyganiak <richard@cyganiak.de>, public-rdf-wg@w3.org
Message-ID: <4EF1B453.4030801@emse.fr>
On 21/12/2011 07:27, Pat Hayes wrote:
>
>
 > [skip]
>
>> The meaning of an IRI is constrained by the triples in the graph in
>> which it occurs.
>
> That can be understood in two ways. One of them is correct, but
> irrelevant to the discussion here; the other is relevant, but then
> the claim is profoundly and dangerously wrong.
>
> The first sense is, that the meaning of an IRI is determined (perhaps
> in part) by what assertions are made using it, ie in RDF terms, by
> what RDF graphs it occurs in. Yes, I think that is basically correct,
> although it might be better to say, it is determined by the totality
> of all documents in which it occurs. [1] However, with this sense, we
> must take into account *all* the graphs in which the IRI occurs, or
> at any rate all those which we trust or accept. (I know this gets
> into the issue of how to adjudicate such trust, but let me leave that
> aside for now. It is orthogonal to the present point.) So when we
> look at two sources, both trusted, both using the IRI in question,
> then both of them constrain the meaning of the IRI. It is the same
> IRI in both (or perhaps many) graphs, not a different IRI in each
> separate graph.

To assign a trust measure, you need to be able to identify the set of
triples (or the source) that you trust, and for this you need to
partition the triples into different boxes, some of which should not
influence the knowledge held in the rest.
But there may be boxes that you trust equally and that nevertheless
disagree. In that case, what do you do? I say that you simply
reason separately in each box, which is formalised by distinct
interpretations of distinct "named" graphs.
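
As a minimal sketch of the "boxes" idea (illustrative only: the graph
names and triples are invented for the example, and the dataset is
modelled as a plain Python dictionary rather than any real RDF store),
one can keep each set of triples under its own name and merge only the
boxes one chooses to reason over together:

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical dataset: graph name -> set of ground triples.
dataset: Dict[str, Set[Triple]] = {
    "ex:boxA": {("ex:x", "ex:colour", "red")},
    "ex:boxB": {("ex:x", "ex:colour", "blue")},   # disagrees with boxA
    "ex:boxC": {("ex:y", "ex:colour", "green")},
}

def merge(ds: Dict[str, Set[Triple]], names: Iterable[str]) -> Set[Triple]:
    """Union of the triples of the selected boxes only.

    Blank nodes are ignored here, so a plain set union stands in
    for a proper RDF merge.
    """
    merged: Set[Triple] = set()
    for name in names:
        merged |= ds[name]
    return merged

# Reason over boxA together with boxC; boxB stays in its own box.
view = merge(dataset, ["ex:boxA", "ex:boxC"])
```

The point of the sketch is only that the selection of boxes is explicit:
nothing from "ex:boxB" leaks into the view unless it is asked for.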


> The second sense in which I can understand what you are saying here is exactly
> this idea, that one and the same IRI might have a meaning in one
> graph and a different meaning in a different graph. (Perhaps indeed,
> that it *must* have a different meaning in a different graph? Note,
> this is not the same as saying that one graph may be more trusted
> than the other.) Taken to an extreme, this amounts to the claim that
> each IRI has a whole spectrum of meanings, determined by the graph in
> which it appears, and hence that every occurrence of it in a
> different graph is, in effect, a distinct IRI.  It is difficult to
> emphasise the extent to which this idea is wrong. It would follow for
> example that RDF from two different graphs can never be combined to
> draw any conclusion that could not be derived from one of the graphs
> alone, that merging two graphs is never a valid operation, and many
> other consequences that seem to me to be somewhat insane. But in any
> case, even if we ignore RDF and its semantics, the whole Web is
> predicated on the basic idea of IRIs as *global* identifiers, which
> mean (in whatever sense of 'mean' one cares to adopt) the same thing,
> wherever they are used. (Of course, reality is often scruffier than
> the idealized design models described by our specifications, but at
> least the specifications make this basic assumption.)

After making a carefully thought-out proposal, being told that it is
immensely wrong is somewhat painful.  You seem to claim that the
dataset proposal changes the semantics and behaviour of RDF. Again
(how many times must I repeat it?), it does not change anything in RDF
itself.  Everything that is valid in RDF 2004 remains valid. Since
datasets were introduced in SPARQL, people have not stopped merging RDF
graphs, although they could always put different RDF graphs in different
boxes.
But with this proposal, you can control what you merge and what you
don't merge, in a standardised structure.


>> Go online, and look at what you find:
>>
>> http://www.emse.fr/~zimmermann/data4pat1.rdf
>>
>> This URL leads to a document where the
>> IRI <http://www.ihmc.us/groups/phayes/> denotes the number 1.
>
> No. It leads to a document where the assertion is made that I, Pat
> Hayes, am identical to the number 1. This assertion is, I am pleased
> to report, false. Nevertheless, that is what the document says. If,
> on the other hand, the URI in question were interpreted as you say,
> then it would be true, but vacuous, since it would be asserting that
> 1=1.

What makes you think this URI identifies you?  If it was presented to me 
independently of the triples, I would have said it identifies a web 
page.  What makes you think it does not denote number 1?

What a URI denotes is a matter of opinion, and it is certainly decided by
whoever publishes the triples (yes, it could be otherwise). But a
system is not going to poll people's opinions on billions of URIs.  The
only thing that a system can rely on is what is said in the triples.
And the triples say it's number 1.


>>
>> Now, go to:
>>
>> http://www.emse.fr/~zimmermann/data4pat1.rdf
>>
>> In this document, the same IRI denotes number 2.
>
> Again, no. It still denotes me, as it did in the first graph, but
> this graph says that I am identical to the number 2. Taken together,
> these have the entailment (in OWL) that the number 1 equals the
> number 2. Which I hope we all agree is probably not the case;
> nevertheless, they do indeed entail that, taken together. Whereas, if
> that URI meant what you claim, these two graphs would have no
> inferential connection with one another at all, since
> the <http://www.ihmc.us/groups/phayes/> in the first one would refer
> to something different from the <http://www.ihmc.us/groups/phayes/>
> in the second one.
>
>>
>> Eventually, a web crawler will index these two documents and
>> without context, it won't do anything useful.
>
> Hopefully, it might detect the inconsistency. I have no idea what
> help "context" would be. (I'm not even sure what you mean by the word
> in this, er, context.)

It detects the inconsistency. Then what?
Forget about "context". The crawler puts the two documents in different
"named graphs". Then, it depends. You can simply use SPARQL, with or
without an OWL entailment regime, and get useful answers FROM one or
more particular graphs. If the crawler is Sindice's, reasoning over the
first document will make a merge (yes, an RDF merge) of what it gets from
<http://www.ihmc.us/groups/phayes/> (which it may have crawled already
and put in the appropriate "named" graph) and what it gets from
<http://www.w3.org/2002/07/owl#sameAs>, and materialise inferences on
that. The result is consistent, within its well-delimited box. The same
goes for the second document: it results in consistent inferences, inside
its delimited box. Other reasoners may do it differently, but what is
important is that different RDF graphs are interpreted differently.
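
To make the per-box behaviour concrete, here is a rough sketch under
stated assumptions: the two documents are reduced to one owl:sameAs
triple each, the graph names "ex:doc1" and "ex:doc2" are hypothetical,
and "consistent" means only the toy condition that no owl:sameAs class
contains two distinct numbers. It is not how Sindice or any OWL reasoner
actually works.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
PAT = "http://www.ihmc.us/groups/phayes/"

# Assumed content of the two crawled documents, one "named" graph each.
named_graphs = {
    "ex:doc1": {(PAT, SAME_AS, 1)},
    "ex:doc2": {(PAT, SAME_AS, 2)},
}

def same_as_classes(triples):
    """Group terms into equivalence classes under owl:sameAs (union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    classes = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return list(classes.values())

def is_consistent(triples):
    """Toy check: no sameAs class may contain two distinct numbers."""
    return all(
        len({t for t in cls if isinstance(t, int)}) <= 1
        for cls in same_as_classes(triples)
    )

# Each box on its own passes the check; their merge does not (1 = 2).
per_graph = {name: is_consistent(g) for name, g in named_graphs.items()}
merged_ok = is_consistent(set().union(*named_graphs.values()))
```

With this toy check, each graph is fine in isolation, and the
contradiction appears only once the two boxes are merged.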


>>
>> Then go get:
>>
>> http://www.emse.fr/~zimmermann/data4pat.rdf
>>
>> Now, this document says that all IRIs denote the same thing.
>
> It says that, indeed, and that is obviously false. It has a whole
> host of very silly entailments. I haven't checked, but I bet it is
> formally inconsistent, and that an OWL-Full reasoner would find a
> contradiction quite rapidly. (An OWL-DL reasoner will spit it out at
> parse time as illegal.) It is often the case that asserting
> something obviously false entails a great deal of other nonsense.
> So?

So the one giant graph composed of all triples is mostly wrong,
contradictory, outdated, etc. People who want to do practical things
with RDF don't want a URI to be defined by all the documents that
contain it. And it is not just a question of having sufficient trust.
Sometimes, contradictory documents are equally useful and trusted, and
one doesn't want to hand-pick the right and wrong documents/triples. It
has to be done in a systematic way, and one needs a structure to keep
these things in.

>
>> As, according to you, this thing is independent of the context, we
>> can stop making reasoners :)
>
> I can't even understand what this is supposed to mean, so I fail to
> follow your intended point.

I mean, if the Web of Linked Data is to be interpreted as a single giant
graph, then reasoners are trivial to implement: that graph is
inconsistent, so all possible triples are entailed by it.

>
> Pat

Now, let us get back to the original point: the thread is called
"dataset semantics". In this thread, we assume that we have the notion
of a dataset, so there is no need to argue against it here. The question
is whether we want the WG to define a formal semantics for it.

Solution 1: we do not define a formal semantics. If this is what you
advocate, then people can interpret URIs in a multitude of ways, which
contradicts your arguments. So you are probably against this.

Solution 2: we propose a formal semantics. If we do that, the semantics
must not conflict with widely deployed systems. We know that a common
practice is to use the primary topic of an RDF document as the "name" of
a "named" graph. We have to live with this. We also know that merging
all the "named" graphs in systems that grab data from all over the Web
inevitably leads to an inconsistent graph. Systems commonly consider
that the default graph entails all the "named" graphs
(merge-as-default), but the opposite may also be true (default is
universal truth). Considering these constraints, I made a proposal.
If you don't like it, what's YOUR proposal?
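
For readers unfamiliar with the two conventions named above, the
following is a small illustration and not the proposal itself. It
assumes ground triples only (so entailment reduces to set inclusion),
and the function names and example triples are made up for the purpose.

```python
def merge_as_default(named):
    """Merge-as-default: the default graph is the union of all named graphs,
    so it contains (and, for ground triples, entails) each of them."""
    return set().union(*named.values())

def with_universal_default(default, named):
    """Default-is-universal-truth: every named graph is read together with
    the default graph, but the named graphs are not mixed with each other."""
    return {name: graph | default for name, graph in named.items()}

# Hypothetical named graphs and a hypothetical "universally true" triple.
named = {
    "ex:g1": {("ex:a", "ex:p", "ex:b")},
    "ex:g2": {("ex:a", "ex:p", "ex:c")},
}
universal = {("ex:s", "ex:p", "ex:o")}

default = merge_as_default(named)
views = with_universal_default(universal, named)
```

In the first convention the default graph inherits every conflict
between the named graphs; in the second, each named graph sees the
default graph but stays separate from the others.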


Best,
AZ.

>
> [1] However, this idea is by no means universally accepted. David
> Booth, for example, has argued at length that the meaning of any IRI
> should be determined by a single 'definitional' graph published by
> the owner of the IRI. Others have said that the meaning is determined
> by the intentions of the owner of the IRI, whether or not that
> intention is made manifest in any Web source. And there are many
> other positions out there.
>
> ------------------------------------------------------------
> IHMC                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.     (850)202 4416   office
> Pensacola                (850)202 4440   fax
> FL 32502                 (850)291 0667   mobile
> phayesAT-SIGNihmc.us     http://www.ihmc.us/users/phayes
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 21 December 2011 10:26:32 UTC