Re: Islands (ACTION-148) from Antoine Zimmermann on 2012-02-29 (public-rdf-wg@w3.org from February 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Wed, 29 Feb 2012 11:48:30 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: public-rdf-wg@w3.org
Message-ID: <4F4E027E.2050502@emse.fr>

On 29/02/2012 04:06, Pat Hayes wrote:
>
> On Feb 28, 2012, at 11:27 AM, Antoine Zimmermann wrote:
>
>>
>> On 27/02/2012 22:42, Pat Hayes wrote:
>>>
>>> On Feb 27, 2012, at 9:52 AM, Andy Seaborne wrote:
>>>
>>>> In the telecon, I mentioned the idea of "islands".  This is not
>>>> a technical design - it's a way of thinking about the theory
>>>> and practice of graphs on the web.
>>>>
>>>> An island is a collection of graphs where all the RDF
>>>> semantics (specifically for merge and for entailment
>>>> relationships) work out as defined in the RDF 2004 specs.
>>>>
>>>> That requires, for example, that the application trusts the
>>>> information in all the graphs it's working with.
>>>
>>> No. It does not require that. There are two distinct issues
>>> here: what the truth conditions on a graph are, and whether or
>>> not you should trust the RDF (or more correctly, whether or not
>>> you should trust whoever is publishing it and claiming it to be
>>> true.) The RDF semantic specs address the first of these but say
>>> nothing at all about the second, other than that when you do
>>> accept some RDF, you are kind of obliged to also accept its valid
>>> consequences (so checking those is one way to determine, in fact,
>>> whether or not you should trust the RDF.)
>>
>> I agree. Trust is an orthogonal issue.
>
> Good. So we are agreed, then, that the role of a semantics is to
> determine the truth conditions for RDF and RDF-related structures,
> right? Not (directly) to do with trust, acceptance, the quality or
> trustworthiness of the source of that RDF, etc.

Yes, I agree with this.

>
>>>> In practice, not all data is perfect.  An application will
>>>> assemble a set of graphs it is going to work with - that may be
>>>> some mixture of reading a number of places on the web, picking
>>>> graphs out of a local graph store, and creating its own data.
>>>> (from Yvres) RDF data about the Dr Who universe [1] is
>>>> perfectly reasonable when working within that universe, but may
>>>> be a bit suspect when considered in the real world.
>>>
>>> Quite. And it would be great if we had a way to publish RDF 'in
>>> a context' which made such relationships clearer. But this is an
>>> aside.
>>
>> Ok.
>>
>>>>
>>>> The criterion is more "fit for purpose" - an application is
>>>> going through two steps, one to collect the graphs it wants
>>>> to work with together, the second to actually work with those
>>>> graphs.
>>>>
>>>> Islands aren't an absolute viewpoint and data may become
>>>> available, or an application may determine it trusts some new
>>>> data, or even a new island, and, for its purpose, links them
>>>> together.
>>>>
>>>> Another application, with different goals, may take a
>>>> different view as to whether two graphs can be considered to be
>>>> compatible (an application specific term).  FOAF files
>>>> declaring people's names may be good enough for a social
>>>> network application, but not good enough for legal purposes.
>>>>
>>>> For our named graphs discussions, the key technical requirement
>>>> is to not combine data which shouldn't be.
>>>
>>> OK for that (who can disagree?) but...
>>>
>>>> Keeping data apart by default
>>>
>>> ... not with that. That seems ridiculously strong.
>>
>> I don't think so. Defaults don't prevent doing otherwise.
>
> But they do determine what happens when nothing particularly is
> done.

Indeed.

>> If you have a relational schema which states that column xyz is
>> NULL by default, it does not forbid anyone to put data in that
>> field. Similarly, no one is forbidding anybody to use the merge
>> operation, just because we have Datasets.
>>
>>>> and letting the application decide when to allow it to merge
>>>> or entail.
>>>>
>>>> [2] does that.
>>>
>>> No, it does something even stronger. What [2] says is that *the
>>> same* URI when used in one graph can mean something completely
>>> different when used in another graph, and that *this is perfectly
>>> correct* and even in fact *consistent*.
>>
>> Yes, and from my point of view, it is fine.
>
> I have no idea what you mean by "fine". The question at issue is, is
> it consistent? Because if it is, then it is not inconsistent. On your
> proposed semantics for datasets, this is a consistent dataset. There
> is no internal clash, nothing to resolve, no semantic problems here.

Yes, I *do* mean that it is consistent, from a purely logical point of
view. But again, this is no less strict than simple entailment, where all
RDF graphs are necessarily consistent. So, is it so much of a problem?

>> Obviously we can discuss it, but please make the discussion based
>> on technical advantages or drawbacks, not on philosophical
>> considerations. If it matches the use cases, then the group will
>> have to admit it is pertinent. Certainly, it won't match the use
>> cases perfectly, but what alternative do we have? You began to
>> propose something, please formalise it completely on the wiki and
>> we'll be able to decide what to do with a well informed eye. We (at
>> least me) are willing to envisage other proposals as well.
>
> I am not sure what you consider to be "philosophical". (By my lights,
> nothing in the entire WG activity is even remotely philosophical.)

Well, I mean philosophical, or based on personal feelings, opinion,
etc. When you say that defining a semantics with multiple RDF
interpretations is wrong, this is a personal opinion, *unless* you have
concrete technical arguments that prove it leads to troublesome results.

To me, saying that URIs have a single universal interpretation is a bit
like a philosophical statement. It *seems* to relate to the underlying
assumption that there is but one single truth.

> My
> point is very simple. Semantics is a theory of truth, and hence of
> ideas defined in terms of truth: satisfiability, entailment, etc.
> Your proposed semantics seems to make pieces of RDF which, like the
> above example, seem to me to be obviously inconsistent, consistent.
> This strikes me as rather a strong criticism of it, considered as a
> theory of truth. In fact, so considered, it seems to me to be
> obviously wrong.

The semantics of datasets is in fact just as timeless as the RDF
semantics. An interpretation of a dataset assigns a unique resource to a
parameterised URI (<u>,<g>), where <u> is the URI and <g> is the graph
"label". It's still possible to do lots of useful entailments within the
same graph or within the default graph, just as in RDF 2004. But
additionally, you can segregate distinct graphs (such as "the graphs you
do not agree with") and do reasoning over the things you endorse as well
as the things you do not endorse, and still preserve a meaningful dataset.
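
A minimal Python sketch of this parameterised reading, for illustration
only. The class name, method name and sample resources are invented here
and do not come from [2]; the sketch only shows a mapping keyed on the
pair (<u>,<g>).

```python
from typing import Dict, Tuple

# Hypothetical illustration: an interpretation keyed on (URI, graph label).
# None of these names are taken from the proposal in [2].
Key = Tuple[str, str]  # (uri, graph_label)


class DatasetInterpretation:
    def __init__(self, mapping: Dict[Key, str]) -> None:
        self._mapping = dict(mapping)

    def denote(self, uri: str, graph: str) -> str:
        """Return the resource assigned to `uri` as used in `graph`."""
        return self._mapping[(uri, graph)]


interp = DatasetInterpretation({
    ("ex:joe", "ex:g1"): "resource-1",
    ("ex:joe", "ex:g2"): "resource-2",
})

# Within one graph the URI always denotes the same resource ...
same_within_graph = (
    interp.denote("ex:joe", "ex:g1") == interp.denote("ex:joe", "ex:g1")
)
# ... but nothing forces the denotations in two graphs to coincide.
same_across_graphs = (
    interp.denote("ex:joe", "ex:g1") == interp.denote("ex:joe", "ex:g2")
)
```

Each (URI, label) pair still gets exactly one resource, so reasoning
inside a single graph behaves as usual.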

>
> You ask for use cases, but use cases are not to the point here.
> Nothing in the 2004 semantics (in fact in any model-theoretic
> semantics) determines what a processor must or must not *do*. It only
> determines semantic relationships between pieces of RDF. You can do
> whatever you like to any RDF in any datastore, and nothing in the
> semantics prohibits any of that activity. All it does is tell you
> when you have inconsistent sets of triples and which sets of triples
> entail other sets of triples.

Sure, processors do what they want, but whenever they decide to expose
their data to another system, they should do processing that is in
agreement with how the external system is going to interpret the result
of the processing. There are use cases where we want to ensure a certain 
understanding of the data. For instance, if I define a semantics of 
datasets where interpretations are simply RDF interpretations of the 
merge of the RDF graphs, then the use cases related to time, to 
endorsement, to provenance, are certainly not met.

>
>>
>>> What this means is that every URI in every graph is interpreted
>>> locally to that graph, which in effect makes every URI into a
>>> blank node (since this is how blank nodes are interpreted.) This
>>> is dissolving the entire Web in a kind of universal solvent.
>>
>> No, URIs are not interpreted as existentials, their interpretation
>> is simply parameterized.
>
> Yes, you are right, I spoke too quickly. In fact treating them as
> bnodes would make more entailments possible, not fewer. There are
> *no* valid entailments between triples in different graphs in your
> semantics.

Yes. In your semantics, are there entailments between:

:s :p :o :param1 .

and

:s :p :o :param2 .

?

If yes, then you have not presented all the details, as it seems it's 
not the case from your previous emails.

>
>> See the difference:
>>
>> :g1 { :s :p _:b }
>>
>> entails
>>
>> :g1 { :s :p _:c }
>
> I was intending to refer to entailments between graphs with different
> labels, obviously. And then
>
> g1 { :s :p _:b}
>
> does not entail
>
> g2 { :s :p _:c }
>
> because :s and :p might mean something different.

Sure, even :g1 {:s :p :o} does not entail :g2 {:s :p :o}. This is not a
bug, it's a feature. (But one can then define an extension of the
semantics in which interactions between graphs can be expressed, and get
whatever behaviour we may want to standardise.)
The point is that all desired semantics for datasets eventually
boil down to extensions of the simple and liberal semantics of [2].
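
To illustrate the non-entailment, here is a small Python sketch under
simplifying assumptions: a dataset interpretation is modelled as one
simple interpretation per graph label, with only IS and IEXT. The class
and variable names are hypothetical and are not part of [2]. The sketch
builds one counter-model: the triple is true in :g1 and false in :g2.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]


@dataclass
class SimpleInterpretation:
    # IS: URI -> resource (resources are plain integers here).
    i_s: Dict[str, int]
    # IEXT: property resource -> set of (subject, object) resource pairs.
    i_ext: Dict[int, FrozenSet[Tuple[int, int]]]

    def satisfies(self, triple: Triple) -> bool:
        s, p, o = (self.i_s[term] for term in triple)
        return (s, o) in self.i_ext.get(p, frozenset())


triple = (":s", ":p", ":o")
uri_map = {":s": 0, ":p": 1, ":o": 2}

# One interpretation per graph label; they may differ freely.
dataset_interpretation = {
    ":g1": SimpleInterpretation(uri_map, {1: frozenset({(0, 2)})}),
    ":g2": SimpleInterpretation(uri_map, {1: frozenset()}),
}

g1_true = dataset_interpretation[":g1"].satisfies(triple)
g2_true = dataset_interpretation[":g2"].satisfies(triple)
```

Since this interpretation satisfies :g1 {:s :p :o} but not
:g2 {:s :p :o}, the first cannot entail the second in this model of the
semantics.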

>
>>
>> but
>>
>> :g1 { :s :p :b }
>>
>> does not entail
>>
>> :g1 { :s :p :c }
>
>>
>>>> Within one trig files, all the triples with the same 4th slot
>>>> are in the same graph, and being one graph, all RDF semantics
>>>> must be valid.
>>>
>>> The RDF semantics does not refer to graphs, but to vocabularies.
>>> An interpretation is a mapping FROM A VOCABULARY to a universe.
>>> Graphs are mentioned only as conjunctions of triples.
>>
>> The RDF semantics does refer to graphs. Especially, if E is an RDF
>> graph and I is an RDF-interpretation, RDF 2004 defines I(E), which
>> is commonly named "the interpretation of E": see Section 1.4
>> (http://www.w3.org/TR/rdf-mt/#gddenot).
>
> Graphs HAVE interpretations, but an interpretation is not defined ON a
> graph, but on a vocabulary. It applies to any graph written using
> that vocabulary. (I am planning to modify the semantics so that every
> interpretation is defined on all URIs, by the way, eliminating talk
> of vocabularies altogether.)

Sure, but you said that the RDF semantics does not refer to graphs. I was
just correcting that.

>>
>>> The 2004 semantics does not allow a given triple to mean
>>> different things depending upon which graph it occurs in.
>>
>> Really? Can you show where it says so precisely?
>
> http://www.w3.org/TR/rdf-mt/#interp. The truth conditions on triples
> and graphs do not depend upon the graph.
>
> (BTW, this aspect of the semantics - that an interpretation is
> defined over a vocabulary and is not parameterized by the statements
> it applies to - is such a basic property that I never thought to
> emphasise it particularly. AFAIK, you are the first person who has
> ever found it non-obvious.)

Here, you implicitly say "Given an interpretation I on a vocabulary V,
the interpretation of a URI, no matter what the graph is, is the same."

But you are assuming a single interpretation I. It's your
interpretation. In my interpretation I', the URIs have different
interpretations. And nothing in the semantics tells me that vocabularies
can only have a single interpretation.

A vocabulary has many (infinitely many) interpretations, so a triple, or
a graph, has several interpretations. If I take two graphs, there is
no reason why I should interpret them according to the same
interpretation all the time, everywhere. I don't see anything in the
spec that tells me otherwise.
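
As a rough illustration of how many interpretations a vocabulary has,
the following Python sketch enumerates only the URI-to-resource part of
an interpretation over small finite universes. The function name and the
three-URI vocabulary are made up for the example; real interpretations
also vary in IEXT, so the true number is larger still.

```python
from itertools import product
from typing import Dict, Iterable, Iterator


def uri_assignments(vocabulary: Iterable[str],
                    universe: Iterable[int]) -> Iterator[Dict[str, int]]:
    """Yield every mapping from the vocabulary into the universe."""
    names = sorted(vocabulary)
    resources = list(universe)
    for choice in product(resources, repeat=len(names)):
        yield dict(zip(names, choice))


vocab = {":s", ":p", ":o"}

# Number of URI assignments for universes with 1, 2 and 3 resources.
counts = [sum(1 for _ in uri_assignments(vocab, range(n))) for n in (1, 2, 3)]
```

The count grows as n**3 for this vocabulary, and n is unbounded, which
is the sense in which there are infinitely many interpretations.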

>>
>>>> Triples with different 4th slot may or may not be combinable.
>>>> The basic machinery does not decide - it just means that two
>>>> triples with two different 4th slots have no defined
>>>> relationship.
>>>
>>> Even if they are, for example, the same triple. Really, is this
>>> what you want? Because we might as well just declare that RDF has
>>> no semantics at all, seems to me. It no longer serves any
>>> purpose.
>>
>> According to the RDF semantics
>>
>> :s owl:differentFrom :s .
>>
>> is consistent. Is it really what people want?
>
> RDF is weak, of course, but it is designed to be a foundation for
> other, stronger, languages such as OWL, and in practice RDF often
> contains bits of RDFS or OWL.

Why not define a weak, liberal semantics for datasets that is designed 
to be a foundation for other, stronger languages such as "temporal 
datasets", "versioned datasets", "trust-based datasets", etc.?
>
>> Simple entailment is very weak and makes *everything* consistent,
>> but it is standard. Its weakness allows one to define constraints
>> on vocabularies on top of it in various ways, where you can detect
>> inconsistencies.
>
> Yes, quite. And it is able to do this by defining some *minimal*
> truth conditions, among them being that URIs have fixed
> interpretations and graphs are conjunctions of their component
> triples, which in turn have a truthvalue which is a function of the
> referents of the URIs in the triple. So there are no 'contexts' to
> influence truth.

Right, but it seems that the use cases and practices show that we
*need* context, somehow. Since context is not in the standard, people
have made ad hoc implementations that provide context, often implicitly
assuming a certain semantics, but rarely coordinating with other parties.
Let us define a solid ground for these systems (some of which are doing
inferences on datasets).

>> The semantics in [2] is allowing everything to be consistent,
>> provided that each "named" graph inside datasets is itself
>> consistent.
>
> I know it does that. What I don't yet understand is WHY you think this
> is a good idea. What purpose does this have? What does it make
> possible that cannot be done by referring to the 2004 semantics?

The 2004 semantics does not tell me what the logical consequences of
a dataset are. It defines the logical consequences of a set of triples,
not of a set of pairs. So how should I do reasoning over a dataset?
Besides, how should I define temporal datasets which are
meaningful wrt the semantics? How do I make inferences if I have things
from multiple provenances? What are the "graph names" referring to? All
these questions (and perhaps more) could be answered by a clean
semantics (be it the one of [2] or another).

>> That is not a problem, as extensions can be defined that specify how
>> knowledge from one graph influences knowledge from another graph
>> (e.g., we could have things like rdf:imports, or whatever).
>>
>>>> The use of a URI for a graph label in two different trig
>>>> documents should mean the same thing but combining two
>>>> datasets, like combining two graphs, will involve an
>>>> application deciding that it can be done.
>>>
>>> But how will it? ANY two graphs are semantically consistent, on
>>> this account, and two graphs (with different labels) NEVER entail
>>> any graph larger than either of them (such as their merge, for
>>> example), according to the semantics in [2].
>>
>> If you refer to what graphs entail, you are certainly talking about
>> the RDF semantics, which is completely defined in RDF 2004. The
>> proposal in [2] does not say anything about what graphs entail. It
>> talks about what datasets entail and mean.
>
> Actually I do not follow how it determines entailment between a
> dataset and a graph, since it gives them different notions of
> "interpretation".

It determines entailment between datasets, not between a dataset and a 
graph.

> Nor for that matter how it determines entailment
> relations between datasets, since an interpretation for a dataset
> with, say, three named graphs is a different structure from one for a
> dataset with, say, seven named graphs.

A dataset with 3 "named" graphs would never entail a dataset with 7
"named" graphs in the current proposal (again, we can discuss, update
and amend all the technical details if we see a good reason for it).
However, the current version allows a dataset with 7 "named" graphs to
entail a dataset with 3 "named" graphs.
To be honest, I am not yet completely satisfied with the current
formulation, but at least we have started to discuss it at the technical
level.
Again, please propose alternatives.

>> And it is not true that any two datasets are always mutually
>> consistent.
>
> How can they be inconsistent with one another? Inconsistent means
> that there is no interpretation which makes them both true. But we
> can always make one of them true and the other one false, since their
> local interpretations are completely independent.

To have contradicting datasets, you'd need to replace "RDF model" in [2] 
with "RDFS model" (or any stronger semantics). In this case, take the 
following example:

First dataset:

  :g1 { :s :p "abc" }

Second dataset:

  :g1 { :p rdfs:range rdf:XMLLiteral }

There is no Dataset-model of the first dataset that satisfies the
second dataset.
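
The clash in this example can be sketched in Python as follows. This is
a syntactic approximation and not an RDFS reasoner. It assumes that
satisfying both datasets means the graph labelled :g1 must satisfy the
triples from both, and that a plain literal in the range of a property
declared with range rdf:XMLLiteral is a contradiction. All helper names
are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]

RDFS_RANGE = "rdfs:range"
XML_LITERAL = "rdf:XMLLiteral"


def merge_by_name(*datasets: Dataset) -> Dataset:
    """Union the triples of graphs that carry the same label."""
    merged: Dataset = {}
    for dataset in datasets:
        for name, triples in dataset.items():
            merged.setdefault(name, set()).update(triples)
    return merged


def is_plain_literal(term: str) -> bool:
    # Assumption: plain literals are written as quoted strings.
    return len(term) >= 2 and term.startswith('"') and term.endswith('"')


def range_clashes(dataset: Dataset) -> List[Tuple[str, Triple]]:
    """Find plain literals used with a property ranged over rdf:XMLLiteral."""
    clashes = []
    for name, triples in dataset.items():
        xml_ranged = {s for (s, p, o) in triples
                      if p == RDFS_RANGE and o == XML_LITERAL}
        for triple in sorted(triples):
            if triple[1] in xml_ranged and is_plain_literal(triple[2]):
                clashes.append((name, triple))
    return clashes


first = {":g1": {(":s", ":p", '"abc"')}}
second = {":g1": {(":p", RDFS_RANGE, XML_LITERAL)}}

clashes_first = range_clashes(first)
clashes_second = range_clashes(second)
clashes_both = range_clashes(merge_by_name(first, second))
```

Each dataset is fine on its own under this check; the clash only appears
once both have to hold for the same graph label.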

I'd be happier if we replaced "RDF model" by "X model", where X is a
metavariable denoting an inference regime.

>>> So all semantic relationships are reduced to triviality, so there
>>> can be no criteria available to check for acceptability on any
>>> semantic grounds.
>>
>> Semantic relationships are as complex in datasets as they are in
>> the logic used for individual graphs. If you want to simulate RDF
>> reasoning with datasets, you simply deal with datasets that have
>> empty "named" graphs and a default graph.
>
> But I want to be able to talk about entailments from named graphs.
> There would be very little point in naming them if this act of naming
> cancels their meaning. And in any case, the *graph* is RDF, so
> already has a semantics. Giving it a name does not stop it being a
> graph. If I call my pet cat "smog", that does not stop her being a
> cat. Now, if you want to have graphs in a datastore *parameterized*
> by the "name", then OK, but then they are no longer RDF graphs, and
> these "names" are no longer just names. And the way to do that is to
> treat this "name" as a genuine parameter, ie as a third argument to
> the relations.

Which amounts to the same thing. If you don't like the fact that there 
are several RDF interpretations inside a Dataset interpretation, you can 
just treat it as a single interpretation over parameterised URIs.

Or, you can propose a stricter version like the one you started to 
describe, with ternary IEXT. But you can equivalently describe the 
ternary IEXT as a set of RDF interpretations with different IEXT, but 
equal domains and URI interpretations.
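
The equivalence between the two presentations can be sketched in Python.
This is only an illustration under the assumption that the ternary IEXT
maps a property to a set of (subject, object, graph label) triples; the
function names and the sample data are invented for the example.

```python
from typing import Dict, Hashable, Set, Tuple

Ternary = Dict[str, Set[Tuple[Hashable, Hashable, str]]]
Family = Dict[str, Dict[str, Set[Tuple[Hashable, Hashable]]]]


def to_family(ternary_iext: Ternary) -> Family:
    """Split a ternary IEXT into one binary IEXT per graph label."""
    family: Family = {}
    for prop, triples in ternary_iext.items():
        for (x, y, g) in triples:
            family.setdefault(g, {}).setdefault(prop, set()).add((x, y))
    return family


def to_ternary(family: Family) -> Ternary:
    """Recombine per-graph binary IEXTs into a single ternary IEXT."""
    ternary: Ternary = {}
    for g, iext in family.items():
        for prop, pairs in iext.items():
            for (x, y) in pairs:
                ternary.setdefault(prop, set()).add((x, y, g))
    return ternary


ternary = {"age": {("joe", 10, "g1"), ("joe", 12, "g2")}}
family = to_family(ternary)
round_trip = to_ternary(family)
```

Going from one form to the other and back loses nothing, as long as the
domain and the URI interpretation are shared by all members of the
family.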

>>
>>> Remember, *every* URI might mean something completely different
>>> in another graph, so you can't say things like one graph says
>>> that x:joe is age 10 and the other says he is age 12: that URI
>>> might refer to Joe in one graph and Susan in the other, and the
>>> URI for the age property might mean age in one graph and
>>> being-a-handle-of in the other. Graphs become black holes of
>>> meaning, without any way for anything inside to influence or
>>> connect with anything outside.
>>
>> RDF does not offer any means to define unambiguously what a URI
>> denotes.
>
> Yes, yes yes, I know. Model theory does not determine real, social
> meaning. What it does do, however, is to specify how meanings of
> small pieces of syntax - URIs in this case - fix the meanings and
> specifically the truth of larger pieces of syntax - triples and
> graphs. And that specification does not envision having one and the
> same URI mean one thing in one place and something else in another
> place. This is not the same as saying that the intended meaning may
> be underdetermined - which is indeed almost always the case - but
> rather says that there is such a thing as the intended meaning.
> Your semantics denies this basic property of language, that words are
> intended to have meanings.
>
>> It's all a question of personal interpretation. I provided an
>> example a while ago using your personal web page's URL, and you
>> said "this clearly denotes me", then I said, maybe it denotes your
>> web page, and you said "oh, yes, of course, it denotes my web
>> page".
>
> Yes, and I was WRONG the first time.
>
>> But frankly, there is absolutely nothing in the RDF spec that can tell
>> you for sure whether it's your web page or yourself or anything
>> else.
>
> True, and nobody ever said anything to the contrary. That is not the
> point. The point is that whatever it means, it is supposed to mean
> the same when it occurs here and when it occurs in a different
> website the other side of the planet two years from now. And the Web
> depends on this idea, or maybe it would be better to say contract,
> that we all have regarding URI meanings. Of course this gets ignored
> and abused and misused in practice, the world is not perfect, but it
> is still a normative goal and formal presumption of our
> specifications and their underlying models.

I don't think that the Web is based on the idea that a URI or a URL must
mean the same thing everywhere, all the time, for everybody. Consider
URLs: the same URL can lead to very different documents, not only over
time, but also across places, clients and configurations.
What is universal about a URL is the process which will eventually lead
to the resource. We could say that URIs are indeed universally shared
names, and that their interpretation is defined by the same, universal
definition (a tuple <IR,IP,IS,IL,IEXT>), but the actual interpretation at
a point in time, for a certain person in a certain place, *may* differ
from the interpretation in a different context.

>
>> And this is true for *any* URI. So, the fact that they can be
>> interpreted differently in different graphs seems to me quite
>> natural.
>
> Then, I submit, you haven't grokked the basic nature of a public
> language. If English words meant something different depending on
> which document they appear in, how would we understand one another?

They do. "Graph" does not mean a set of triples in almost any English
document.

> How would we understand ourselves?

Because we are able to relate the words to their context, maybe?

>
> Pat
>
>>
>>
>> AZ
>>
>>>> Islands aren't named or formally recognized - and one app's view
>>>> of "usable together" may not be the same as another app's.
>>>
>>> Oh what a tangled Web we weave.... (Sorry, couldn't resist :-)
>>>
>>> Pat
>>>
>>>>
>>>> Andy
>>>>
>>>> [1] http://www.bbc.co.uk/doctorwho/dw
>>>> [2] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>>>
>>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------
>>> IHMC                     (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St.     (850)202 4416   office
>>> Pensacola                (850)202 4440   fax
>>> FL 32502                 (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us     http://www.ihmc.us/users/phayes

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Wednesday, 29 February 2012 10:49:00 UTC