Re: Islands (ACTION-148) from Pat Hayes on 2012-02-29 (public-rdf-wg@w3.org from February 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 28 Feb 2012 21:06:06 -0600
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: public-rdf-wg@w3.org
Message-Id: <762E1833-238D-4491-B556-A9196372007A@ihmc.us>

On Feb 28, 2012, at 11:27 AM, Antoine Zimmermann wrote:

> 
> Le 27/02/2012 22:42, Pat Hayes a écrit :
>> 
>> On Feb 27, 2012, at 9:52 AM, Andy Seaborne wrote:
>> 
>>> In the telecon, I mentioned the idea of "islands".  This is not a
>>> technical design - it's a way of thinking about the theory and
>>> practice of graphs on the web.
>>> 
>>> An island is a collection of graphs where all the RDF semantics
>>> (specifically for merge and for entailment relationships) work out
>>> as defined in the RDF 2004 specs.
>>> 
>>> That requires, for example, that the application trusts the
>>> information in all the graphs it's working with.
>> 
>> No. It does not require that. There are two distinct issues here:
>> what the truth conditions on a graph are, and whether or not you
>> should trust the RDF (or more correctly, whether or not you should
>> trust whoever is publishing it and claiming it to be true.) The RDF
>> semantic specs address the first of these but say nothing at all
>> about the second, other than that when you do accept some RDF, you
>> are kind of obliged to also accept its valid consequences (so
>> checking those is one way to determine, in fact, whether or not you
>> should trust the RDF.)
> 
> I agree. Trust is an orthogonal issue.

Good. So we are agreed, then, that the role of a semantics is to determine the truth conditions for RDF and RDF-related structures, right? Not (directly) to do with trust, acceptance, the quality or trustworthiness of the source of that RDF, etc.

>>> In practice, not all data is perfect.  An application will assemble
>>> a set of graphs it is going to work with - that may be some mixture
>>> of reading a number of places on the web, picking graphs out of a
>>> local graph store, and creating its own data.  (from Yvres) RDF
>>> data about the Dr Who universe [1] is perfectly reasonable when
>>> working within that universe, but may be a bit suspect when
>>> considered in the real world.
>> 
>> Quite. And it would be great if we had a way to publish RDF 'in a
>> context' which made such relationships clearer. But this is an
>> aside.
> 
> Ok.
> 
>>> 
>>> The criterion is more "fit for purpose" - an application is going
>>> through two steps, one to collect the graphs it wants to work
>>> with together, the second to actually work with those graphs.
>>> 
>>> Islands aren't an absolute viewpoint and data may become
>>> available, or an application may determine it trusts some new data,
>>> or even a new island, and, for its purpose, link them together.
>>> 
>>> Another application, with different goals, may take a different
>>> view as to whether two graphs can be considered to be compatible
>>> (an application specific term).  Foaf files declaring people's
>>> names may be good enough for a social network application, but not
>>> good enough for legal purposes.
>>> 
>>> For our named graphs discussions, the key technical requirement is
>>> to not combine data which shouldn't be.
>> 
>> OK for that (who can disagree?) but...
>> 
>>> Keeping data apart by default
>> 
>> ... not with that. That seems ridiculously strong.
> 
> I don't think so. Defaults don't prevent doing otherwise.

But they do determine what happens when nothing particularly is done.

> If you have a relational schema which states that column xyz is NULL by default, it does not forbid anyone to put data in that field.
> Similarly, no one is forbidding anybody to use the merge operation, just because we have Datasets.
> 
>>> and letting the application decide when to allow it to merge or
>>> entail.
>>> 
>>> [2] does that.
>> 
>> No, it does something even stronger. What [2] says is that *the same*
>> URI when used in one graph can mean something completely different
>> when used in another graph, and that *this is perfectly correct* and
>> even in fact *consistent*.
> 
> Yes, and from my point of view, it is fine.

I have no idea what you mean by "fine". The question at issue is, is it consistent? Because if it is, then it is not inconsistent. On your proposed semantics for datasets, this is a consistent dataset. There is no internal clash, nothing to resolve, no semantic problems here. 

> Obviously we can discuss it, but please make the discussion based on technical advantages or drawbacks, not on philosophical considerations. If it matches the use cases, then the group will have to admit it is pertinent. Certainly, it won't match the use cases perfectly, but what alternative do we have? You began to propose something, please formalise it completely on the wiki and we'll be able to decide what to do with a well informed eye.
> We (at least me) are willing to envisage other proposals as well.

I am not sure what you consider to be "philosophical". (By my lights, nothing in the entire WG activity is even remotely philosophical.) My point is very simple. Semantics is a theory of truth, and hence of ideas defined in terms of truth: satisfiability, entailment, etc. Your proposed semantics seems to make pieces of RDF which, like the above example, seem to me to be obviously inconsistent, consistent. This strikes me as rather a strong criticism of it, considered as a theory of truth. In fact, so considered, it seems to me to be obviously wrong.

You ask for use cases, but use cases are not to the point here. Nothing in the 2004 semantics (in fact in any model-theoretic semantics) determines what a processor must or must not *do*. It only determines semantic relationships between pieces of RDF. You can do whatever you like to any RDF in any datastore, and nothing in the semantics prohibits any of that activity. All it does is tell you when you have inconsistent sets of triples and which sets of triples entail other sets of triples. 

> 
>> What this means is that every URI in every
>> graph is interpreted locally to that graph, which in effect makes
>> every URI into a blank node (since this is how blank nodes are
>> interpreted.) This is dissolving the entire Web in a kind of
>> universal solvent.
> 
> No, URIs are not interpreted as existentials, their interpretation is simply parameterized.

Yes, you are right, I spoke too quickly. In fact treating them as bnodes would make more entailments possible, not fewer. There are *no* valid entailments between triples in different graphs in your semantics. 

> See the difference:
> 
> :g1 { :s :p _:b }
> 
> entails
> 
> :g1 { :s :p _:c }

I was intending to refer to entailments between graphs with different labels, obviously. And then

g1 { :s :p _:b} 

does not entail

g2 { :s :p _:c }

because :s and :p might mean something different.
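A minimal sketch of the single-graph entailment check under discussion, assuming the interpolation lemma of the 2004 semantics (a graph simply entails another when some instance of the second is a subgraph of the first). The function name simply_entails and the encoding of terms as plain strings are illustrative assumptions, not anything defined in the specs or in [2]. Note that the check never consults a graph label:

```python
from itertools import product


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def simply_entails(premise, conclusion):
    """Return True if some instance of `conclusion` is a subgraph of `premise`.

    Graphs are sets of (subject, predicate, object) string tuples.
    Brute force: try every mapping of the conclusion's blank nodes
    onto terms that occur in the premise.
    """
    premise = set(premise)
    bnodes = sorted({t for triple in conclusion for t in triple if is_bnode(t)})
    terms = sorted({t for triple in premise for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        instance = {
            tuple(mapping.get(t, t) for t in triple) for triple in conclusion
        }
        if instance <= premise:
            return True
    return False


with_bnode = {(":s", ":p", "_:b")}
with_uri = {(":s", ":p", ":b")}

bnode_case = simply_entails(with_bnode, {(":s", ":p", "_:c")})
uri_case = simply_entails(with_uri, {(":s", ":p", ":c")})
uri_to_bnode = simply_entails(with_uri, {(":s", ":p", "_:c")})
```

On this reading, renaming a blank node preserves entailment, while swapping one URI for a different URI does not.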

> 
> but
> 
> :g1 { :s :p :b }
> 
> does not entail
> 
> :g1 { :s :p :c }

> 
>>> Within one TriG file, all the triples with the same 4th slot are
>>> in the same graph, and being one graph, all RDF semantics must be
>>> valid.
>> 
>> The RDF semantics does not refer to graphs, but to vocabularies. An
>> interpretation is a mapping FROM A VOCABULARY to a universe. Graphs
>> are mentioned only as conjunctions of triples.
> 
> The RDF semantics does refer to graphs.
> Especially, if E is an RDF graph and I is an RDF-interpretation, RDF 2004 defines I(E), which is commonly named "the interpretation of E": see Section 1.4 (http://www.w3.org/TR/rdf-mt/#gddenot).

Graphs HAVE interpretations, but an interpretation is not defined ON a graph, but on a vocabulary. It applies to any graph written using that vocabulary. (I am planning to modify the semantics so that every interpretation is defined on all URIs, by the way, eliminating talk of vocabularies altogether.)

> 
>> The 2004 semantics
>> does not allow a given triple to mean different things depending upon
>> which graph it occurs in.
> 
> Really? Can you show where it says so precisely?

http://www.w3.org/TR/rdf-mt/#interp. The truth conditions on triples and graphs do not depend upon the graph. 

(BTW, this aspect of the semantics - that an interpretation is defined over a vocabulary and is not parameterized by the statements it applies to - is such a basic property that I never thought to emphasise it particularly. AFAIK, you are the first person who has ever found it non-obvious.)

> 
>>> Triples with a different 4th slot may or may not be combinable.  The
>>> basic machinery does not decide - it just means that two triples with
>>> two different 4th slots have no defined relationship.
>> 
>> Even if they are, for example, the same triple. Really, is this what
>> you want? Because we might as well just declare that RDF has no
>> semantics at all, seems to me. It no longer serves any purpose.
> 
> According to the RDF semantics
> 
> :s owl:differentFrom :s .
> 
> is consistent. Is it really what people want?

RDF is weak, of course, but it is designed to be a foundation for other, stronger, languages such as OWL, and in practice RDF often contains bits of RDFS or OWL.

> Simple entailment is very weak and makes *everything* consistent, but it is standard. Its weakness allows one to define constraints on vocabularies on top of it in various ways, where you can detect inconsistencies.

Yes, quite. And it is able to do this by defining some *minimal* truth conditions, among them being that URIs have fixed interpretations and graphs are conjunctions of their component triples, which in turn have a truth value which is a function of the referents of the URIs in the triple. So there are no 'contexts' to influence truth.
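These minimal truth conditions can be sketched roughly as follows, for ground graphs only (no blank nodes or literals). The dictionaries denotation and extension stand in loosely for the IS and IEXT mappings of the 2004 semantics; the URIs x:age, x:ten and x:twelve and the toy universe are made up for illustration.

```python
def triple_is_true(triple, denotation, extension):
    # A ground triple is true when the pair of referents of its subject
    # and object is in the extension of the referent of its predicate.
    s, p, o = triple
    pair = (denotation[s], denotation[o])
    return pair in extension.get(denotation[p], set())


def graph_is_true(graph, denotation, extension):
    # A graph is the conjunction of its triples. Nothing here depends on
    # which graph a triple occurs in: one interpretation serves them all.
    return all(triple_is_true(t, denotation, extension) for t in graph)


# One interpretation, defined on a vocabulary rather than on a graph.
denotation = {
    "x:joe": "Joe",
    "x:age": "age-relation",
    "x:ten": 10,
    "x:twelve": 12,
}
extension = {"age-relation": {("Joe", 10)}}

graph_a = {("x:joe", "x:age", "x:ten")}
graph_b = {("x:joe", "x:age", "x:twelve")}

a_true = graph_is_true(graph_a, denotation, extension)
b_true = graph_is_true(graph_b, denotation, extension)
both_true = graph_is_true(graph_a | graph_b, denotation, extension)
```

Under this particular interpretation the first graph is true and the second false, so their union is false as well; the same mapping is applied to every graph, with no per-graph parameter.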

> The semantics in [2] is allowing everything to be consistent, provided that each "named" graph inside datasets is itself consistent.

I know it does that. What I don't yet understand is WHY you think this is a good idea. What purpose does this have? What does it make possible that cannot be done by referring to the 2004 semantics?

> That is not a problem, as extensions can be defined that specify how knowledge from one graph influences knowledge from another graph (e.g., we could have things like rdf:imports, or whatever).
> 
>>> The use of a URI for a graph label in two different trig documents
>>> should mean the same thing but combining two datasets, like
>>> combining two graphs, will involve an application deciding that it
>>> can be done.
>> 
>> But how will it? ANY two graphs are semantically consistent, on this
>> account, and two graphs (with different labels) NEVER entail any
>> graph larger than either of them (such as their merge, for example),
>> according to the semantics in [2].
> 
> If you refer to what graphs entail, you are certainly talking about the RDF semantics, which is completely defined in RDF 2004. The proposal in [2] does not say anything about what graphs entail. It talks about what datasets entail and mean.

Actually I do not follow how it determines entailment between a dataset and a graph, since it gives them different notions of "interpretation". Nor for that matter how it determines entailment relations between datasets, since an interpretation for a dataset with, say, three named graphs is a different structure from one for a dataset with, say, seven named graphs. 

> And it is not true that any two datasets are always mutually consistent.

How can they be inconsistent with one another? Inconsistent means that there is no interpretation which makes them both true. But we can always make one of them true and the other one false, since their local interpretations are completely independent.

> 
>> So all semantic relationships are
>> reduced to triviality, so there can be no criteria available to check
>> for acceptability on any semantic grounds.
> 
> Semantic relationships are as complex in datasets as they are in the logic used for individual graphs. If you want to simulate RDF reasoning with datasets, you simply deal with datasets that have empty "named" graphs and a default graph.

But I want to be able to talk about entailments from named graphs. There would be very little point in naming them if this act of naming cancels their meaning. And in any case, the *graph* is RDF, so already has a semantics. Giving it a name does not stop it being a graph. If I call my pet cat "smog", that does not stop her being a cat. Now, if you want to have graphs in a datastore *parameterized* by the "name", then OK, but then they are no longer RDF graphs, and these "names" are no longer just names. And the way to do that is to treat this "name" as a genuine parameter, i.e., as a third argument to the relations.
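One way to read "a third argument to the relations", sketched under stated assumptions: URIs keep a single fixed denotation, but each property extension holds (context, subject, object) triples rather than pairs. The names holds_in, extension3, x:g1 and x:g2 are hypothetical, and this is not the proposal in [2].

```python
def holds_in(context, triple, denotation, extension3):
    # The graph "name" is an ordinary argument of the relation: the
    # denotation mapping itself is the same in every context.
    s, p, o = triple
    fact = (denotation[context], denotation[s], denotation[o])
    return fact in extension3.get(denotation[p], set())


denotation = {
    "x:g1": "graph-1",
    "x:g2": "graph-2",
    "x:joe": "Joe",
    "x:age": "age-relation",
    "x:ten": 10,
}
# Ternary extension: Joe is 10 "according to" graph-1 only.
extension3 = {"age-relation": {("graph-1", "Joe", 10)}}

triple = ("x:joe", "x:age", "x:ten")
in_g1 = holds_in("x:g1", triple, denotation, extension3)
in_g2 = holds_in("x:g2", triple, denotation, extension3)
```

Here x:joe refers to the same thing in both contexts; only the relation, not the meaning of the URI, varies with the parameter.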

> 
>> Remember, *every* URI
>> might mean something completely different in another graph, so you
>> can't say things like one graph says that x:joe is age 10 and the
>> other says he is age 12: that URI might refer to Joe in one graph and
>> Susan in the other, and the URI for the age property might mean age
>> in one graph and being-a-handle-of in the other. Graphs become black
>> holes of meaning, without any way for anything inside to influence or
>> connect with anything outside.
> 
> RDF does not offer any means to define unambiguously what a URI denotes.

Yes, yes yes, I know. Model theory does not determine real, social meaning. What it does do, however, is to specify how meanings of small pieces of syntax - URIs in this case - fix the meanings and specifically the truth of larger pieces of syntax - triples and graphs. And that specification does not envision having one and the same URI mean one thing in one place and something else in another place. This is not the same as saying that the intended meaning may be underdetermined - which is indeed almost always the case - but rather says that there is such a thing as the intended meaning. Your semantics denies this basic property of language, that words are intended to have meanings.

> It's all a question of personal interpretation. I provided an example a while ago using your personal web page's URL, and you said "this clearly denotes me", then I said, maybe it denotes your web page, and you said "oh, yes, of course, it denotes my web page".

Yes, and I was WRONG the first time.

> But frankly, there is absolutely nothing in the RDF spec that can tell you for sure whether it's your web page or yourself or anything else.

True, and nobody ever said anything to the contrary. That is not the point. The point is that whatever it means, it is supposed to mean the same when it occurs here and when it occurs in a different website the other side of the planet two years from now. And the Web depends on this idea, or maybe it would be better to say contract, that we all have regarding URI meanings. Of course this gets ignored and abused and misused in practice, the world is not perfect, but it is still a normative goal and formal presumption of our specifications and their underlying models.

> And this is true for *any* URI. So, the fact that they can be interpreted differently in different graphs seems to me quite natural.

Then, I submit, you haven't grokked the basic nature of a public language. If English words meant something different depending on which document they appear in, how would we understand one another? How would we understand ourselves?

Pat

> 
> 
> AZ
> 
>>> Islands aren't named or formally recognized - and one app's view of
>>> "usable together" may not be the same as another app's.
>> 
>> Oh what a tangled Web we weave.... (Sorry, couldn't resist :-)
>> 
>> Pat
>> 
>>> 
>>> Andy
>>> 
>>> [1] http://www.bbc.co.uk/doctorwho/dw
>>> [2] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
>>> 
>>> 
>> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 83 36
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 29 February 2012 03:06:44 UTC