Re: Islands (ACTION-148) from Pat Hayes on 2012-02-27 (public-rdf-wg@w3.org from February 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 27 Feb 2012 15:42:27 -0600
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF-WG <public-rdf-wg@w3.org>
Message-Id: <BA02EBA2-C552-4FCD-900C-839205C71B29@ihmc.us>

On Feb 27, 2012, at 9:52 AM, Andy Seaborne wrote:

> In the telecon, I mentioned the idea of "islands".  This is not a technical design - it's a way of thinking about the theory and practice of graphs on the web.
> 
> An island is a collection of graphs where all the RDF semantics (specifically for merge and for entailment relationships) work out as defined in the RDF 2004 specs.
> 
> That requires, for example, that the application trusts the information in all the graphs it's working with.

No. It does not require that. There are two distinct issues here: what the truth conditions on a graph are, and whether or not you should trust the RDF (or, more correctly, whether or not you should trust whoever is publishing it and claiming it to be true). The RDF semantic specs address the first of these but say nothing at all about the second, other than that when you do accept some RDF, you are kind of obliged to also accept its valid consequences (so checking those is one way to determine, in fact, whether or not you should trust the RDF).

> 
> In practice, not all data is perfect.  An application will assemble a set of graphs it is going to work with - that may be some mixture of reading from a number of places on the web, picking graphs out of a local graph store, and creating its own data.  (from Yvres) RDF data about the Dr Who universe [1] is perfectly reasonable when working within that universe, but may be a bit suspect when considered in the real world.

Quite. And it would be great if we had a way to publish RDF 'in a context' which made such relationships clearer. But this is an aside. 

> 
> The criterion is more "fit for purpose" - an application is going through two steps, one to collect together the graphs it wants to work with, the second to actually work with those graphs.
> 
> Islands aren't an absolute viewpoint: data may become available, or an application may determine it trusts some new data, or even a new island, and, for its purpose, link them together.
> 
> Another application, with different goals, may take a different view as to whether two graphs can be considered to be compatible (an application-specific term).  FOAF files declaring people's names may be good enough for a social network application, but not good enough for legal purposes.
> 
> For our named graphs discussions, the key technical requirement is not to combine data which shouldn't be combined.

OK for that (who can disagree?) but...

>  Keeping data apart by default

... not with that. That seems ridiculously strong. 

> and letting the application decide when to allow it to merge or entail.
> 
> [2] does that.

No, it does something even stronger. What [2] says is that *the same* URI when used in one graph can mean something completely different when used in another graph, and that *this is perfectly correct* and even in fact *consistent*. What this means is that every URI in every graph is interpreted locally to that graph, which in effect makes every URI into a blank node (since this is how blank nodes are interpreted.) This is dissolving the entire Web in a kind of universal solvent. 

>  Within one TriG file, all the triples with the same 4th slot are in the same graph, and being one graph, all RDF semantics must be valid.

The RDF semantics does not refer to graphs, but to vocabularies. An interpretation is a mapping FROM A VOCABULARY to a universe. Graphs are mentioned only as conjunctions of triples. The 2004 semantics does not allow a given triple to mean different things depending upon which graph it occurs in. 
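As a rough illustration of that point, here is a toy sketch in Python. It is not the 2004 model theory: it handles only ground triples over URIs (no blank nodes, no literals), and the names `Interpretation`, `denotes` and `extension`, as well as the `x:` URIs, are made up for the example. What it shows is that the mapping is defined on the vocabulary, and a graph is true just when every one of its triples is true, so a triple cannot change its truth value by moving to another graph.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Interpretation:
    """Toy interpretation: a single mapping from a vocabulary to a universe."""

    def __init__(self,
                 denotes: Dict[str, str],
                 extension: Dict[str, Set[Tuple[str, str]]]) -> None:
        self.denotes = denotes      # URI -> thing in the universe
        self.extension = extension  # property -> pairs it holds between

    def triple_true(self, triple: Triple) -> bool:
        s, p, o = triple
        # A triple using a URI outside the vocabulary is not satisfied.
        if not all(term in self.denotes for term in triple):
            return False
        pairs = self.extension.get(self.denotes[p], set())
        return (self.denotes[s], self.denotes[o]) in pairs

    def graph_true(self, graph: Set[Triple]) -> bool:
        # A graph is just the conjunction of its triples.
        return all(self.triple_true(t) for t in graph)


# Hypothetical vocabulary and universe, for illustration only.
interp = Interpretation(
    denotes={"x:joe": "Joe", "x:susan": "Susan", "x:knows": "knows"},
    extension={"knows": {("Joe", "Susan")}},
)

t = ("x:joe", "x:knows", "x:susan")
graph_a = {t}
graph_b = {t, ("x:susan", "x:knows", "x:joe")}

# The truth of t is fixed by interp alone; no graph is consulted.
print(interp.triple_true(t))
print(interp.graph_true(graph_a))
print(interp.graph_true(graph_b))
```

Note that `triple_true` takes no graph argument at all: graph_b fails as a whole only because its second triple is false, while t itself remains true in both graphs.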

>  Triples with different 4th slots may or may not be combinable.  The basic machinery does not decide - it just means that two triples with two different 4th slots have no defined relationship.

Even if they are, for example, the same triple. Really, is this what you want? Because we might as well just declare that RDF has no semantics at all, seems to me. It no longer serves any purpose.

> 
> The use of a URI for a graph label in two different TriG documents should mean the same thing but combining two datasets, like combining two graphs, will involve an application deciding that it can be done.

But how will it? ANY two graphs are semantically consistent, on this account, and two graphs (with different labels) NEVER entail any graph larger than either of them (such as their merge, for example), according to the semantics in [2]. So all semantic relationships are reduced to triviality, so there can be no criteria available to check for acceptability on any semantic grounds. Remember, *every* URI might mean something completely different in another graph, so you can't say things like one graph says that x:joe is age 10 and the other says he is age 12: that URI might refer to Joe in one graph and Susan in the other, and the URI for the age property might mean age in one graph and being-a-handle-of in the other. Graphs become black holes of meaning, without any way for anything inside to influence or connect with anything outside.
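The age example can be played out in a toy Python sketch. Everything in it is an assumption made for illustration: the helper names `conflicts` and `localize`, the graph labels `g1` and `g2`, and the idea of treating x:age as single-valued (which is not part of RDF itself and is used here only to have some clash to look for). Graph-local interpretation is approximated by renaming every URI apart per graph, which is essentially how blank nodes are treated in a merge.

```python
from collections import defaultdict


def conflicts(triples, functional):
    """Return (subject, property) pairs with more than one object,
    for properties assumed (hypothetically) to be single-valued."""
    objects = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            objects[(s, p)].add(o)
    return {key for key, objs in objects.items() if len(objs) > 1}


def localize(label, graph):
    """Make every URI private to its graph; quoted literals are left alone."""
    def rename(term):
        return term if term.startswith('"') else f"{label}#{term}"
    return {(rename(s), rename(p), rename(o)) for s, p, o in graph}


g1 = {("x:joe", "x:age", '"10"')}
g2 = {("x:joe", "x:age", '"12"')}

# Shared vocabulary: x:joe and x:age mean the same thing in both graphs,
# so the two age claims can be compared and the clash is visible.
shared = conflicts(g1 | g2, {"x:age"})

# Graph-local vocabulary: the "same" URI is a different name in each graph,
# so nothing in g1 can ever be set against anything in g2.
local = conflicts(localize("g1", g1) | localize("g2", g2),
                  {"g1#x:age", "g2#x:age"})

print(shared)
print(local)
```

With a shared vocabulary the check reports the clash on (x:joe, x:age); once the URIs are renamed apart it reports nothing, and it would report nothing for any pair of graphs whatsoever, which is the triviality being complained about.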

> 
> Islands aren't named or formally recognized - and one app's view of "usable together" may not be the same as another app's.

Oh what a tangled Web we weave.... (Sorry, couldn't resist :-)

Pat

> 
> 	Andy
> 
> [1] http://www.bbc.co.uk/doctorwho/dw
> [2] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/RDF-Datasets-Proposal
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 27 February 2012 21:43:05 UTC