Re: RDF Semantics - Intuitive summary needs to be scoped to interpretations (ISSUE-149) from Pat Hayes on 2013-10-31 (www-archive@w3.org from October 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 31 Oct 2013 01:32:53 -0500
To: David Booth <david@dbooth.org>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, www-archive <www-archive@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Ivan Herman <ivan@w3.org>, Sandro Hawke <sandro@w3.org>
Message-Id: <D7FB59D1-4436-4784-A5F7-16E6F01A40F6@ihmc.us>
Hi David

Rather than respond point-by-point, I will again try to summarize. However, there are a few responses that are needed first:

> ... at least in principle, anything that can be described in, say, English prose could instead be described in RDF.

Most emphatically, no. Even if you substitute the most expressive formal logic available (say, full higher-order modal tense logic), this would not be even remotely correct. RDF is so inexpressive that it cannot manage something as simple as "Fathers are not mothers." OWL cannot define the idea of an uncle, and full first-order logic cannot define the idea of a natural number.

> But AFAICT, the trend is inevitably *toward* mismatch as more statements are published, assuming that: (a) parties publish data independently (without knowledge of each other)

But why would you assume this? Usually, if B is publishing data using A's IRIs (the only case that is of interest here) then B will have access to *some* published information which will help determine what A's intentions were regarding A's intended meaning. For example, if you use DBpedia IRIs then there are large pages of information available, in multiple languages. The entire Semantic Web/linked-data enterprise is predicated on the idea that IRIs both denote entities and also provide links to sources of more information about those entities (or, if you like, more information about what the IRI is intended to denote.) So the idea of two RDF authors using the same IRI but without any knowledge of what the owner of the IRI intended it to refer to, is SW/LD-pathological.

>   If the problem is disagreement then yes, you would have to choose between the source graphs.  But if the problem is divergence then you have to do some more work -- resource identity splitting -- but can still use both source graphs after splitting.

Changing the IRIs in a graph gives you a different graph. So you would not be using both source graphs, but some modification of the source graphs. And you would be obliged to *not* use - that is, to reject - at least one of the source graphs, when they are mutually inconsistent. 

>  ... RDF data does not generally describe the real world, it describes a particular *conceptualization* of the real world

It describes the world *using* a conceptualization (is there any other way to describe anything?). It does not (usually) describe the conceptualization.

>  false graphs aren't very useful, because they entail everything

Just as a technical point: *logically false* graphs – contradictions, false in *every* interpretation – entail everything. Mere falsity does not get you quodlibet. 
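The distinction can be made concrete with a toy check. This is only a sketch, and it swaps in a two-atom propositional language for RDF (an assumption made purely so the interpretations can be enumerated); the names `ATOMS`, `interpretations` and `entails` are illustrative.

```python
from itertools import product

# Hypothetical two-atom language, chosen only so all interpretations can be listed.
ATOMS = ("p", "q")

def interpretations():
    """Yield every assignment of truth values to the atoms."""
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def entails(premise, conclusion):
    """premise entails conclusion iff every interpretation that
    satisfies the premise also satisfies the conclusion."""
    return all(conclusion(i) for i in interpretations() if premise(i))

contradiction = lambda i: i["p"] and not i["p"]  # false in *every* interpretation
merely_p = lambda i: i["p"]                      # may happen to be false, but is satisfiable
q = lambda i: i["q"]

# No interpretation satisfies the contradiction, so the check passes vacuously.
contradiction_entails_q = entails(contradiction, q)
# p has a satisfying interpretation in which q is false, so p does not entail q.
p_entails_q = entails(merely_p, q)
```

Here `contradiction_entails_q` comes out true and `p_entails_q` comes out false, whether or not p happens to be false in the world being described.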

---------

There are two substantive points of disagreement between us, and one complete mismatch (divergence?) where I fail to understand what you are saying. Let me deal with the two points first. 

1.  The reality thesis: that the real world is one of the satisfying interpretations, and data is (usually) about the real world. I find this obvious, so obvious indeed that it should not even need to be said. You apparently find it either mistaken or meaningless, and in any case think it is misleading as a guide to intuition. I am not sure how to persuade you to my way of thinking, but let me ask you: if all this linked data is not about reality, what do you think it *is* about? And why do we find it useful, if it does not provide us with information about the actual world? Are the records of the transactions in your bank account about your actual wealth? Would that change if the bank started using RDF?

Your objections to the idea include the observation cited above about conceptualizations. Yes, of course data is stated *using* a conceptualization, just like all assertions in every language or formalism. But that does not make it any less about reality. It really is a fact about the real world that Hillary and Tenzing climbed Everest in 1953; that we conceptualize the world here in terms of people and mountains does not make this any less true. I am not sure what the point of your "toucan" example is, but apparently the real world can satisfy both the bird assertions and the website assertions, by appropriate choice of an interpretation mapping. (If the complete set of assertions is inconsistent then of course nothing can satisfy it.) Your third point concerned approximations and idealizations, such as the flat-earth geography of road maps. But examples like this do not argue against the reality thesis. An approximate or idealized description of X is still a description of X. Bear in mind that if some RDF can be satisfied by an approximation or simplification of the real world, then it can also be satisfied by the more complicated real world, since one can add (an infinite amount of) structure to an interpretation freely without making any RDF triples false. (This is a consequence of RDF being a positive logic without negation.) The map example is quite instructive, as quite a lot of geolocation information (e.g. lat/long coordinates) is in fact describing spherical space rather than flat space, even though we project it onto flat surfaces.
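The point about adding structure can be illustrated with a small sketch. It assumes a ground graph containing only IRIs and a simplified notion of interpretation (a denotation map plus property extensions); the `ex:` IRIs, the referent names and the helper `extends` are hypothetical.

```python
def satisfies(denote, iext, graph):
    """True if, for every triple <s p o>, the pair (I(s), I(o)) is in IEXT(I(p))."""
    return all(
        (denote[s], denote[o]) in iext.get(denote[p], set())
        for s, p, o in graph
    )

# Hypothetical road-map triple.
road_map = {("ex:A1", "ex:connects", "ex:London")}

# Denotations are the same in both interpretations below.
denote = {"ex:A1": "road", "ex:connects": "connects", "ex:London": "city"}

# A simplified world: just enough structure to make the triple true.
simple_iext = {"connects": {("road", "city")}}

# A richer world: everything in the simple one, plus more pairs and more relations.
rich_iext = {
    "connects": {("road", "city"), ("road", "village")},
    "curves_with": {("road", "earth")},
}

def extends(small, big):
    """True if every pair in every extension of `small` is also in `big`."""
    return all(pairs <= big.get(prop, set()) for prop, pairs in small.items())

simple_ok = satisfies(denote, simple_iext, road_map)
rich_ok = extends(simple_iext, rich_iext) and satisfies(denote, rich_iext, road_map)
```

Because satisfaction only ever asks whether a pair is present, never whether one is absent, enlarging the extensions cannot turn a true triple false.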

To say that some assertion is about the real world, or that it is factual, is not to claim that it is in some metaphysical sense the final truth or the definitive description, or that it is the last word, or that its truth has to have ended science. It is just saying that it is true.

You say: 
> ... The "real world" interpretation is largely irrelevant -- both to the formal semantics and to understanding how the Semantic Web *actually* works.

I strongly disagree. Many IRIs have fixed interpretations in the actual world, determined by all kinds of social, technical and linguistic conventions and meanings entirely outside RDF. We still want to be able to use RDF to describe these referents. For example, I am a consultant on a project (http://www.imagesnippets.com/) to add RDF markup to images. These RDF descriptions use IRIs which identify (and in the RDF refer to) images, regions in the images, people and places and colors and objects described in DBpedia and many other real (no scare quotes) things in the real world. None of these denotation mappings are specified by RDF descriptions, and most of them could not be. Most – I would claim, virtually all – RDF linked data uses IRIs like this to refer to real things. It is centrally important that the formal semantics works with such identifying IRIs. 

'Edmund Hillary climbed Everest in 1953' says something true about the actual, real, world. It expresses a fact. Just a mundane, simple bit of data. So, how is the factuality of this fact related to model-theory semantics? By the actual, real, world being one of the satisfying interpretations of it. Because if the real world were not a satisfying interpretation of this sentence, then it *couldn't possibly* be true (in the real world.)

But we can, if you like, simply agree to disagree about this, as it has no direct bearing on the basic point we have been arguing about, which is...  

2.  The idea of an IRI denoting something "in a graph".  Your gloss on this phrase, as I now understand it from your email (the first time you have explained your intended meaning) is as follows: you take all the interpretations which satisfy the graph (and there will be different such sets for different graphs, of course) and then you ask, what does the IRI denote in those interpretations? And that is what the IRI denotes "in the graph". (Do I have that right?)

But that does not define anything, because for any consistent graph G, and any IRI U in that graph, there are interpretations which satisfy G and in which U denotes things different from what it denotes in other interpretations satisfying G. There is no graph which 'pins down' the interpretations of the URIs which occur in it in the way that your definition requires. (Here is a simple proof. Let x be something which is not an IRI. The interpretation I with universe {x} and IEXT(x)={<x,x>} and I(u)=x for every URI u, satisfies G. The Herbrand interpretation H of G also satisfies G. But H(U) = U =/= x = I(U), by construction. QED.) In fact, one can make a stronger statement: truth in an interpretation does not depend on the identity of the referents of IRIs *at all*, because one can take *any* satisfying interpretation and produce another isomorphic one with the identities permuted in any way one likes, as long as the IEXT mappings are permuted to match. (In fact, this applies to *any* axiomatizable, complete formal logic, no matter how expressive.) In a nutshell: model theory does not determine reference.
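The proof and the permutation remark can be checked mechanically on a toy case. What follows is a minimal sketch, assuming a ground graph with IRIs only (no literals or blank nodes) and simple interpretations; the graph, the `ex:` IRIs and the function names are illustrative and not taken from the RDF Semantics document.

```python
# An interpretation is modelled as (denote, iext): denote maps each IRI to a
# thing in the universe; iext maps a property-thing to the pairs it relates.

G = {
    ("ex:Hillary", "ex:climbed", "ex:Everest"),
    ("ex:Tenzing", "ex:climbed", "ex:Everest"),
}

def iris(graph):
    return {term for triple in graph for term in triple}

def satisfies(denote, iext, graph):
    """True if, for every triple <s p o>, (I(s), I(o)) is in IEXT(I(p))."""
    return all(
        (denote[s], denote[o]) in iext.get(denote[p], set())
        for s, p, o in graph
    )

def one_point(graph, x):
    """Universe {x}: every IRI denotes x, and IEXT(x) = {<x, x>}."""
    return {u: x for u in iris(graph)}, {x: {(x, x)}}

def herbrand(graph):
    """Every IRI denotes itself; IEXT(p) is read off the triples."""
    denote = {u: u for u in iris(graph)}
    iext = {}
    for s, p, o in graph:
        iext.setdefault(p, set()).add((s, o))
    return denote, iext

def permute(denote, iext, f):
    """Rename the things in the universe by a bijection f, permuting IEXT to match."""
    new_denote = {u: f[d] for u, d in denote.items()}
    new_iext = {f[p]: {(f[a], f[b]) for a, b in pairs} for p, pairs in iext.items()}
    return new_denote, new_iext

x = 0  # something which is not an IRI
I_denote, I_iext = one_point(G, x)
H_denote, H_iext = herbrand(G)

U = "ex:Everest"
both_satisfy = satisfies(I_denote, I_iext, G) and satisfies(H_denote, H_iext, G)
same_referent = I_denote[U] == H_denote[U]

# Swap the referents of two IRIs in H; the result still satisfies G.
swap = {u: u for u in iris(G)}
swap["ex:Hillary"], swap["ex:Everest"] = "ex:Everest", "ex:Hillary"
P_denote, P_iext = permute(H_denote, H_iext, swap)
permuted_satisfies = satisfies(P_denote, P_iext, G)
```

Both interpretations satisfy G while disagreeing about what `ex:Everest` denotes, and the permuted interpretation satisfies G even though it sends `ex:Everest` to what H assigned to `ex:Hillary`.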

This should not be too surprising, actually, if you think about how model theory is defined. The very definition of interpretation presumes complete referential freedom: any IRI can denote anything. And truth is determined solely by how those things stand in relations to one another. The entire apparatus of model theory makes no reference to the *actual identity* of the things in the universe being described. So creating real constraints on reference - attaching, as it were, a name to a thing - has to be done by other means. In practice, we rely on notions of naming and reference already in use in the larger world (as I did when using "Everest" to refer to the highest mountain, and as ImageSnippets does when using 'http://schema.org/Person' to refer to the class of human beings) and sometimes on predefined mappings (as we do when fixing the referents of literals using datatypes) and perhaps even on ostension (arguably, http-range-14 can be seen as declaring HTTP GET/200 to be a form of ostension.) And this all works quite nicely (a lot of the time) because we can all (more or less) agree on what these referring names actually refer to, at least well enough to transfer meanings successfully by using them as referring names in sentences.

So, as I believe I have said several times, phrases such as "interpretation of an IRI in a graph" are not meaningful. It is not that this is a different perspective on model theory, or an alternative viewpoint. It is that it, quite literally, does not mean anything. 

---------

Now the place where I fail to understand what it is you are saying. 

At the end of your email you list all the advantages of an "other way" of approaching model theory. But as far as I can tell, this "alternative" is simply standard model theory. For example:

> The other way to think of the RDF Semantics is in terms of *multiple* interpretations

This is the only correct way. As I have said to you before, *of course* we think in terms of multiple interpretations. That is the entire point of defining the notion of interpretation. The very definition of entailment refers to multiple interpretations. 

> , instead of attempting to assume or impose a single "real world" interpretation.

Well, it is fine to assume that the real world is *one* interpretation, but nobody has ever suggested "imposing" a single interpretation. Certainly, nothing in the RDF Semantics document speaks of anything like this.

>  By this I mean, for example, that:
> 
> - Two different graph authors may have different sets of intended interpretations in mind when they publish their RDF graphs, and the same URI may indeed denote different resources in those interpretations.

Different sets of interpretations in mind, yes, of course (standard). URIs denoting different things in different interpretations, yes of course (standard). URIs denoting different things in different *sets* of interpretations, yes, if we are talking about sets of interpretations an author *has in mind*. But URIs denoting things in a set of all interpretations which satisfy a given graph? No, for the reasons described above. That idea is incoherent.

> - The most accurate way to understand a graph is to interpret it in the way that the author intended it to be interpreted. Since we have no other reliable way of knowing what that might be, we can assume that the author's intended interpretations for a graph are a subset of the graph's **satisfying interpretations**.  I.e., we take the graph's meaning at face value, rather than attempting to interpret it according to some hidden, assumed "real world" interpretation.

Yes, this is exactly what the RDF model theoretic semantics presumes. Asserting a graph effectively claims that interpretations must be such as to make it true, i.e. to satisfy it. Each graph makes some claims about how the world is structured, and the claims made by multiple graphs are connected by their common use of global IRIs. 

> Some benefits of looking at the formal semantics this way

What "way" are you talking about? Look, *of course* each graph has a set of satisfying interpretations, and asserting the graph is saying that the world being described by the graph is one of those satisfying interpretations. (Or if we want to give authors the ability to be vague about exactly what they are talking about, then the interpretations of whatever the author had in mind are a subset of the satisfying interpretations.) And of course we should take a graph at face value, as you put it, as saying exactly this. All this is *exactly* what the current semantics itself says (or presumes). As far as I can see, you are simply agreeing with standard model-theoretic intuitions here. 

> Is this making any more sense to you?

No. I don't know what the "it", that is supposed to provide all these advantages, actually is. If it is the idea that asserting a graph amounts to saying that the intended interpretation is one of those satisfying the graph, then this is what model theory says already. If it is the idea that an IRI can refer to one thing in one graph and a different thing in a different graph, then that is false (by definition) but in any case would not provide all these claimed advantages that 'it' is supposed to have, even if it could be made somehow true. 

>   Have I explained myself in sufficient detail, or do you still think that "David . . . does not properly understand the intuitive foundations of semantics" and my points are mere "inanity", as you previously concluded?

I regret if my usage here seemed impolite, but I do (still) find your posts, including this one, to be a strange mixture of basic ideas about model theory (re)stated as though they were somehow a new insight or an alternative to the standard view (which I referred to as "inanity") and strangely stubborn basic mistakes which do, I am afraid, strongly suggest that you have not grokked the basic ideas of model theory. 

> And do you *still* think I merely need to go read a book on model theory, or have we now (I hope) got past that?  If not, what aspects of model theory do you still think I misunderstand?

Well, I guess, the basic idea of an interpretation. An RDF interpretation, by definition, is a mapping from IRIs to referents. It is not a mapping from IRIs-in-graphs or from IRI-occurrences or from IRIs-in-a-context. Ergo, every interpretation treats all occurrences of an IRI in the same way, as referring to the same thing, regardless of which graphs the IRI happens to occur in. Therefore, the notion of what an IRI denotes "in a graph" is meaningless. This basic fact – and it is a very basic and foundational point – still seems to elude you. To emphasize, this is not a "perspective" which admits alternatives, it is simply a fact about how interpretations are defined. 
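As a rough illustration of this point, here is a sketch in which an interpretation is modelled as nothing more than a dictionary keyed by IRI. The two graphs, the `ex:` IRIs, the referent strings and the helper `referents_in` are hypothetical, loosely echoing the bird/website example.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

# Two hypothetical graphs that happen to share the IRI ex:toucan.
graph_a: Graph = {("ex:toucan", "ex:type", "ex:Bird")}
graph_b: Graph = {("ex:toucan", "ex:hostedAt", "ex:Server")}

# The key is the IRI alone. There is no (graph, IRI) key, so there is
# nowhere to record a different referent per graph.
interpretation: Dict[str, str] = {
    "ex:toucan": "some-thing",
    "ex:type": "type-relation",
    "ex:Bird": "class-of-birds",
    "ex:hostedAt": "hosting-relation",
    "ex:Server": "a-machine",
}

def referents_in(interp: Dict[str, str], graph: Graph) -> Dict[str, str]:
    """Look up the referent of every IRI occurring in the graph."""
    return {term: interp[term] for triple in graph for term in triple}

toucan_in_a = referents_in(interpretation, graph_a)["ex:toucan"]
toucan_in_b = referents_in(interpretation, graph_b)["ex:toucan"]
```

Whatever dictionary is supplied, `toucan_in_a` and `toucan_in_b` are the same lookup and so the same value; varying the referent requires a different interpretation, not a different graph.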

> The bottom line here is that some of the statements -- and intuition -- in the existing RDF drafts are just plain *wrong* and need to be corrected.  In particular, the statement in RDF Concepts that says "IRIs have global scope: Two different appearances of an IRI denote the same resource" is just factually *wrong*.

It is a presumption of the RDF data model. The semantics, in particular, is based on it. I don't quite see how it can be factually wrong, since RDF *defines* the notion of denotation. (If it had said "identifies" then it might be factually wrong, but it doesn't.)

Pat

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 31 October 2013 06:33:24 UTC