- From: David Booth <david@dbooth.org>
- Date: Mon, 28 Oct 2013 18:09:06 -0400
- To: Pat Hayes <phayes@ihmc.us>
- CC: Antoine Zimmermann <antoine.zimmermann@emse.fr>, www-archive <www-archive@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Ivan Herman <ivan@w3.org>, Sandro Hawke <sandro@w3.org>
Hi Pat, On 10/20/2013 04:31 AM, Pat Hayes wrote: > David, greetings. > > Most of what you write in this message is completely uncontroversial > and I would entirely agree with it. Rather than respond point by > point, let me try to summarize. > > 1. People who publish RDF (or indeed any other) content may have > different ideas about what IRIs mean, and the readers or users of > this data may also have different ideas about what the IRIs mean. > Call this "mismatch". Yes. Furthermore: (a) this is unavoidable in the long run; and (b) this can be modeled very nicely by the idea that they have different interpretations in mind. 2. Even when the publishers and users of RDF > share a common understanding of what IRIs mean, the actual RDF will > not be enough to formally pin down this mutual understanding, so that > the RDF (considered in isolation from other possible sources of > meaning) will be satisfied by 'nonstandard' interpretations which do > not conform to this shared mutual understanding. Call this > "underdetermination". True enough, but: (a) underdetermination is not what I'm mainly talking about; (b) I think "unintended" would be a more accurate characterization than "nonstandard"; and (c) the notion that there is a common, pre-existing "mutual understanding" of what IRIs mean is perilous, because a central problem in this whole business is the problem of how IRIs are supposed to *become* associated with their intended denotations. In any case, when this occurs we can say that the intended interpretations of a graph are a proper subset of the satisfying interpretations. 
Finally, in the spirit of disallowing any "then a miracle occurs" steps, http://blog.stackoverflow.com/wp-content/uploads/then-a-miracle-occurs-cartoon.png we can reasonably assume that the importance of the difference between the intended interpretations and the satisfying interpretations is minimal, because: (a) for scalability in the Semantic Web, URI definitions must rely on description rather than ostension; http://en.wikipedia.org/wiki/Ostensive_definition and (b) at least in principle, anything that can be described in, say, English prose could instead be described in RDF. 3. In some cases, the difference referred to in > (1) may be so great that different pieces of published content are > mutually inconsistent. Let me call this "divergence". Yes, #1 leads to divergence. 4. It is also > possible that two publishers of RDF content might have perfectly > aligned notions of what all the IRIs mean, but simply disagree > concerning the facts. Call this "disagreement". Yes. > > I have deliberately avoided the word "ambiguity", because it is > ambiguous. You and I agreed long ago that RDF – probably all data on > the Web – is inherently ambiguous in the strict sense that it does > not pin down a unique satisfying interpretation, ie it is > underdetermined. We agreed that some of the TAG publications on > "uniqueness of identification" were conceptually faulty in the way > they were worded, since they seem to suggest that this unachievable > goal is necessary to Web operation. Right. And it is also not necessary to Semantic Web operation. Do we agree on that as well? (That may take more explanation.) > Underdetermination is indeed > inevitable. But "ambiguity" can be taken to imply mismatch, and this > is *not* inevitable. And even a mismatch does not inevitably lead to > divergence, or to any detectable inconsistencies between different > usages of an IRI. What do you mean by inevitable? 
I agree that at any point in time, there is not necessarily a mismatch or inconsistency. But AFAICT, the trend is inevitably *toward* mismatch as more statements are published, assuming that: (a) parties publish data independently (without knowledge of each other); and (b) the URI definition is not continually modified to track newly published data that uses the URI. Do you agree? If not, how do you think divergence can be avoided? > > Divergence and disagreement are formally indistinguishable: they both > give rise to contradictions. For example, Alice publishes Everest was > first climbed in 1953 Bob publishes Everest was first climbed in > 1954 and with enough extra stuff about uniqueness of dates of first > climbs, we can derive a formal contradiction, let us suppose. Now, it > might be that Bob is using "Everest" to refer to K2, in which case we > have divergence; or he might just be wrong about the date Hillary and > Tenzing made their historic climb, in which case we have a > disagreement. In the first case, both Alice and Bob have their facts > straight, but they are struggling over the referent of a name; in the > second case, Alice is right and Bob is wrong, but at least they both > know what they are talking about. Model-theoretic semantics isn't > able to usefully distinguish these two cases: all it can tell us is > that the things that Alice and Bob actually publish are (with some > extra assumptions) mutually inconsistent, for some reason. It does > not tell us what the reason is. Agreed. But there is an important practical difference between divergence and disagreement, because if one can determine that the contradiction is due to divergence, and the two source graphs of data were kept separate, then both graphs can still be used by "splitting" the resource identity to use two different URIs for the different notions that are denoted in the two graphs. 
In contrast, if the problem was disagreement, then the user of those graphs will want to decide which one is correct and discard the other as erroneous, or at least discard the erroneous assertion. Here's a little more on what I mean by "splitting": http://dbooth.org/2010/ambiguity/paper.html#splitting > > So, to sum up: published RDF content typically (perhaps always) has > many satisfying interpretations, ie it underdetermines its intended > meaning. Also, RDF from multiple sources may be mutually > inconsistent, ie be such that no interpretation satisfies it all. > There can be several reasons for this, including divergence of > intended meanings of IRIs and simple factual disagreements. But note > that when an inconsistency is detectable between what Alice and Bob > publish, then *something* is not right about that mutual publication. > Either they disagree about the facts of the matter, or they disagree > about what IRIs denote, or they have mutually incompatible ways of > describing the world. I do not mean to imply that one of them is > wrong and the other right (though that may be likely), only that they > do actually in some way clash in what they are saying. As a consumer > of their data, I would be obliged to choose between them, to make > decisions about what to accept and what to reject. No, not quite. If the problem is disagreement then yes, you would have to choose between the source graphs. But if the problem is divergence then you have to do some more work -- resource identity splitting -- but can still use both source graphs after splitting. This is an important difference that is lost if one lumps disagreement and divergence together. 
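As a rough illustration of the splitting idea applied to the Everest example above, here is a minimal sketch in Python. It is not taken from the splitting paper: graphs are modeled as plain sets of (subject, predicate, object) tuples, and the example.org URIs, the function names `conflicts` and `split`, and the treatment of "first climbed in" as a single-valued property are all assumptions made for illustration.

```python
# Hypothetical URIs; nothing here is a real vocabulary.
EVEREST = "http://example.org/Everest"
K2 = "http://example.org/K2"
FIRST_CLIMBED = "http://example.org/firstClimbedIn"

# Two independently published graphs, kept separate.
alice = {(EVEREST, FIRST_CLIMBED, "1953")}
bob = {(EVEREST, FIRST_CLIMBED, "1954")}


def conflicts(graph, single_valued_predicate):
    """Return the subjects that have more than one value for a predicate
    that is assumed to allow only one value per subject."""
    values = {}
    for s, p, o in graph:
        if p == single_valued_predicate:
            values.setdefault(s, set()).add(o)
    return {s for s, objs in values.items() if len(objs) > 1}


def split(graph, old_uri, new_uri):
    """Return a copy of one source graph with old_uri renamed to new_uri."""
    def swap(term):
        return new_uri if term == old_uri else term
    return {(swap(s), swap(p), swap(o)) for s, p, o in graph}


# Naive merge: the shared URI yields a contradiction.
merged = alice | bob

# If the cause is judged to be divergence (Bob meant K2), rename the URI
# in Bob's graph before merging; both graphs then remain usable.
merged_after_split = alice | split(bob, EVEREST, K2)
```

Under these assumptions, `conflicts(merged, FIRST_CLIMBED)` reports the Everest URI, while `conflicts(merged_after_split, FIRST_CLIMBED)` reports nothing. Note that the code cannot decide whether a clash is divergence or disagreement; that judgment still has to come from the consumer of the data.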
> > The intuitive picture (not part of the normative semantics document, > but intended to be understood by readers) is that the actual world > being described by RDF data is itself one of the interpretations, That strikes me as a naive, misleading and not very helpful intuition to promote, because: (a) RDF data does not generally describe the real world, it describes a particular *conceptualization* of the real world -- an *approximation* that is suitable for certain purposes; (b) it implicitly lumps divergence in with disagreement; and (c) it minimizes the relevance of multiple interpretations. It also more subtly places the focus on real world truth instead of usefulness, and IMO that is the wrong engineering criterion to use. Real world truth is a means to an end -- not the end itself. The important criterion is *usefulness*. An example I've often used to illustrate this is map data that models the world as flat. (I'm using the word "model" here in the generic English or computer science sense -- not in the model theory sense.) Clearly the real world is not flat, i.e., a 2D conceptualization of the world clearly is not the real world, so in a strict sense the data may be "wrong". And for applications such as calculating rocket trajectories or airplane flight paths, such data may be completely unusable. But for automobile navigation purposes, it may be good enough, and far simpler -- and thus *better* -- than "correct" 3D data. One may claim that such 2D data does not inherently need to be "wrong", if one carefully crafts the data and semantic claims about it, and that may be true. But bending over backward to craft the data that way, just so that it won't cause semantic contradictions when used in applications for which it was **not intended**, has a cost also. 
And while it is certainly nice when authors craft their data to be usable in applications far outside of the data's target application domain, I do not believe that we should shame authors who fail to do so. I think it is much more important that we: (a) encourage people to publish RDF data at all; (b) help the Semantic Web community understand how multiple interpretations provide a useful way to think about data that is inconsistent, when merged, due to divergence; and (c) help them learn how to deal with it. Finally, (rhetorically) what does it even mean to say that one of the interpretations is the real world? Let's take as an example, http://example.org/toucan , which Ian Davis has used both to denote a web page and a toucan: http://blog.iandavis.com/2010/11/04/is-303-really-necessary/ Before anyone complains about that example, please note that it is really just equivalent to a case of divergence, so if you don't like that particular example we could choose another, but the analysis would be exactly the same. Some applications ("web-page applications") care about web pages and assume interpretations in which http://example.org/toucan maps to a web page and has web page-ish properties. In those applications we can imagine that the URI maps to the real world notion of a specific web page (whatever that means). These applications do not detect any inconsistencies in the data because they don't employ any assertions about birds or the idea that birds are disjoint from web pages. Other applications ("bird applications") may care about birds, and assume interpretations in which that URI maps to a toucan in the real world. These applications also do not detect any inconsistencies, for similar reasons. 
And still other applications ("bird-and-web-page applications") may care about both birds and web pages, employ additional data about birds and web pages -- including an assertion that says that they are disjoint -- and hence may find that URI unusable (unless they split it), because it conflates the toucan with the web page, and thus causes a logical contradiction. Which, if any, of the interpretations that these applications use are the **real world**? Probably not the interpretations used by either the bird applications or the web-page applications. Possibly one that is used by the bird-and-web-page applications. But more likely *none* of them: most likely *all* of these applications assume interpretations that, when you dig deep enough to examine, correspond only *approximately* to the real world, but in fact differ from the real world in ways that would be revealed by the addition of more facts -- facts that those applications don't use or care about, and that may not even yet be known to science. > and > that the bare word "truth" – as when we might say, yes it is *true* > that Everest was first climbed in 1953 – refers to this real world, > but uses the same recursive analysis of how truth is determined from > a bare interpretation mapping – the same "truth conditions". Such a > picture is an integral part of how to relate the model theory to > other semantic conditions on RDF, such as those arising from > connections between RDF data and natural language texts or images. > But as I say, this is not part of the normative RDF semantics, which > is solely concerned with defining entailment relationships between > RDF graphs. > > OK so far? Because all of this is how the RDF semantics views the > world of RDF Web publication. I have used the terms 'satisfy', > 'interpretation' and 'inconsistent' here exactly as they are defined > in the formal semantics. Yes, excellent. 
> > Now, you seem to want to insist that there is something else, some > other way to use the formal semantic machinery, which somehow goes > beyond or provides some kind of alternative to this picture. Can you > say what it is, without using meaningless rhetoric such as > "single-interpretation assumption" or "agnostic" ? What is this > "other valid way" to think about the RDF semantics? Sorry if those phrases sound meaningless to you. I suspect there are at least a few others who understand them, but I suppose one person's useful insight is another person's meaningless rhetoric, so I'll try to find other phrasings that I hope will be more helpful to you. The other way to think of the RDF Semantics is in terms of *multiple* interpretations, instead of attempting to assume or impose a single "real world" interpretation. By this I mean, for example, that: - Two different graph authors may have different sets of intended interpretations in mind when they publish their RDF graphs, and the same URI may indeed denote different resources in those interpretations. This of course is not desirable, but it is inevitable, and it reflects the actual state of affairs far better than naively assuming that graph authors all have the same real world interpretation in mind. - Those RDF graphs may be useful -- and work fine -- for different classes of applications that (in essence) assume different interpretations. I.e., different applications have different conceptualizations of the world; those conceptualizations correspond to interpretations. - The most accurate way to understand a graph is to interpret it in the way that the author intended it to be interpreted. Since we have no other reliable way of knowing what that might be, we can assume that the author's intended interpretations for a graph are a subset of the graph's **satisfying interpretations**. 
I.e., we take the graph's meaning at face value, rather than attempting to interpret it according to some hidden, assumed "real world" interpretation. - The most sensible answer to the question "What resource does URI U denote in graph G?" would be either "whatever it denotes in G's satisfying interpretations" or "whatever it denotes in the author's intended interpretations", but *not* "whatever it denotes in the 'real world'". The "real world" interpretation is largely irrelevant -- both to the formal semantics and to understanding how the Semantic Web *actually* works. Some benefits of looking at the formal semantics this way: (a) it corresponds more closely to actual practice than assuming that all authors are talking about the same real world; (b) it helps to explain the difference between divergence and disagreement; (c) it helps to explain how to deal with divergence when it happens; (d) it allows more data to be recognized as useful (even if it isn't 100% "correct"), because it allows more graphs to be treated as true, whereas if we think in terms of a single, real world interpretation, then nearly every RDF graph would be false, and false graphs aren't very useful, because they entail everything; (e) it provides a formal framework for understanding the fact that different applications care about different resource identity distinctions (and this may give rise to divergence); (f) it at last provides a sensible way to formalize and understand the httpRange-14 issue as a case of divergence; and (g) it provides a useful and practical foundation for understanding the practical use of owl:sameAs, rather than simply lamenting its supposed "abuse". Is this enough? Is this making any more sense to you? Have I explained myself in sufficient detail, or do you still think that "David . . . does not properly understand the intuitive foundations of semantics" and my points are mere "inanity", as you previously concluded? 
http://lists.w3.org/Archives/Public/public-rdf-wg/2013Oct/0079.html (And BTW, posting such libelous remarks in a forum to which I am unable to respond -- since I cannot post to that list -- was extremely unfair, and rather upsetting to see. I have a great deal of respect for your insight and contributions, and I do not appreciate being maligned that way, even if you do find my points frustratingly difficult to understand sometimes.) And do you *still* think I merely need to go read a book on model theory, or have we now (I hope) got past that? If not, what aspects of model theory do you still think I misunderstand? I've found your explanations excellent, BTW, but I wouldn't expect you to personally explain everything that you think I need to know. I'd be happy to read up further on specific aspects that you think are critical to this discussion. The bottom line here is that some of the statements -- and intuition -- in the existing RDF drafts are just plain *wrong* and need to be corrected. In particular, the statement in RDF Concepts that says "IRIs have global scope: Two different appearances of an IRI denote the same resource" is just factually *wrong*. IRIs are indeed *intended* to everywhere denote the same resource, and that's a perfectly good goal, even if it is inherently unachievable. But it is a *goal* -- it is not the reality. Best wishes, David
Received on Monday, 28 October 2013 22:09:35 UTC