- From: David Booth <david@dbooth.org>
- Date: Sat, 26 Mar 2011 17:25:57 -0400
- To: nathan@webr3.org
- Cc: Pat Hayes <phayes@ihmc.us>, Tim Berners-Lee <timbl@w3.org>, Kjetil Kjernsmo <kjekje@ifi.uio.no>, SW-forum Web <semantic-web@w3.org>
Hi Nathan,

Excellent and very insightful analysis!

The "giant, global graph with unique identities" approach that you describe is fine for some limited application areas, such as:

 - within a relatively small, controlled environment; or
 - with applications that are willing to assume the risk of unstable definitions.

But it is not sufficient as a general approach at web scale. The reason, in essence, is that it sets up an endless guessing game between a URI's owner and its users: the URI owner thinks of a unique resource, but provides a definition that only gives hints about it, and the users of that URI must guess its identity. Each time the URI owner updates the definition to add more hints, some of those users discover that they guessed wrong, and, through no fault of their own, their work is no longer consistent with the URI's definition.

I'll explain in more detail, but the explanation involves multiple steps, so bear with me.

1. I assume you mean this "giant, global graph" to be consistent, since otherwise it would be meaningless. Incidentally, I've been referring to this as myth #2: http://dbooth.org/2010/ambiguity/paper.html#myth2 .

But how on earth could we expect to know what that giant, global graph should be? Obviously we cannot assume that it consists of the merge of *all* RDF graphs, since that would clearly be an inconsistent mess. On the web, anyone can say anything about anything, and much of it is rubbish. So we cannot, in advance, *assume* that we have such a graph and use that as the basis for showing how a "unique identity" approach works based on that assumption.

Instead, we need to go in the opposite direction: start with two graphs that we *can* assume are (individually) consistent, and then *merge* them to come incrementally closer to that idealized, giant, global graph.
As with proof by induction, if we can show that an approach to resource identity works for *one* small graph, *and* we can show how it works when two graphs are merged, then we have shown how it can work on increasingly larger graphs. Thus, in the limit as time t goes to infinity we would reach nirvana, where all knowledge of the universe has been formally encoded, and there is only one, unique interpretation of the graph: every URI uniquely identifies exactly one resource. ;)

I imagine this was the intent behind your idealized "giant, global graph", so now let's proceed in this direction.

2. To avoid vagueness, and to prevent the possibility of any hidden "then a miracle occurs" step (http://star.psy.ohio-state.edu/coglab/Miracle.html), let us assume that the resource definition is provided only in RDF -- not natural language. This assumption seems reasonable because: (a) RDF definitions facilitate machine processing, which is the whole point of using RDF to begin with; and (b) in principle any natural language definition could be expressed in RDF.

3. Now suppose that a URI owner, Oliver, mints a URI u that is intended to uniquely identify a particular resource that he has in mind -- Nathan's TV. As we know already, it is not possible for Oliver to describe this resource unambiguously, so as a simple example, let us assume that he (initially) provides a definition containing only the following assertions in graph gd:

  # Oliver's definition of <u> -- graph gd
  <u> a :TV .
  <u> :hasOwner :Nathan .

4. Next, an RDF statement author, Alice, uses Oliver's URI to publish a new RDF graph, ga:

  # Alice's graph ga
  <u> :alphaMax 27 .
  . . .

Since <u> is supposed to identify a unique resource globally, Alice would like to determine whether her new RDF graph, ga, would give the URI the same resource identity that Oliver has in mind.
But given only the URI's resource definition (graph gd), how can Alice possibly determine this? Clearly it isn't reasonable on web scale to expect Alice to personally ask Oliver for clarification. So, barring magic or miracles, the best Alice can do is to merge her graph ga with Oliver's resource definition gd and check for consistency. But, even if the merge is consistent, that does *not* indicate that Alice's graph ga actually *does* use the URI to denote the exact same resource that Oliver intended. It only indicates that it *could*: the merge admits at least one satisfying interpretation. In other words, all that Alice can determine is that the cloud of possible resources that <u> *might* identify in gd and ga overlaps, as illustrated in Figure 18 here: http://dbooth.org/2010/ambiguity/paper.html#figure-18

5. Note that Alice's graph contains an assertion that makes further assumptions about the identity of <u>. In essence, she has made a *guess* about the true, unique identity of <u>. This is normal: *anything* that Alice's graph may say about <u> that is not already entailed by Oliver's definition runs the risk of being "wrong" when Oliver tightens his definition. And it is likely that Alice *will* make statements about <u>, because, after all, she has chosen to use <u> in her graph for a reason.

To phrase this in terms of the RDF Semantics, Alice's statements add constraints that reduce the set of satisfying interpretations. For example, in this case Alice has eliminated all possible interpretations in which the thing's alpha -- characterized by a :alphaMax and :alphaMin -- is greater than 27.

6. Next, a different RDF statement author, Bob, publishes a different graph gb using Oliver's URI:

  # Bob's graph gb
  <u> :alphaMin 43 .

Bob and Alice know nothing of each other's work. Bob makes the same consistency checks that Alice made, and his graph is also consistent with Oliver's definition.

7.
Next, Charlie wishes to merge Alice's graph ga with Bob's graph gb, but since (we'll assume) something's alpha value cannot have both a maximum of 27 and a minimum of 43, he finds that the merge is inconsistent.

What can Charlie do? Charlie cannot convince either Alice or Bob to "fix" their data, because neither of them sees a problem with their data. In theory Charlie could first try to convince Oliver to tighten up the definition of <u>, and *then* he might convince Alice or Bob -- whoever had guessed wrong about the alpha value -- to fix his/her data, but this is not feasible to expect at web scale. Probably the best that Charlie can do is to either: (a) make his own guess about whether to side with Alice or Bob, and manually discard some of the other's assertions; or (b) split the identity of <u>, as described in http://dbooth.org/2010/ambiguity/paper.html#splitting

Observation: At web scale we cannot expect RDF statement authors to be able to influence other people's URI definitions or RDF data, but statement authors still need to be able to make RDF statements using other people's URIs.

8. Now let's consider what happens when Oliver *does* decide to refine his definition, since this is the only way he can hint at the unique identity of <u>, and the objective is to continually tighten our definitions until we reach nirvana. :) Oliver adds the following triple to his definition, gd2:

  # Oliver's new definition of <u> -- graph gd2
  <u> a :TV .
  <u> :hasOwner :Nathan .
  <u> :alphaMax 32 .

Through no fault of Bob, Oliver has just broken Bob's graph gb, because gb is now inconsistent with Oliver's new definition, gd2. Regardless of the fact that Bob's graph gb may contain valuable information, it is now clear that <u> cannot identify the same resource in gb as it does in gd2.
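
The scenario so far can be played through mechanically. The following is only a rough sketch in Python, not a real RDF reasoner: graphs are modeled as plain sets of triples, a merge is a set union, and "consistency" is reduced to a single assumed domain rule, namely that :alphaMax and :alphaMin are upper and lower bounds on the same value, so no :alphaMin may exceed any :alphaMax for the same subject. The helper names `merge` and `is_consistent` are hypothetical and exist only for this illustration.

```python
# A graph is a set of (subject, predicate, object) triples.
U = "<u>"

gd = {(U, "a", ":TV"), (U, ":hasOwner", ":Nathan")}   # Oliver's definition
ga = {(U, ":alphaMax", 27)}                           # Alice's graph
gb = {(U, ":alphaMin", 43)}                           # Bob's graph
gd2 = gd | {(U, ":alphaMax", 32)}                     # Oliver's refined definition


def merge(*graphs):
    """Merge graphs by set union (sufficient here: no blank nodes)."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def is_consistent(graph):
    """Assumed rule: for each subject, every :alphaMin <= every :alphaMax."""
    for subject in {s for s, _, _ in graph}:
        lows = [o for s, p, o in graph if s == subject and p == ":alphaMin"]
        highs = [o for s, p, o in graph if s == subject and p == ":alphaMax"]
        if lows and highs and max(lows) > min(highs):
            return False
    return True


checks = {
    "ga + gd": is_consistent(merge(ga, gd)),     # Alice's check (step 4)
    "gb + gd": is_consistent(merge(gb, gd)),     # Bob's check (step 6)
    "ga + gb": is_consistent(merge(ga, gb)),     # Charlie's merge (step 7)
    "gb + gd2": is_consistent(merge(gb, gd2)),   # after Oliver's update (step 8)
}
for name, ok in checks.items():
    print(name, "consistent" if ok else "INCONSISTENT")
```

Under that assumed rule, Alice's and Bob's graphs each pass against gd, fail when merged with each other, and Bob's graph fails against gd2, even though Bob changed nothing. A passing check only shows that a satisfying interpretation exists, not that the author guessed the resource Oliver had in mind.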
Furthermore, if we play this through further, the more Oliver's definition of <u> is updated and tightened to more precisely identify the true resource that Oliver intended, the more it becomes inconsistent with existing graphs that used <u>. Finally, since Oliver's definition itself may have used other URIs whose definitions may change, Oliver would likely be forced to rewrite it *differently* -- not just tighten it -- when some of those definitions change and it becomes inconsistent, thus breaking Alice's and Bob's graphs in a different way.

In essence then, the very process that was intended to bring us closer to the goal of a giant, global graph is the same process that causes instability, and the more we advance toward that goal, the more instability we create.

This kind of instability may be manageable in a small, closed environment where you can control all of the definitions and keep them all in sync. And it may also be an acceptable risk to *some* applications. But it is not a workable approach at web scale for applications that need a more stable foundation.

9. What is the alternative? For semantic web architecture to work at web scale, I see no option but to acknowledge the essential ambiguity of resource identity, precisely *bound* that ambiguity with URI definitions (a/k/a URI declarations), and learn to live with it. Each definition will be precise *enough* for some applications even as it is ambiguous for others.

Specifically, instead of assuming that a URI definition is an incomplete description of a globally unique resource, assume that the definition is the *complete* description of the resource: the definition is all you get, and *any* interpretation that is consistent with it is legitimate. This permits an application to know just enough about a URI's resource identity to get its job done, while providing a stable foundation for RDF authors.

------------------

A few more inline comments below . . .

On Wed, 2011-03-23 at 01:12 +0000, Nathan wrote:
> Hi Pat,
>
> Here's how I see it (discussing things we can't see again).
>
> On a universal scale (as in giant global graph) we have a set of nodes,
> each node is associated with one or more unique names, and one or more
> propositions. Each node can be seen as having a 1-1 relation with a
> single distinct thing (whether real or abstract), and the set of
> propositions bound to that node can be seen as characterizing (not
> defining) the thing which the node is related to. Exactly what those
> propositions characterize is open to interpretation, and when you're
> only working with subsets of the global graph (as is the norm) what the
> node is interpreted as characterizing gets increasingly less specific
> ever more ambiguous.
>
> If we split the previous paragraph in half, then by looking at only the
> first half we can argue that each name has at most one referent, and
> each thing can have multiple names (a many-1 relation). If we look at
> the second half then we can argue that each name can have multiple
> referents, and each thing multiple names (a many-many relation).
>
> An application may not need to consider or know every property of a
> thing to answer the question it is being asked, and may not need to (or
> be able to) make distinctions between unique things.
>
> So, to what does a name refer?
>
> To me it is important to view each name as having at most one referent,
> then if you tell me that you interpret the name as referring to
> something else, I can offer some more propositions and refine my
> description, in order that we may collectively describe the world and
> hopefully start to understand each thing.

If you add those propositions to your existing URI definition then you risk breaking downstream applications that used your URI. This may be the policy that you want, and if so it is important to publish your change policy, so that others can choose whether to accept this risk.
But for more stability, you can instead mint a new URI with a tighter definition. There is a trade-off between the two policies.

David Booth

>
> So, whilst I understand that the distinctions don't always matter, and
> that it's generally nigh on impossible to define a thing unambiguously,
> I still feel it is critically important to view each name as having a
> single referent, and to view each name as identifying a unique thing,
> unless told otherwise (by proposition or inference).
>
> in-line:
>
> Pat Hayes wrote:
> > On Mar 20, 2011, at 10:30 AM, Nathan wrote:
> >> This is why we couple descriptions to names, to give an indication of what we are using a name to refer to, sure our descriptions may be ambiguous and open to refinement, but our names are not; because we are not using simple string token names "everest" or "lightbulb", we're using distinct URIs.
> >
> > So, are you saying it is the *syntax* of URIs which gives them this magical quality? So one gets unambiguous reference by putting a colon in the name somewhere? OK, forgive my sarcasm: but if this is not what you are saying, just what ARE you saying, that gives URIs this amazing ability to reach out into the world and seize upon their single unique referent?
>
> The point I was trying to make (badly) was two fold:
>
> 1: Rather than saying "when I say X I mean this" and "when you say X you
> mean that" (where this != that) as humans with limited vocabulary often
> do. We can instead use URIs with gives us a wider vocabulary and greater
> opportunity to have one or more unique names for each referent.
>
> 2: The magical quality is in the specs and a social agreement, that we
> will typically consider each URI as having at most one referent, thus
> allowing us to say that each URI unambiguously identifies a single
> thing; even when the interpreted characterization of that thing is
> ambiguous.
>
> >[snip]
> >> So, I have to conclude that the names aren't ambiguous here
> >
> > What would lead you to that conclusion? I don't see that you have argued for it anywhere. Like TimBL's claim, it seems to be a matter of W3C Dogma rather than an actual observation or even a rationally defended position. And as it is radically false, and indeed in many cases *provably* false, it seems rather obtuse to be defending it with so slender an excuse or argument.
>
> Hopefully the above helps explain my own personal thinking on it, well
> as well as I can understand things given my limited knowledge.
>
> Best,
>
> Nathan
>
>
>

--
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily reflect those of his employer.

Received on Saturday, 26 March 2011 21:26:32 UTC