- From: Rich Cooper <rich@englishlogickernel.com>
- Date: Mon, 8 Apr 2013 11:08:18 -0700
- To: "'David Booth'" <david@dbooth.org>, "'Pat Hayes'" <phayes@ihmc.us>
- Cc: "'Peter Ansell'" <ansell.peter@gmail.com>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'public-semweb-lifesci'" <public-semweb-lifesci@w3.org>
- Message-ID: <7F6AF70B2A2743B58AB4A87692DBE50F@Gateway>
Dear David,

You wrote:

1. Owen's URI definition will always be ambiguous. There will always exist a property p such that neither p nor its negation are entailed by the URI definition.

While true, this leaves out the subjective part: even without the addition of a new property p, Aster might believe that Owen's URI means one thing, while Albert holds an interpretation of Owen's URI that differs from Aster's. While adding a new property (which can always be done, IMHO) makes the ambiguity mathematically clear, I would like to emphasize that each individual Observer (Aster, Albert, Algernon, Argentium, or whoever) also makes an individual interpretation, which can differ from that of any other Observer. I believe the history of group actions taken on "standards" shows that the individual is the source of most divergence in interpretations.

-Rich

Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: David Booth [mailto:david@dbooth.org]
Sent: Monday, April 08, 2013 7:07 AM
To: Pat Hayes
Cc: Peter Ansell; Alan Ruttenberg; public-semweb-lifesci
Subject: Re: owl:sameAs - Harmful to provenance?

Hi Pat,

On 04/04/2013 02:03 AM, Pat Hayes wrote:
>
> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
>
>> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
>> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
>> On Tuesday, April 2, 2013, David Booth wrote:
>> On 03/27/2013 10:56 PM, Pat Hayes wrote:
>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
>>
>> If only owl:sameAs were used correctly...
>>
>> Well, I agree that is a problem, but don't draw the conclusion that there is something wrong with sameAs, just because people keep using it wrong.
>>
>> Agreed. And furthermore, don't draw the conclusion that someone has used owl:sameAs wrong just because you get garbage when you merge two graphs that individually worked just fine.
>> Those two graphs may have been written assuming different sets of interpretations.
>>
>> In that case I would certainly conclude that they have used it wrong. Have you not been reading what Pat and I have been writing?
>>
>> I've read lots of what you and Pat have written. And I've learned a lot from it -- particularly in learning about ambiguity from Pat. And I'm in full agreement that owl:sameAs is *often* misused.
>>
>> But I don't believe that getting garbage when merging two graphs that individually worked fine *necessarily* indicates that owl:sameAs was misused -- even when it appears on the surface to be causing the problem.
>
> I agree, but not with your example and your analysis of it.
>
>> Here's a simple example to illustrate.
>>
>> Using the following prefixes throughout, for brevity:
>>
>>   @prefix : <http://example/owen/> .
>>   @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>
>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen defines them as follows:
>>
>>   # Owen's URI definition for :x, :y and :z
>>   :x a :Something .
>>   :y a :Something .
>>   :z a :Something .
>>
>> That's all. That's Owen's entire definition of those URIs. Obviously this definition is "ambiguous" in some sense. But as we know, ambiguity is ultimately inescapable anyway, so I have merely chosen an example that makes the ambiguity obvious. As the RDF Semantics spec puts it: "It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world".
>
> Yes, but by making the ambiguity this "obvious", you have rendered the example pointless. There is *no* content here *at all*, so Owen has not really published anything. This is not typical of published content, even in RDF.
> Typically, in fact, there is, as well as some nontrivial actual RDF content, some kind of explanation, perhaps in natural language, of what the *intended* content of the formal RDF is supposed to be. While an RDF engine cannot of course make use of such intuitive explanations, other authors of RDF can, and should, make use of them to try to ensure that they do not make assertions which would be counter to the referential intentions of the original authors. For example, the Dublin Core URIs were published with almost no formal RDF axioms, but quite elaborate natural language glosses which enable them to be used in formal RDF with considerable success. The fact that formal (and even informal) data is inherently ambiguous does not mean that it is inherently, or even typically, vacuous.

This seems to suggest that natural language can somehow eliminate ambiguity, where formal languages cannot. I don't buy that. Presumably whatever definition one expressed in natural language could be expressed in a formal language -- in principle at least. And certainly the goal of the semantic web is to have such information expressed in a formal language that is amenable to machine processing.

More precisely, the basic assumption I am making is that for (almost) any definition there exists a property such that neither that property nor its negation are entailed by the definition. I.e., there is always more that can be said about the thing whose identity is defined. Maybe that assumption is wrong; I don't know. If you think it's wrong, I'd be interested in hearing why.

The example may not be "realistic", but it is *not* pointless. The whole point of choosing such a simple example is to expose the fundamental issues outright, rather than obscuring them in complexity that we cannot fully understand.
If there is some fundamental reason why you think this problem cannot happen in a more "realistic" example, then please explain what mechanism would come into play to prevent it.

>
>> Arthur, an RDF author, publishes the following graph, G1, making certain assumptions about the interpretations that will be applied to it:
>>
>>   # G1
>>   :x owl:sameAs :y .
>
> On what basis does Arthur make this assertion? The URIs were coined by Owen, and Owen says nothing that would sanction this assumption.

Why Arthur or anyone else chooses to assert whatever they choose to assert is their business. It is irrelevant to this analysis.

>
>> Aster, another RDF author, publishes the following graph, G2, making certain other assumptions about the interpretations that will be applied to it:
>>
>>   # G2
>>   :x owl:sameAs :z .
>>
>> Alfred, a third RDF author, publishes the following graph, G3, making still other assumptions about the interpretations that will be applied to it:
>>
>>   # G3
>>   :y owl:differentFrom :z .
>
> Similarly for the other two. They are making assertions using names that belong to, and were coined by, another author without having any possible source of justification for these nontrivial claims. This should not be regarded as good practice, to put it mildly.

Ditto. If you are claiming that an RDF author needs some sort of "justification" to make assertions, then please explain exactly what you mean -- preferably in formal terms -- by "justification". E.g., does "justification" mean that Arthur may only make assertions that are entailed by Owen's definition? I already discussed that possibility below.

>
>> Note that G1, G2 and G3 are all individually consistent with Owen's URI definition. Furthermore, G1, G2 and G3 are all pair-wise consistent: there exists at least one satisfying interpretation for the merge of each pair.
>> But the merge of G1, G2 and G3 is not consistent:
>
> This kind of behavior is of course quite typical in any assertional language.

Yes.

>
>> Arthur, Aster and Alfred made different assumptions about the set of interpretations that would be applied to their graphs, and the intersection of those sets was empty.
>>
>> Did Arthur misuse owl:sameAs? What if Aster never published G2? How could Aster's graph possibly affect the question of whether *Arthur* misused owl:sameAs? It would be nonsensical to assume that it could.
>
> Why? Surely if Aster had more reliable access to the primary source of information about these enigmatic thingies than Arthur did, then it might well be the case that Aster's publication could reveal errors in Arthur's, by contradicting him.

What do you mean by "more reliable"? Both Arthur and Aster had access to the exact same URI definition from Owen. Are you suggesting that Arthur and/or Aster should have used a *different* URI definition? If so, what definition and why?

>
>> What if Owen later said that Arthur was correct, that :x == :y ? What if he later said the opposite? Again, it would seem rather bizarre to say that the determination of whether Arthur had misused owl:sameAs could be changed -- long after Arthur had written G1 -- by Owen's later statements.
>
> Again, I don't find this bizarre in the least. It might be, if there was no truth of the matter concerning all this stuff, so that all these assertions were made independently with equal (or equal lack of) authority as to their actual truth. But that is so implausible and artificial an assumption that I don't see why we need to even discuss it.

The RDF Semantics is explicitly agnostic about interpretations and "actual truth". Owen published a URI definition, and Arthur, Aster and Alfred published a bunch of assertions.
Whether anyone "believes" any of those assertions, whether those assertions have any bearing on the "real world", and whether they are at all useful to anyone's applications, are entirely different questions. AFAICT those questions are irrelevant to the technical question of whether Arthur "misused" owl:sameAs.

>
>> One might claim that Arthur misused owl:sameAs because Owen had not specified whether :x was the same as or different from :y or :z, and therefore Arthur had improperly *guessed* about the value of :x's owl:sameAs property.
>>
>> But by that logic, Arthur would not be able to assert *anything* new about :x. I.e., Arthur would not be allowed to assert any property whose value was not already entailed by Owen's definition!
>
> Arthur may add information, of course. But Arthur is responsible for the truth of what he asserts, and part of that responsibility, in practice, is to take care to ascertain what the intended referents are of any URIs published by others, that Arthur then uses in his assertions.

But Arthur, Aster and Alfred were each fully diligent in ensuring that their assertions were consistent with all information that Owen provided. What more could they do?

> For example, if I (as I recently did) wish to assert that something was red in color, I might use the URI
>
>   http://linkedopencolors.moreways.net/color/rgb/ff0000.html
>
> rather than, say,
>
>   http://linkedopencolors.moreways.net/color/rgb/00ff00.html
>
> because I know, using my color vision (not available to RDF engines) that the first one refers to red and the second one to green, which (I also know) is not red. I *could* use the second URI and insist that I intended it to denote the color red, but that would be stupid, since readers of my RDF will (and indeed should) misunderstand me. If I were to assert that
>
>   http://linkedopencolors.moreways.net/color/rgb/00ff00.html
>   owl:sameAs http://linkedopencolors.moreways.net/color/css/red.html .
>
> then I would be saying something false. And yes, in that case, it *is* my error, even if what I have said is formally consistent (which it in fact is) with the published RDF "definition" of these URIs (which is in fact empty.)

In that example there were additional constraints that were not expressed formally -- such as the fact that red and green are different colors, and what wavelengths correspond to which colors, etc. But unless you are claiming that assertions expressed in natural language can somehow avoid ambiguity where formal assertions cannot, for the sake of analysis we can assume that all assertions have been expressed formally.

I am also assuming that in the vast majority of cases, a URI's resource identity will be defined by a description, rather than by ostension
http://plato.stanford.edu/entries/identity/
so I am focusing on that case.

>
>> And that would render RDF rather pointless.
>
> Why would it render it pointless? The point of RDF is not to make completely unjustified statements about nothing in particular.

RDF is designed to allow anyone to say anything about anything. If someone chooses to make completely unjustified statements about nothing in particular, that is their business. AFAICT that is completely irrelevant to the technical question of whether owl:sameAs was used incorrectly.

>
>> Maybe someone can see a way to avoid this dilemma. Maybe someone can figure out a way to distinguish between the "essential" properties that serve to identify a resource, and other "inessential" properties that the resource might have. If so, and the number of "essential" properties is finite, then indeed this problem could be avoided by requiring every URI owner to define all of the "essential" properties of the URI's denoted resource, or by prohibiting anyone but the URI owner from asserting any new "essential" properties of the resource (beyond those the URI owner had defined).
>> Or maybe there is another way around this dilemma.
>
> What do you see the "dilemma" here as being, exactly? It seems to me that this is not about RDF as such at all. It is about data, however that data is recorded. People can publish data about things. They do so by making assertions. In an ideal world, everyone is responsible for the assertions they make. Other people can put together information from various sources, but the reliability of the result is hostage to the reliability of all the sources that are used. All this is kind of obvious, but what else is being said in this thread?

The dilemma is that we would like each URI to always denote the same thing in all RDF datasets, so that when we merge RDF datasets, the merge will make sense: the merge will be consistent and an application that worked properly on an individual RDF dataset will also work properly on the merge of that dataset with other datasets. But because URI definitions are inherently ambiguous, different RDF authors will interpret them differently, and this leads to inconsistencies when datasets are merged -- even when all parties have acted in good faith and have done all that they could reasonably have been expected to do to avoid such conflicts.

Key assumptions:

1. Owen's URI definition will always be ambiguous. There will always exist a property p such that neither p nor its negation are entailed by the URI definition.

2. Owen cannot be expected to forever refine his URI definition by adding disambiguation at the request of every RDF author who uses his URIs. At some point, Owen will reach the point of saying "that's all the disambiguation you get". (This is the point at which the example that I gave begins.)

>
>>
>> Unless some way around this dilemma is found, it seems unreasonably judgemental to accuse Arthur of misusing owl:sameAs in this case,
>
> Possibly, yes, but not because...
>
>> since he didn't assert anything that was inconsistent with Owen's URI definition
>
> Consistency is not the point. If I make completely unfounded assertions about a topic that you have introduced, then the fact they might be logically consistent with what you have said is neither here nor there. What matters is whether I have the authority to make the assertions I do, or whether I am lying, fabricating or simply fantasizing using Owen's vocabulary.

Can you translate that into more objective technical terms? What exactly does "unfounded" mean? And what do you mean by "authority"? What objective technical criteria are you suggesting? And why is it relevant to the question of whether Arthur misused owl:sameAs, given that the RDF Semantics is explicitly agnostic about interpretations?

David Booth
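The Owen/Arthur/Aster/Alfred example in the thread above can be checked mechanically. The Python sketch below is a hypothetical toy model, not an OWL reasoner: it reads owl:sameAs and owl:differentFrom only as equality and inequality of the things the names denote, and it assumes :Something is interpreted as the whole domain, so Owen's three typing triples never rule out an interpretation. It encodes G1, G2 and G3 as ":x owl:sameAs :y", ":x owl:sameAs :z" and ":y owl:differentFrom :z"; with two sameAs links and one differentFrom, every graph and every pair is satisfiable but the three-way merge is not (if G2 instead read ":x owl:differentFrom :z", the merge would be satisfiable with :x and :y denoting one thing and :z another). The function and variable names are illustrative only. Because three names can fall into at most three equality classes, enumerating all 27 maps onto a three-element domain covers every case.

```python
from itertools import combinations, product

# Toy model (an illustrative assumption, not an OWL reasoner): a graph is a
# list of (subject, predicate, object) triples, and an interpretation maps
# each of Owen's names to an element of a small finite domain.
NAMES = (":x", ":y", ":z")

# Owen's entire URI definition, as given in the example.
OWEN = [(":x", "a", ":Something"),
        (":y", "a", ":Something"),
        (":z", "a", ":Something")]

G1 = [(":x", "owl:sameAs", ":y")]         # Arthur
G2 = [(":x", "owl:sameAs", ":z")]         # Aster
G3 = [(":y", "owl:differentFrom", ":z")]  # Alfred


def interpretations():
    """Yield every mapping of the three names onto a 3-element domain.

    Three elements suffice: three names fall into at most three
    distinct equality classes.
    """
    for values in product(range(len(NAMES)), repeat=len(NAMES)):
        yield dict(zip(NAMES, values))


def satisfies(interp, graph):
    """Return True if every triple in graph holds under interp."""
    for s, p, o in graph:
        if p == "a":
            # Assumption: :Something is read as the whole domain, so Owen's
            # typing triples hold in every interpretation.
            continue
        same = interp[s] == interp[o]
        if p == "owl:sameAs" and not same:
            return False
        if p == "owl:differentFrom" and same:
            return False
    return True


def consistent(graph):
    """A graph is consistent if some interpretation satisfies it."""
    return any(satisfies(i, graph) for i in interpretations())


def entails(graph, triple):
    """graph entails triple if every model of graph also satisfies triple."""
    return all(satisfies(i, [triple])
               for i in interpretations() if satisfies(i, graph))


if __name__ == "__main__":
    graphs = (G1, G2, G3)
    print("each with Owen:", all(consistent(OWEN + g) for g in graphs))
    print("each pair:", all(consistent(OWEN + a + b)
                            for a, b in combinations(graphs, 2)))
    print("all three:", consistent(OWEN + G1 + G2 + G3))
    # Owen's definition settles neither :x = :y nor :x != :y.
    print("entails sameAs:", entails(OWEN, (":x", "owl:sameAs", ":y")))
    print("entails differentFrom:",
          entails(OWEN, (":x", "owl:differentFrom", ":y")))
```

Run as a script, it reports True for the individual and pairwise checks and False for the three-way merge. It also reports that Owen's definition entails neither ":x owl:sameAs :y" nor ":x owl:differentFrom :y", which is the first key assumption above in miniature. Brute-force enumeration like this is only practical for a handful of names.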
Received on Monday, 8 April 2013 18:09:01 UTC