- From: Michael Miller <Michael.Miller@systemsbiology.org>
- Date: Mon, 8 Apr 2013 11:06:28 -0700
- To: Phillip Lord <phillip.lord@newcastle.ac.uk>, Oliver Ruebenacker <curoli@gmail.com>
- Cc: David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>, Peter Ansell <ansell.peter@gmail.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
hi all,

phillip, not to mention a name (like mine!) is not particularly unique.

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -----Original Message-----
> From: Phillip Lord [mailto:phillip.lord@newcastle.ac.uk]
> Sent: Monday, April 08, 2013 9:53 AM
> To: Oliver Ruebenacker
> Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg; public-semweb-lifesci
> Subject: Re: owl:sameAs - Harmful to provenance?
>
> And it is this bit -- "before we can do anything useful" that is utterly
> wrong.
>
> Recently I have spent a lot of time looking at Dublin Core creator fields.
> You could not believe how many different ways they are used. String
> literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs
> ("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with
> titles ("Dr Phillip Lord") and so on.
>
> So, is everyone using Dublin Core wrong? It is useless till everyone
> uses it the same way? Emphatically no, it is not useless.
>
> Would it be better if everybody did use it the same way? The answer is
> probably not. Names are incredibly complex, and representing them is, in
> turn, difficult and hard. Any specification which did full justice to
> all the different name forms in existence would be incredibly
> long-winded. Many people using the specification would get it wrong; or
> you could have a mechanism for ensuring people always used it correctly.
> Then I am sure that both people who ended up using this form of spec
> would have great fun integrating their tiny datasets.
>
> In the example, we have a number of sets of assertions which
> individually fulfil their creators' use-cases. Then, when they are brought
> together, the assertions become inconsistent, telling you up front that
> there is work to be done. And you ask in what way is this useful?
>
> Perfection is the enemy of Good.
>
> Oliver Ruebenacker <curoli@gmail.com> writes:
> > So what most people here are saying is that before we can do anything
> > useful, we need to make sure that if two assertions use the same
> > reference, they mean the same thing.
> >
> > To which you respond that you will accept assertions without assuming
> > that same references mean same things. You will just keep them separate.
> > There is no rule against that.
> >
> > But in what way is this useful?
> >
> > Take care
> > Oliver
> >
> > On Mon, Apr 8, 2013 at 10:07 AM, David Booth <david@dbooth.org> wrote:
> >
> >> Hi Pat,
> >>
> >> On 04/04/2013 02:03 AM, Pat Hayes wrote:
> >>
> >>> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
> >>>
> >>>> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote: On
> >>>> 04/02/2013 05:02 PM, Alan Ruttenberg wrote: On Tuesday, April 2,
> >>>> 2013, David Booth wrote: On 03/27/2013 10:56 PM, Pat Hayes wrote:
> >>>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
> >>>>
> >>>> If only owl:sameAs were used correctly...
> >>>>
> >>>> Well, I agree that is a problem, but don't draw the conclusion
> >>>> that there is something wrong with sameAs, just because people keep
> >>>> using it wrong.
> >>>>
> >>>> Agreed. And furthermore, don't draw the conclusion that someone
> >>>> has used owl:sameAs wrong just because you get garbage when you
> >>>> merge two graphs that individually worked just fine. Those two
> >>>> graphs may have been written assuming different sets of
> >>>> interpretations.
> >>>>
> >>>> In that case I would certainly conclude that they have used it
> >>>> wrong. Have you not been reading what Pat and I have been writing?
> >>>>
> >>>> I've read lots of what you and Pat have written. And I've learned
> >>>> a lot from it -- particularly in learning about ambiguity from Pat.
> >>>> And I'm in full agreement that owl:sameAs is *often* misused.
> >>>>
> >>>> But I don't believe that getting garbage when merging two graphs
> >>>> that individually worked fine *necessarily* indicates that
> >>>> owl:sameAs was misused -- even when it appears on the surface to be
> >>>> causing the problem.
> >>>>
> >>>
> >>> I agree, but not with your example and your analysis of it.
> >>>
> >>>> Here's a simple example to illustrate.
> >>>>
> >>>> Using the following prefixes throughout, for brevity:
> >>>>
> >>>> @prefix : <http://example/owen/> .
> >>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>>>
> >>>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen
> >>>> defines them as follows:
> >>>>
> >>>> # Owen's URI definition for :x, :y and :z
> >>>> :x a :Something .
> >>>> :y a :Something .
> >>>> :z a :Something .
> >>>>
> >>>> That's all. That's Owen's entire definition of those URIs.
> >>>> Obviously this definition is "ambiguous" in some sense. But as we
> >>>> know, ambiguity is ultimately inescapable anyway, so I have merely
> >>>> chosen an example that makes the ambiguity obvious. As the RDF
> >>>> Semantics spec puts it: "It is usually impossible to assert enough
> >>>> in any language to completely constrain the interpretations to a
> >>>> single possible world".
> >>>>
> >>>
> >>> Yes, but by making the ambiguity this "obvious", you have rendered
> >>> the example pointless. There is *no* content here *at all*, so Owen
> >>> has not really published anything. This is not typical of published
> >>> content, even in RDF. Typically, in fact, there is, as well as some
> >>> nontrivial actual RDF content, some kind of explanation, perhaps in
> >>> natural language, of what the *intended* content of the formal RDF is
> >>> supposed to be.
> >>> While an RDF engine cannot of course make use of such
> >>> intuitive explanations, other authors of RDF can, and should, make
> >>> use of it to try to ensure that they do not make assertions which
> >>> would be counter to the referential intentions of the original
> >>> authors. For example, the Dublin Core URIs were published with almost
> >>> no formal RDF axioms, but quite elaborate natural language glosses
> >>> which enable them to be used in formal RDF with considerable success.
> >>> The fact that formal (and even informal) data is inherently ambiguous
> >>> does not mean that it is inherently, or even typically, vacuous.
> >>>
> >>
> >> This seems to suggest that natural language can somehow eliminate
> >> ambiguity, where formal languages cannot. I don't buy that. Presumably
> >> whatever definition one expressed in natural language could be
> >> expressed in a formal language -- in principle at least. And certainly
> >> the goal of the semantic web is to have such information expressed in
> >> a formal language that is amenable to machine processing.
> >>
> >> More precisely, the basic assumption I am making is that for (almost)
> >> any definition there exists a property such that neither that property
> >> nor its negation are entailed by the definition. I.e., there is always
> >> more than can be said about the thing whose identity is defined. Maybe
> >> that assumption is wrong; I don't know. If you think it's wrong, I'd be
> >> interested in hearing why.
> >>
> >> The example may not be "realistic", but it is *not* pointless. The
> >> whole point of choosing such a simple example is to expose the
> >> fundamental issues outright, rather than obscuring them in complexity
> >> that we cannot fully understand. If there is some fundamental reason
> >> why you think this problem cannot happen in a more "realistic" example,
> >> then please explain what mechanism would come into play to prevent it.
> >>
> >>>> Arthur, an RDF author, publishes the following graph, G1, making
> >>>> certain assumptions about the interpretations that will be applied
> >>>> to it:
> >>>>
> >>>> # G1
> >>>> :x owl:sameAs :y .
> >>>>
> >>>
> >>> On what basis does Arthur make this assertion? The URIs were coined
> >>> by Owen, and Owen says nothing that would sanction this assumption.
> >>>
> >>
> >> Why Arthur or anyone else chooses to assert whatever they choose to
> >> assert is their business. It is irrelevant to this analysis.
> >>
> >>>> Aster, another RDF author, publishes the following graph, G2,
> >>>> making certain other assumptions about the interpretations that
> >>>> will be applied to it:
> >>>>
> >>>> # G2
> >>>> :x owl:differentFrom :z .
> >>>>
> >>>> Alfred, a third RDF author, publishes the following graph, G3,
> >>>> making still other assumptions about the interpretations that will
> >>>> be applied to it:
> >>>>
> >>>> # G3
> >>>> :y owl:differentFrom :z .
> >>>>
> >>>
> >>> Similarly for the other two. They are making assertions using names
> >>> that belong to, and were coined by, another author without having any
> >>> possible source of justification for these nontrivial claims. This
> >>> should not be regarded as good practice, to put it mildly.
> >>>
> >>
> >> Ditto. If you are claiming that an RDF author needs some sort of
> >> "justification" to make assertions, then please explain exactly what
> >> you mean -- preferably in formal terms -- by "justification". E.g., does
> >> "justification" mean that Arthur may only make assertions that are
> >> entailed by Owen's definition? I already discussed that possibility
> >> below.
> >>
> >>>> Note that G1, G2 and G3 are all individually consistent with Owen's
> >>>> URI definition. Furthermore, G1, G2 and G3 are all pair-wise
> >>>> consistent: there exists at least one satisfying interpretation for
> >>>> the merge of each pair.
> >>>> But the merge of G1, G2 and G3 is not
> >>>> consistent:
> >>>>
> >>>
> >>> This kind of behavior is of course quite typical in any assertional
> >>> language.
> >>>
> >>
> >> Yes.
> >>
> >>>> Arthur, Aster and Alfred made different assumptions about the set
> >>>> of interpretations that would be applied to their graphs, and the
> >>>> intersection of those sets was empty.
> >>>>
> >>>> Did Arthur misuse owl:sameAs? What if Aster never published G2?
> >>>> How could Aster's graph possibly affect the question of whether
> >>>> *Arthur* misused owl:sameAs? It would be nonsensical to assume
> >>>> that it could.
> >>>>
> >>>
> >>> Why? Surely if Aster had a more reliable access to the primary source
> >>> of information about these enigmatic thingies than Arthur did, then
> >>> it might well be the case that Aster's publication could reveal
> >>> errors in Arthur's, by contradicting him.
> >>>
> >>
> >> What do you mean by "more reliable"? Both Arthur and Aster had access
> >> to the exact same URI definition from Owen. Are you suggesting that
> >> Arthur and/or Aster should have used a *different* URI definition? If
> >> so, what definition and why?
> >>
> >>>> What if Owen later said that Arthur was correct, that :x == :y ?
> >>>> What if he later said the opposite? Again, it would seem rather
> >>>> bizarre to say that the determination of whether Arthur had
> >>>> misused owl:sameAs could be changed -- long after Arthur had
> >>>> written G1 -- by Owen's later statements.
> >>>>
> >>>
> >>> Again, I don't find this bizarre in the least. It might be, if there
> >>> was no truth of the matter concerning all this stuff, so that all
> >>> these assertions were made independently with equal (or equal lack
> >>> of) authority as to their actual truth. But that is so implausible
> >>> and artificial an assumption that I don't see why we need to even
> >>> discuss it.
> >>>
> >>
> >> The RDF Semantics is explicitly agnostic about interpretations and
> >> "actual truth".
> >>
> >> Owen published a URI definition, and Arthur, Aster and Alfred
> >> published a bunch of assertions. Whether anyone "believes" any of those
> >> assertions, whether those assertions have any bearing on the "real
> >> world", and whether they are at all useful to anyone's applications,
> >> are entirely different questions. AFAICT those questions are irrelevant
> >> to the technical question of whether Arthur "misused" owl:sameAs.
> >>
> >>>> One might claim that Arthur misused owl:sameAs because Owen had not
> >>>> specified whether :x was the same or different from :y or :z, and
> >>>> therefore Arthur had improperly *guessed* about the value of :x's
> >>>> owl:sameAs property.
> >>>>
> >>>> But by that logic, Arthur would not be able to assert *anything*
> >>>> new about :x. I.e., Arthur would not be allowed to assert any
> >>>> property whose value was not already entailed by Owen's
> >>>> definition!
> >>>>
> >>>
> >>> Arthur may add information, of course. But Arthur is responsible for
> >>> the truth of what he asserts, and part of that responsibility, in
> >>> practice, is to take care to ascertain what the intended referents
> >>> are of any URIs published by others, that Arthur then uses in his
> >>> assertions.
> >>>
> >>
> >> But Arthur, Aster and Alfred were each fully diligent in ensuring that
> >> their assertions were consistent with all information that Owen
> >> provided. What more could they do?
> >>
> >>> For example, if I (as I recently did) wish to assert that
> >>> something was red in color, I might use the URI
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/ff0000.html
> >>>
> >>> rather than, say,
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>>
> >>> because I know, using my color vision (not available to RDF engines)
> >>> that the first one refers to red and the second one to green, which
> >>> (I also know) is not red. I *could* use the second URI and insist
> >>> that I intended it to denote the color red, but that would be stupid,
> >>> since readers of my RDF will (and indeed should) misunderstand me. If
> >>> I were to assert that
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>> owl:sameAs
> >>> http://linkedopencolors.moreways.net/color/css/red.html
> >>> .
> >>>
> >>> then I would be saying something false. And yes, in that case, it
> >>> *is* my error, even if what I have said is formally consistent (which
> >>> it in fact is) with the published RDF "definition" of these URIs
> >>> (which is in fact empty.)
> >>>
> >>
> >> In that example there were additional constraints that were not
> >> expressed formally -- such as the fact that red and green are different
> >> colors, and what wavelengths correspond to which colors, etc. But
> >> unless you are claiming that assertions expressed in natural language
> >> can somehow avoid ambiguity where formal assertions cannot, then for
> >> the sake of analysis we can assume that all assertions have been
> >> expressed formally.
> >>
> >> I am also assuming that in the vast majority of cases, a URI's
> >> resource identity will be defined by a description, rather than by
> >> ostension
> >> http://plato.stanford.edu/entries/identity/
> >> so I am focusing on that case.
> >>
> >>>> And that would render RDF rather pointless.
> >>>>
> >>>
> >>> Why would it render it pointless? The point of RDF is not to make
> >>> completely unjustified statements about nothing in particular.
> >>>
> >>
> >> RDF is designed to allow anyone to say anything about anything. If
> >> someone chooses to make completely unjustified statements about
> >> nothing in particular, that is their business. AFAICT that is
> >> completely irrelevant to the technical question of whether owl:sameAs
> >> was used incorrectly.
> >>
> >>>> Maybe someone can see a way to avoid this dilemma. Maybe someone
> >>>> can figure out a way to distinguish between the "essential"
> >>>> properties that serve to identify a resource, and other
> >>>> "inessential" properties that the resource might have. If so, and
> >>>> the number of "essential" properties is finite, then indeed this
> >>>> problem could be avoided by requiring every URI owner to define all
> >>>> of the "essential" properties of the URI's denoted resource, or by
> >>>> prohibiting anyone but the URI owner from asserting any new
> >>>> "essential" properties of the resource (beyond those the URI owner
> >>>> had defined). Or maybe there is another way around this dilemma.
> >>>>
> >>>
> >>> What do you see the "dilemma" here as being, exactly? It seems to me
> >>> that this is not about RDF as such at all. It is about data, however
> >>> that data is recorded. People can publish data about things. They do
> >>> so by making assertions. In an ideal world, everyone is responsible
> >>> for the assertions they make.
> >>> Other people can put together
> >>> information from various sources, but the reliability of the result
> >>> is hostage to the reliability of all the sources that are used. All
> >>> this is kind of obvious, but what else is being said in this thread?
> >>>
> >>
> >> The dilemma is that we would like each URI to always denote the same
> >> thing in all RDF datasets, so that when we merge RDF datasets, the
> >> merge will make sense: the merge will be consistent and an application
> >> that worked properly on an individual RDF dataset will also work
> >> properly on the merge of that dataset with other datasets. But because
> >> URI definitions are inherently ambiguous, different RDF authors will
> >> interpret them differently, and this leads to inconsistencies when
> >> datasets are merged -- even when all parties have acted in good faith
> >> and have done all that they could reasonably have been expected to do
> >> to avoid such conflicts.
> >>
> >> Key assumptions:
> >>
> >> 1. Owen's URI definition will always be ambiguous. There will always
> >> exist a property p such that neither p nor its negation are entailed by
> >> the URI definition.
> >>
> >> 2. Owen cannot be expected to forever refine his URI definition by
> >> adding disambiguation at the request of every RDF author who uses his
> >> URIs. At some point, Owen will reach the point of saying "that's all
> >> the disambiguation you get". (This is the point at which the example
> >> that I gave begins.)
> >>
> >>>> Unless some way around this dilemma is found, it seems unreasonably
> >>>> judgemental to accuse Arthur of misusing owl:sameAs in this case,
> >>>>
> >>>
> >>> Possibly, yes, but not because...
> >>>
> >>>> since he didn't assert anything that was inconsistent with Owen's
> >>>> URI definition
> >>>>
> >>>
> >>> Consistency is not the point.
> >>> If I make completely unfounded
> >>> assertions about a topic that you have introduced, then the fact they
> >>> might be logically consistent with what you have said is neither here
> >>> nor there. What matters is whether I have the authority to make the
> >>> assertions I do, or whether I am lying, fabricating or simply
> >>> fantasizing using Owen's vocabulary.
> >>>
> >>
> >> Can you translate that into more objective technical terms? What
> >> exactly does "unfounded" mean? And what do you mean by "authority"?
> >> What objective technical criteria are you suggesting? And why is it
> >> relevant to the question of whether Arthur misused owl:sameAs, given
> >> that the RDF Semantics is explicitly agnostic about interpretations?
> >>
> >> David Booth
> >>
>
> --
> Phillip Lord,                   Phone: +44 (0) 191 222 7827
> Lecturer in Bioinformatics,     Email: phillip.lord@newcastle.ac.uk
> School of Computing Science,    http://homepages.cs.ncl.ac.uk/phillip.lord
> Room 914 Claremont Tower,       skype: russet_apples
> Newcastle University,           twitter: phillord
> NE1 7RU
Received on Monday, 8 April 2013 18:06:54 UTC
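
The pattern the thread keeps returning to, graphs that are each consistent and consistent in every pair yet inconsistent once all are merged, can be checked mechanically. The following is only a minimal sketch under stated assumptions: it looks at owl:sameAs and owl:differentFrom triples alone (no other OWL reasoning), the function name is_consistent is made up for illustration, and the three one-triple graphs G_A, G_B and G_C over :a, :b and :c are hypothetical, chosen only to exhibit the pattern. They are not the thread's G1, G2 and G3.

```python
from itertools import combinations


def is_consistent(triples):
    """Return True if the owl:sameAs / owl:differentFrom triples admit a model.

    Assumption: only these two predicates matter; all others are ignored.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge every pair of names asserted to be the same individual.
    for subject, predicate, obj in triples:
        if predicate == "owl:sameAs":
            parent[find(subject)] = find(obj)

    # A differentFrom pair that landed in one class is a contradiction.
    return all(
        find(subject) != find(obj)
        for subject, predicate, obj in triples
        if predicate == "owl:differentFrom"
    )


# Hypothetical graphs, one triple each (not the thread's G1, G2, G3).
G_A = [(":a", "owl:sameAs", ":b")]
G_B = [(":b", "owl:sameAs", ":c")]
G_C = [(":a", "owl:differentFrom", ":c")]

graphs = {"G_A": G_A, "G_B": G_B, "G_C": G_C}
for (name1, g1), (name2, g2) in combinations(graphs.items(), 2):
    print(name1, "+", name2, "consistent:", is_consistent(g1 + g2))
print("all three consistent:", is_consistent(G_A + G_B + G_C))
```

In this sketch each author's graph survives on its own and alongside either of the others; the contradiction appears only in the three-way merge, which is why no single graph can be singled out from the merge result alone.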