- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Mon, 8 Apr 2013 13:23:59 -0400
- To: "Bhat, Talapady N." <talapady.bhat@nist.gov>
- Cc: Phillip Lord <phillip.lord@newcastle.ac.uk>, Oliver Ruebenacker <curoli@gmail.com>, David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>, Peter Ansell <ansell.peter@gmail.com>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
- Message-ID: <CAFKQJ8=wvu8R-mMESs0zyA+kwehnwwb-VLVbE659926wWR94Yw@mail.gmail.com>
Nicely pointed out, TN. Thinking about "metadata" as some other category of data is usually a bad sign. I've often found it to mean, in practice, "data I care less about".

Phil, to make the case that RDF helps here, we would want to compare how easy it is to do significant work using the ill-represented examples you find versus raw text, versus XML, versus tab-delimited files. While there is some limited benefit to getting rid of the surface syntax problem, it's not clear how much of a problem that ever was.

-Alan

On Mon, Apr 8, 2013 at 1:16 PM, Bhat, Talapady N. <talapady.bhat@nist.gov> wrote:

> Hi,
> -----
>
> Introduction -Dublin Core:
> The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The name "Dublin" is due to its origin at a 1995 invitational workshop in Dublin, Ohio; "core" because its elements are broad and generic, usable for describing a wide range of resources.
>
> The fifteen element "Dublin Core" described in this standard is part of a larger set of metadata vocabularies
> --------------------------------------
> As per the introduction section (given above) of Dublin Core (http://dublincore.org/documents/dces/), its focus is primarily metadata, whereas the actual author names mentioned below probably need to be considered 'data'. I do not think Dublin Core has really focused on building a standard re-usable vocabulary for 'data'. That is the real problem. That is why we have been focusing on re-usable terms for 'data':
>
> http://www.biomedcentral.com/1471-2105/12/487 and
> http://xpdb.nist.gov/chemblast/pdb.pl and
> http://www.nature.com/nmeth/journal/v9/n7/abs/nmeth.2084.html
>
> T N Bhat
>
> -----Original Message-----
> From: Phillip Lord [mailto:phillip.lord@newcastle.ac.uk]
> Sent: Monday, April 08, 2013 12:53 PM
> To: Oliver Ruebenacker
> Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg; public-semweb-lifesci
> Subject: Re: owl:sameAs - Harmful to provenance?
>
> And it is this bit -- "before we can do anything useful" that is utterly wrong.
>
> Recently I have spent a lot of time looking at Dublin Core creator fields. You could not believe how many different ways they are used. String literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs ("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with titles ("Dr Phillip Lord") and so on.
>
> So, is everyone using Dublin Core wrong? Is it useless till everyone uses it the same way? Emphatically no, it is not useless.
>
> Would it be better if everybody did use it the same way? The answer is probably not. Names are incredibly complex, and representing them is, in turn, difficult. Any specification which did full justice to all the different name forms in existence would be incredibly long-winded. Many people using the specification would get it wrong; or you could have a mechanism for ensuring people always used it correctly. Then I am sure that both people who ended up using this form of spec would have great fun integrating their tiny datasets.
>
> In the example, we have a number of sets of assertions which individually fulfil their creators' use cases. Then, when they are brought together, the assertions become inconsistent, telling you up front that there is work to be done. And you ask in what way is this useful?
>
> Perfection is the enemy of Good.
>
> Oliver Ruebenacker <curoli@gmail.com> writes:
> > So what most people here are saying is that before we can do anything useful, we need to make sure that if two assertions use the same reference, they mean the same thing.
> >
> > To which you respond that you will accept assertions without assuming that same references mean same things. You will just keep them separate. There is no rule against that.
> >
> > But in what way is this useful?
> >
> > Take care
> > Oliver
> >
> > On Mon, Apr 8, 2013 at 10:07 AM, David Booth <david@dbooth.org> wrote:
> >
> >> Hi Pat,
> >>
> >> On 04/04/2013 02:03 AM, Pat Hayes wrote:
> >>
> >>> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
> >>>
> >>>> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
> >>>> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
> >>>> On Tuesday, April 2, 2013, David Booth wrote:
> >>>> On 03/27/2013 10:56 PM, Pat Hayes wrote:
> >>>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
> >>>>
> >>>> If only owl:sameAs were used correctly...
> >>>>
> >>>> Well, I agree that is a problem, but don't draw the conclusion that there is something wrong with sameAs, just because people keep using it wrong.
> >>>>
> >>>> Agreed. And furthermore, don't draw the conclusion that someone has used owl:sameAs wrong just because you get garbage when you merge two graphs that individually worked just fine. Those two graphs may have been written assuming different sets of interpretations.
> >>>>
> >>>> In that case I would certainly conclude that they have used it wrong. Have you not been reading what Pat and I have been writing?
> >>>>
> >>>> I've read lots of what you and Pat have written. And I've learned a lot from it -- particularly in learning about ambiguity from Pat. And I'm in full agreement that owl:sameAs is *often* misused.
> >>>>
> >>>> But I don't believe that getting garbage when merging two graphs that individually worked fine *necessarily* indicates that owl:sameAs was misused -- even when it appears on the surface to be causing the problem.
> >>>
> >>> I agree, but not with your example and your analysis of it.
> >>>
> >>>> Here's a simple example to illustrate.
> >>>>
> >>>> Using the following prefixes throughout, for brevity:
> >>>>
> >>>> @prefix : <http://example/owen/> .
> >>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>>>
> >>>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen defines them as follows:
> >>>>
> >>>> # Owen's URI definition for :x, :y and :z
> >>>> :x a :Something .
> >>>> :y a :Something .
> >>>> :z a :Something .
> >>>>
> >>>> That's all. That's Owen's entire definition of those URIs. Obviously this definition is "ambiguous" in some sense. But as we know, ambiguity is ultimately inescapable anyway, so I have merely chosen an example that makes the ambiguity obvious. As the RDF Semantics spec puts it: "It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world".
> >>>
> >>> Yes, but by making the ambiguity this "obvious", you have rendered the example pointless. There is *no* content here *at all*, so Owen has not really published anything. This is not typical of published content, even in RDF. Typically, in fact, there is, as well as some nontrivial actual RDF content, some kind of explanation, perhaps in natural language, of what the *intended* content of the formal RDF is supposed to be. While an RDF engine cannot of course make use of such intuitive explanations, other authors of RDF can, and should, make use of it to try to ensure that they do not make assertions which would be counter to the referential intentions of the original authors. For example, the Dublin Core URIs were published with almost no formal RDF axioms, but quite elaborate natural language glosses which enable them to be used in formal RDF with considerable success. The fact that formal (and even informal) data is inherently ambiguous does not mean that it is inherently, or even typically, vacuous.
> >>
> >> This seems to suggest that natural language can somehow eliminate ambiguity, where formal languages cannot. I don't buy that. Presumably whatever definition one expressed in natural language could be expressed in a formal language -- in principle at least. And certainly the goal of the semantic web is to have such information expressed in a formal language that is amenable to machine processing.
> >>
> >> More precisely, the basic assumption I am making is that for (almost) any definition there exists a property such that neither that property nor its negation are entailed by the definition. I.e., there is always more that can be said about the thing whose identity is defined. Maybe that assumption is wrong; I don't know. If you think it's wrong, I'd be interested in hearing why.
> >>
> >> The example may not be "realistic", but it is *not* pointless. The whole point of choosing such a simple example is to expose the fundamental issues outright, rather than obscuring them in complexity that we cannot fully understand. If there is some fundamental reason why you think this problem cannot happen in a more "realistic" example, then please explain what mechanism would come into play to prevent it.
> >>
> >>>> Arthur, an RDF author, publishes the following graph, G1, making certain assumptions about the interpretations that will be applied to it:
> >>>>
> >>>> # G1
> >>>> :x owl:sameAs :y .
> >>>
> >>> On what basis does Arthur make this assertion? The URIs were coined by Owen, and Owen says nothing that would sanction this assumption.
> >>
> >> Why Arthur or anyone else chooses to assert whatever they choose to assert is their business. It is irrelevant to this analysis.
> >>
> >>>> Aster, another RDF author, publishes the following graph, G2, making certain other assumptions about the interpretations that will be applied to it:
> >>>>
> >>>> # G2
> >>>> :x owl:differentFrom :z .
> >>>>
> >>>> Alfred, a third RDF author, publishes the following graph, G3, making still other assumptions about the interpretations that will be applied to it:
> >>>>
> >>>> # G3
> >>>> :y owl:differentFrom :z .
> >>>
> >>> Similarly for the other two. They are making assertions using names that belong to, and were coined by, another author without having any possible source of justification for these nontrivial claims. This should not be regarded as good practice, to put it mildly.
> >>
> >> Ditto. If you are claiming that an RDF author needs some sort of "justification" to make assertions, then please explain exactly what you mean -- preferably in formal terms -- by "justification". E.g., does "justification" mean that Arthur may only make assertions that are entailed by Owen's definition? I already discussed that possibility below.
> >>
> >>>> Note that G1, G2 and G3 are all individually consistent with Owen's URI definition. Furthermore, G1, G2 and G3 are all pair-wise consistent: there exists at least one satisfying interpretation for the merge of each pair. But the merge of G1, G2 and G3 is not consistent:
> >>>
> >>> This kind of behavior is of course quite typical in any assertional language.
> >>
> >> Yes.
> >>
> >>>> Arthur, Aster and Alfred made different assumptions about the set of interpretations that would be applied to their graphs, and the intersection of those sets was empty.
> >>>>
> >>>> Did Arthur misuse owl:sameAs? What if Aster never published G2? How could Aster's graph possibly affect the question of whether *Arthur* misused owl:sameAs?
> >>>> It would be nonsensical to assume that it could.
> >>>
> >>> Why? Surely if Aster had a more reliable access to the primary source of information about these enigmatic thingies than Arthur did, then it might well be the case that Aster's publication could reveal errors in Arthur's, by contradicting him.
> >>
> >> What do you mean by "more reliable"? Both Arthur and Aster had access to the exact same URI definition from Owen. Are you suggesting that Arthur and/or Aster should have used a *different* URI definition? If so, what definition and why?
> >>
> >>>> What if Owen later said that Arthur was correct, that :x == :y ? What if he later said the opposite? Again, it would seem rather bizarre to say that the determination of whether Arthur had misused owl:sameAs could be changed -- long after Arthur had written G1 -- by Owen's later statements.
> >>>
> >>> Again, I don't find this bizarre in the least. It might be, if there was no truth of the matter concerning all this stuff, so that all these assertions were made independently with equal (or equal lack of) authority as to their actual truth. But that is so implausible and artificial an assumption that I don't see why we need to even discuss it.
> >>
> >> The RDF Semantics is explicitly agnostic about interpretations and "actual truth".
> >>
> >> Owen published a URI definition, and Arthur, Aster and Alfred published a bunch of assertions. Whether anyone "believes" any of those assertions, whether those assertions have any bearing on the "real world", and whether they are at all useful to anyone's applications, are entirely different questions. AFAICT those questions are irrelevant to the technical question of whether Arthur "misused" owl:sameAs.
> >>
> >>>> One might claim that Arthur misused owl:sameAs because Owen had not specified whether :x was the same or different from :y or :z, and therefore Arthur had improperly *guessed* about the value of :x's owl:sameAs property.
> >>>>
> >>>> But by that logic, Arthur would not be able to assert *anything* new about :x. I.e., Arthur would not be allowed to assert any property whose value was not already entailed by Owen's definition!
> >>>
> >>> Arthur may add information, of course. But Arthur is responsible for the truth of what he asserts, and part of that responsibility, in practice, is to take care to ascertain what the intended referents are of any URIs published by others, that Arthur then uses in his assertions.
> >>
> >> But Arthur, Aster and Alfred were each fully diligent in ensuring that their assertions were consistent with all information that Owen provided. What more could they do?
> >>
> >>> For example, if I (as I recently did) wish to assert that something was red in color, I might use the URI
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/ff0000.html
> >>>
> >>> rather than, say,
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>>
> >>> because I know, using my color vision (not available to RDF engines) that the first one refers to red and the second one to green, which (I also know) is not red. I *could* use the second URI and insist that I intended it to denote the color red, but that would be stupid, since readers of my RDF will (and indeed should) misunderstand me.
> >>> If I were to assert that
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>> owl:sameAs
> >>> http://linkedopencolors.moreways.net/color/css/red.html
> >>> .
> >>>
> >>> then I would be saying something false. And yes, in that case, it *is* my error, even if what I have said is formally consistent (which it in fact is) with the published RDF "definition" of these URIs (which is in fact empty.)
> >>
> >> In that example there were additional constraints that were not expressed formally -- such as the fact that red and green are different colors, and what wavelengths correspond to which colors, etc. But unless you are claiming that assertions expressed in natural language can somehow avoid ambiguity where formal assertions cannot, then for the sake of analysis we can assume that all assertions have been expressed formally.
> >>
> >> I am also assuming that in the vast majority of cases, a URI's resource identity will be defined by a description, rather than by ostension
> >> http://plato.stanford.edu/entries/identity/
> >> so I am focusing on that case.
> >>
> >>>> And that would render RDF rather pointless.
> >>>
> >>> Why would it render it pointless? The point of RDF is not to make completely unjustified statements about nothing in particular.
> >>
> >> RDF is designed to allow anyone to say anything about anything. If someone chooses to make completely unjustified statements about nothing in particular, that is their business. AFAICT that is completely irrelevant to the technical question of whether owl:sameAs was used incorrectly.
> >>
> >>>> Maybe someone can see a way to avoid this dilemma.
> >>>> Maybe someone can figure out a way to distinguish between the "essential" properties that serve to identify a resource, and other "inessential" properties that the resource might have. If so, and the number of "essential" properties is finite, then indeed this problem could be avoided by requiring every URI owner to define all of the "essential" properties of the URI's denoted resource, or by prohibiting anyone but the URI owner from asserting any new "essential" properties of the resource (beyond those the URI owner had defined). Or maybe there is another way around this dilemma.
> >>>
> >>> What do you see the "dilemma" here as being, exactly? It seems to me that this is not about RDF as such at all. It is about data, however that data is recorded. People can publish data about things. They do so by making assertions. In an ideal world, everyone is responsible for the assertions they make. Other people can put together information from various sources, but the reliability of the result is hostage to the reliability of all the sources that are used. All this is kind of obvious, but what else is being said in this thread?
> >>
> >> The dilemma is that we would like each URI to always denote the same thing in all RDF datasets, so that when we merge RDF datasets, the merge will make sense: the merge will be consistent and an application that worked properly on an individual RDF dataset will also work properly on the merge of that dataset with other datasets. But because URI definitions are inherently ambiguous, different RDF authors will interpret them differently, and this leads to inconsistencies when datasets are merged -- even when all parties have acted in good faith and have done all that they could reasonably have been expected to do to avoid such conflicts.
> >>
> >> Key assumptions:
> >>
> >> 1. Owen's URI definition will always be ambiguous. There will always exist a property p such that neither p nor its negation are entailed by the URI definition.
> >>
> >> 2. Owen cannot be expected to forever refine his URI definition by adding disambiguation at the request of every RDF author who uses his URIs. At some point, Owen will reach the point of saying "that's all the disambiguation you get". (This is the point at which the example that I gave begins.)
> >>
> >>>> Unless some way around this dilemma is found, it seems unreasonably judgemental to accuse Arthur of misusing owl:sameAs in this case,
> >>>
> >>> Possibly, yes, but not because...
> >>>
> >>>> since he didn't assert anything that was inconsistent with Owen's URI definition
> >>>
> >>> Consistency is not the point. If I make completely unfounded assertions about a topic that you have introduced, then the fact they might be logically consistent with what you have said is neither here nor there. What matters is whether I have the authority to make the assertions I do, or whether I am lying, fabricating or simply fantasizing using Owen's vocabulary.
> >>
> >> Can you translate that into more objective technical terms? What exactly does "unfounded" mean? And what do you mean by "authority"? What objective technical criteria are you suggesting? And why is it relevant to the question of whether Arthur misused owl:sameAs, given that the RDF Semantics is explicitly agnostic about interpretations?
> >>
> >> David Booth
> >>
>
> --
> Phillip Lord,                           Phone: +44 (0) 191 222 7827
> Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
> School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
> Room 914 Claremont Tower,               skype: russet_apples
> Newcastle University,                   twitter: phillord
> NE1 7RU
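
The G1/G2/G3 example quoted in this thread can be checked mechanically. What follows is a minimal Python sketch, not an OWL reasoner: it assumes only owl:sameAs and owl:differentFrom triples matter, and the helper name is_consistent and the variant graph G3_VARIANT are hypothetical, introduced here for illustration. Note that the three triples exactly as quoted are jointly satisfiable (let :x and :y denote one thing and :z another). The pattern described in the thread, where every pair merges cleanly but all three together do not, shows up if the third graph instead says ":y owl:sameAs :z"; that variant is an assumption of this sketch and is not stated anywhere in the thread.

```python
from itertools import combinations

SAME, DIFF = "owl:sameAs", "owl:differentFrom"

# The three graphs as quoted in the thread, as (subject, predicate, object).
G1 = {(":x", SAME, ":y")}
G2 = {(":x", DIFF, ":z")}
G3 = {(":y", DIFF, ":z")}

# Hypothetical variant of G3 (an assumption, not from the thread).
G3_VARIANT = {(":y", SAME, ":z")}


def is_consistent(triples):
    """Return True if the sameAs/differentFrom triples can all hold at once.

    Illustrative helper: sameAs pairs are merged with union-find, then any
    differentFrom pair whose terms landed in the same class is a clash.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for s, p, o in triples:
        if p == SAME:
            parent[find(s)] = find(o)

    return all(find(s) != find(o) for s, p, o in triples if p == DIFF)


def pairwise_consistent(graphs):
    """Check that the merge of every pair of graphs is consistent."""
    return all(is_consistent(a | b) for a, b in combinations(graphs, 2))


if __name__ == "__main__":
    print("as quoted, merged:", is_consistent(G1 | G2 | G3))
    variant = [G1, G2, G3_VARIANT]
    print("variant, pairwise:", pairwise_consistent(variant))
    print("variant, merged:  ", is_consistent(G1 | G2 | G3_VARIANT))
```

With the variant, each author's graph is fine alone and in any pair, and the clash appears only in the three-way merge, which is the situation the thread is arguing about: the inconsistency says that some assumption has to give, but not whose.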
Received on Monday, 8 April 2013 17:25:05 UTC