- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Tue, 09 Apr 2013 16:31:42 +0100
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: "Bhat, Talapady N." <talapady.bhat@nist.gov>, Oliver Ruebenacker <curoli@gmail.com>, David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>, Peter Ansell <ansell.peter@gmail.com>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Compare all you like. RDF is just another technology; it's not going to
let me do anything that I cannot do in another way. I'm interested in
using it because it is there, not for any other reason. The surface
syntax problem; yeah, it is and remains a pain, more so in some areas
than others.

Phil

Alan Ruttenberg <alanruttenberg@gmail.com> writes:

> Thinking about "metadata" as some other category of data is usually a bad
> sign. I've often found it to mean, in practice, "data I care less about".
>
> Phil, to make the case that RDF helps here, we would want to compare how
> easy it is to do significant work using the ill-represented examples you
> find versus raw text, versus XML, versus tab-delimited files. While there
> is some limited benefit to getting rid of the surface syntax problem, it's
> not clear how much of a problem that ever was.
>
> -Alan
>
>
> On Mon, Apr 8, 2013 at 1:16 PM, Bhat, Talapady N. <talapady.bhat@nist.gov> wrote:
>
>> Hi,
>> -----
>>
>> Introduction -Dublin Core:
>> The Dublin Core Metadata Element Set is a vocabulary of fifteen properties
>> for use in resource description. The name "Dublin" is due to its origin at
>> a 1995 invitational workshop in Dublin, Ohio; "core" because its elements
>> are broad and generic, usable for describing a wide range of resources.
>>
>> The fifteen element "Dublin Core" described in this standard is part of a
>> larger set of metadata vocabularies
>> --------------------------------------
>> As per the introduction section (given above) of Dublin Core
>> (http://dublincore.org/documents/dces/) its focus is primarily metadata
>> whereas the actual author names mentioned below probably need to be
>> considered as 'data'. I do not think Dublin Core has really focused on
>> building standard re-usable vocabulary for 'data'. That is the real problem.
>> That is why we have been focusing on re-usable terms for 'data'
>>
>> http://www.biomedcentral.com/1471-2105/12/487 and
>> http://xpdb.nist.gov/chemblast/pdb.pl and
>> http://www.nature.com/nmeth/journal/v9/n7/abs/nmeth.2084.html
>>
>> T N Bhat
>>
>> -----Original Message-----
>> From: Phillip Lord [mailto:phillip.lord@newcastle.ac.uk]
>> Sent: Monday, April 08, 2013 12:53 PM
>> To: Oliver Ruebenacker
>> Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg;
>> public-semweb-lifesci
>> Subject: Re: owl:sameAs - Harmful to provenance?
>>
>>
>> And it is this bit -- "before we can do anything useful" that is utterly
>> wrong.
>>
>> Recently I have spent a lot of time looking at Dublin Core creator fields.
>> You could not believe how many different ways they are used. String
>> literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs ("P.
>> Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with titles ("Dr
>> Phillip Lord") and so on.
>>
>> So, is everyone using Dublin Core wrong? It is useless till everyone uses
>> it the same way? Emphatically no, it is not useless.
>>
>> Would it be better if everybody did use it the same way? The answer is
>> probably not. Names are incredibly complex, and representing them is, in
>> turn, difficult and hard. Any specification which did full justice to all
>> the different name forms in existence would be incredibly long-winded. Many
>> people using the specification would get it wrong; or you could have a
>> mechanism for ensuring people always used it correctly.
>> Then I am sure that both people who ended up using this form of spec would
>> have great fun integrating their tiny datasets.
>>
>> In the example, we have a number of sets of assertions which individually
>> fulfil their creators' use-cases. Then, when they are brought together, the
>> assertions become inconsistent, telling you up front that there is work to
>> be done. And you ask in what way is this useful?
>>
>> Perfection is the enemy of Good.
>>
>>
>>
>> Oliver Ruebenacker <curoli@gmail.com> writes:
>> > So what most people here are saying is that before we can do
>> > anything useful, we need to make sure that if two assertions use the
>> > same reference, they mean the same thing.
>> >
>> > To which you respond that you will accept assertions without
>> > assuming that same references mean same things. You will just keep them
>> > separate.
>> > There is no rule against that.
>> >
>> > But in what way is this useful?
>> >
>> > Take care
>> > Oliver
>> >
>> > On Mon, Apr 8, 2013 at 10:07 AM, David Booth <david@dbooth.org> wrote:
>> >
>> >> Hi Pat,
>> >>
>> >>
>> >> On 04/04/2013 02:03 AM, Pat Hayes wrote:
>> >>
>> >>>
>> >>> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
>> >>>
>> >>> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
>> >>>> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
>> >>>> On Tuesday, April 2, 2013, David Booth wrote:
>> >>>> On 03/27/2013 10:56 PM, Pat Hayes wrote:
>> >>>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
>> >>>>
>> >>>> If only owl:sameAs were used correctly...
>> >>>>
>> >>>> Well, I agree that is a problem, but don't draw the conclusion that
>> >>>> there is something wrong with sameAs, just because people keep
>> >>>> using it wrong.
>> >>>>
>> >>>> Agreed. And furthermore, don't draw the conclusion that someone
>> >>>> has used owl:sameAs wrong just because you get garbage when you
>> >>>> merge two graphs that individually worked just fine. Those two
>> >>>> graphs may have been written assuming different sets of
>> >>>> interpretations.
>> >>>>
>> >>>> In that case I would certainly conclude that they have used it
>> >>>> wrong. Have you not been reading what Pat and I have been writing?
>> >>>>
>> >>>> I've read lots of what you and Pat have written. And I've learned
>> >>>> a lot from it -- particularly in learning about ambiguity from Pat.
>> >>>> And I'm in full agreement that owl:sameAs is *often* misused.
>> >>>>
>> >>>> But I don't believe that getting garbage when merging two graphs
>> >>>> that individually worked fine *necessarily* indicates that
>> >>>> owl:sameAs was misused -- even when it appears on the surface to be
>> >>>> causing the problem.
>> >>>>
>> >>>
>> >>> I agree, but not with your example and your analysis of it.
>> >>>
>> >>>> Here's a simple example to illustrate.
>> >>>>
>> >>>> Using the following prefixes throughout, for brevity:
>> >>>>
>> >>>> @prefix : <http://example/owen/> .
>> >>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>> >>>>
>> >>>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen
>> >>>> defines them as follows:
>> >>>>
>> >>>> # Owen's URI definition for :x, :y and :z
>> >>>> :x a :Something .
>> >>>> :y a :Something .
>> >>>> :z a :Something .
>> >>>>
>> >>>> That's all. That's Owen's entire definition of those URIs.
>> >>>> Obviously this definition is "ambiguous" in some sense. But as we
>> >>>> know, ambiguity is ultimately inescapable anyway, so I have merely
>> >>>> chosen an example that makes the ambiguity obvious. As the RDF
>> >>>> Semantics spec puts it: "It is usually impossible to assert enough
>> >>>> in any language to completely constrain the interpretations to a
>> >>>> single possible world".
>> >>>>
>> >>>
>> >>> Yes, but by making the ambiguity this "obvious", you have rendered
>> >>> the example pointless. There is *no* content here *at all*, so Owen
>> >>> has not really published anything. This is not typical of published
>> >>> content, even in RDF. Typically, in fact, there is, as well as some
>> >>> nontrivial actual RDF content, some kind of explanation, perhaps in
>> >>> natural language, of what the *intended* content of the formal RDF
>> >>> is supposed to be.
>> >>> While an RDF engine cannot of course make use of
>> >>> such intuitive explanations, other authors of RDF can, and should,
>> >>> make use of it to try to ensure that they do not make assertions
>> >>> which would be counter to the referential intentions of the original
>> >>> authors. For example, the Dublin Core URIs were published with
>> >>> almost no formal RDF axioms, but quite elaborate natural language
>> >>> glosses which enable them to be used in formal RDF with considerable
>> >>> success.
>> >>> The fact that formal (and even informal) data is inherently
>> >>> ambiguous does not mean that it is inherently, or even typically,
>> >>> vacuous.
>> >>>
>> >>
>> >> This seems to suggest that natural language can somehow eliminate
>> >> ambiguity, where formal languages cannot. I don't buy that.
>> >> Presumably whatever definition one expressed in natural language
>> >> could be expressed in a formal language -- in principle at least.
>> >> And certainly the goal of the semantic web is to have such
>> >> information expressed in a formal language that is amenable to machine
>> >> processing.
>> >>
>> >> More precisely, the basic assumption I am making is that for (almost)
>> >> any definition there exists a property such that neither that
>> >> property nor its negation are entailed by the definition. I.e.,
>> >> there is always more than can be said about the thing whose identity
>> >> is defined. Maybe that assumption is wrong; I don't know. If you
>> >> think it's wrong, I'd be interested in hearing why.
>> >>
>> >> The example may not be "realistic", but it is *not* pointless. The
>> >> whole point of choosing such a simple example is to expose the
>> >> fundamental issues outright, rather than obscuring them in complexity
>> >> that we cannot fully understand. If there is some fundamental reason
>> >> why you think this problem cannot happen in a more "realistic"
>> >> example, then please explain what mechanism would come into play to
>> >> prevent it.
>> >>
>> >>
>> >>
>> >>>> Arthur, an RDF author, publishes the following graph, G1, making
>> >>>> certain assumptions about the interpretations that will be applied
>> >>>> to it:
>> >>>>
>> >>>> # G1
>> >>>> :x owl:sameAs :y .
>> >>>>
>> >>>
>> >>> On what basis does Arthur make this assertion? The URIs were coined
>> >>> by Owen, and Owen says nothing that would sanction this assumption.
>> >>>
>> >>
>> >> Why Arthur or anyone else chooses to assert whatever they choose to
>> >> assert is their business. It is irrelevant to this analysis.
>> >>
>> >>
>> >>
>> >>>> Aster, another RDF author, publishes the following graph, G2,
>> >>>> making certain other assumptions about the interpretations that
>> >>>> will be applied to it:
>> >>>>
>> >>>> # G2
>> >>>> :x owl:differentFrom :z .
>> >>>>
>> >>>> Alfred, a third RDF author, publishes the following graph, G3,
>> >>>> making still other assumptions about the interpretations that will
>> >>>> be applied to it:
>> >>>>
>> >>>> # G3
>> >>>> :y owl:differentFrom :z .
>> >>>>
>> >>>
>> >>> Similarly for the other two. They are making assertions using names
>> >>> that belong to, and were coined by, another author without having
>> >>> any possible source of justification for these nontrivial claims.
>> >>> This should not be regarded as good practice, to put it mildly.
>> >>>
>> >>
>> >> Ditto. If you are claiming that an RDF author needs some sort of
>> >> "justification" to make assertions, then please explain exactly what
>> >> you mean -- preferably in formal terms -- by "justification". E.g.,
>> >> does "justification" mean that Arthur may only make assertions that
>> >> are entailed by Owen's definition? I already discussed that
>> >> possibility below.
>> >>
>> >>
>> >>
>> >>>> Note that G1, G2 and G3 are all individually consistent with Owen's
>> >>>> URI definition. Furthermore, G1, G2 and G3 are all pair-wise
>> >>>> consistent: there exists at least one satisfying interpretation for
>> >>>> the merge of each pair.
>> >>>> But the merge of G1, G2 and G3 is not
>> >>>> consistent:
>> >>>>
>> >>>
>> >>> This kind of behavior is of course quite typical in any assertional
>> >>> language.
>> >>>
>> >>
>> >> Yes.
>> >>
>> >>
>> >>
>> >>>> Arthur, Aster and Alfred made different assumptions about the set
>> >>>> of interpretations that would be applied to their graphs, and the
>> >>>> intersection of those sets was empty.
>> >>>>
>> >>>> Did Arthur misuse owl:sameAs? What if Aster never published G2?
>> >>>> How could Aster's graph possibly affect the question of whether
>> >>>> *Arthur* misused owl:sameAs? It would be nonsensical to assume
>> >>>> that it could.
>> >>>>
>> >>>
>> >>> Why? Surely if Aster had a more reliable access to the primary
>> >>> source of information about these enigmatic thingies than Arthur
>> >>> did, then it might well be the case that Aster's publication could
>> >>> reveal errors in Arthur's, by contradicting him.
>> >>>
>> >>
>> >> What do you mean by "more reliable"? Both Arthur and Aster had
>> >> access to the exact same URI definition from Owen. Are you
>> >> suggesting that Arthur and/or Aster should have used a *different*
>> >> URI definition? If so, what definition and why?
>> >>
>> >>
>> >>>> What if Owen later said that Arthur was correct, that :x == :y ?
>> >>>> What if he later said the opposite? Again, it would seem rather
>> >>>> bizarre to say that the determination of whether Arthur had misused
>> >>>> owl:sameAs could be changed -- long after Arthur had written G1 --
>> >>>> by Owen's later statements.
>> >>>>
>> >>>
>> >>> Again, I don't find this bizarre in the least. It might be, if there
>> >>> was no truth of the matter concerning all this stuff, so that all
>> >>> these assertions were made independently with equal (or equal lack
>> >>> of) authority as to their actual truth. But that is so implausible
>> >>> and artificial an assumption that I don't see why we need to even
>> >>> discuss it.
>> >>>
>> >>
>> >> The RDF Semantics is explicitly agnostic about interpretations and
>> >> "actual truth".
>> >>
>> >> Owen published a URI definition, and Arthur, Aster and Alfred
>> >> published a bunch of assertions. Whether anyone "believes" any of
>> >> those assertions, whether those assertions have any bearing on the
>> >> "real world", and whether they are at all useful to anyone's
>> >> applications, are entirely different questions. AFAICT those
>> >> questions are irrelevant to the technical question of whether Arthur
>> >> "misused" owl:sameAs.
>> >>
>> >>
>> >>
>> >>>> One might claim that Arthur misused owl:sameAs because Owen had not
>> >>>> specified whether :x was the same or different from :y or :z, and
>> >>>> therefore Arthur had improperly *guessed* about the value of :x's
>> >>>> owl:sameAs property.
>> >>>>
>> >>>> But by that logic, Arthur would not be able to assert *anything*
>> >>>> new about :x. I.e., Arthur would not be allowed to assert any
>> >>>> property whose value was not already entailed by Owen's definition!
>> >>>>
>> >>>
>> >>> Arthur may add information, of course. But Arthur is responsible for
>> >>> the truth of what he asserts, and part of that responsibility, in
>> >>> practice, is to take care to ascertain what the intended referents
>> >>> are of any URIs published by others, that Arthur then uses in his
>> >>> assertions.
>> >>>
>> >>
>> >> But Arthur, Aster and Alfred were each fully diligent in ensuring
>> >> that their assertions were consistent with all information that Owen
>> >> provided.
>> >> What more could they do?
>> >>
>> >>
>> >>> For example, if I (as I recently did) wish to assert that
>> >>> something was red in color, I might use the URI
>> >>>
>> >>> http://linkedopencolors.moreways.net/color/rgb/ff0000.html
>> >>>
>> >>> rather than, say,
>> >>>
>> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
>> >>>
>> >>> because I know, using my color vision (not available to RDF engines)
>> >>> that the first one refers to red and the second one to green, which
>> >>> (I also know) is not red. I *could* use the second URI and insist
>> >>> that I intended it to denote the color red, but that would be
>> >>> stupid, since readers of my RDF will (and indeed should)
>> >>> misunderstand me. If I were to assert that
>> >>>
>> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
>> >>> owl:sameAs
>> >>> http://linkedopencolors.moreways.net/color/css/red.html
>> >>> .
>> >>>
>> >>> then I would be saying something false. And yes, in that case, it
>> >>> *is* my error, even if what I have said is formally consistent
>> >>> (which it in fact is) with the published RDF "definition" of these
>> >>> URIs (which is in fact empty.)
>> >>>
>> >>
>> >> In that example there were additional constraints that were not
>> >> expressed formally -- such as the fact that red and green are
>> >> different colors, and what wavelengths correspond to which colors,
>> >> etc. But unless you are claiming that assertions expressed in
>> >> natural language can somehow avoid ambiguity where formal assertions
>> >> cannot, then for the sake of analysis we can assume that all assertions
>> >> have been expressed formally.
>> >>
>> >> I am also assuming that in the vast majority of cases, a URI's
>> >> resource identity will be defined by a description, rather than by
>> >> ostension
>> >> http://plato.stanford.edu/entries/identity/
>> >> so I am focusing on that case.
>> >>
>> >>
>> >>
>> >>>> And that would render RDF rather pointless.
>> >>>>
>> >>>
>> >>> Why would it render it pointless? The point of RDF is not to make
>> >>> completely unjustified statements about nothing in particular.
>> >>>
>> >>
>> >> RDF is designed to allow anyone to say anything about anything. If
>> >> someone chooses to make completely unjustified statements about
>> >> nothing in particular, that is their business. AFAICT that is
>> >> completely irrelevant to the technical question of whether owl:sameAs
>> >> was used incorrectly.
>> >>
>> >>
>> >>
>> >>>> Maybe someone can see a way to avoid this dilemma. Maybe someone
>> >>>> can figure out a way to distinguish between the "essential"
>> >>>> properties that serve to identify a resource, and other
>> >>>> "inessential" properties that the resource might have. If so, and
>> >>>> the number of "essential" properties is finite, then indeed this
>> >>>> problem could be avoided by requiring every URI owner to define all
>> >>>> of the "essential" properties of the URI's denoted resource, or by
>> >>>> prohibiting anyone but the URI owner from asserting any new
>> >>>> "essential" properties of the resource (beyond those the URI owner
>> >>>> had defined). Or maybe there is another way around this dilemma.
>> >>>>
>> >>>
>> >>> What do you see the "dilemma" here as being, exactly? It seems to me
>> >>> that this is not about RDF as such at all. It is about data, however
>> >>> that data is recorded. People can publish data about things. They do
>> >>> so by making assertions. In an ideal world, everyone is responsible
>> >>> for the assertions they make.
>> >>> Other people can put together
>> >>> information from various sources, but the reliability of the result
>> >>> is hostage to the reliability of all the sources that are used. All
>> >>> this is kind of obvious, but what else is being said in this thread?
>> >>>
>> >>
>> >> The dilemma is that we would like each URI to always denote the same
>> >> thing in all RDF datasets, so that when we merge RDF datasets, the
>> >> merge will make sense: the merge will be consistent and an
>> >> application that worked properly on an individual RDF dataset will
>> >> also work properly on the merge of that dataset with other datasets.
>> >> But because URI definitions are inherently ambiguous, different RDF
>> >> authors will interpret them differently, and this leads to
>> >> inconsistencies when datasets are merged -- even when all parties
>> >> have acted in good faith and have done all that they could reasonably
>> >> have been expected to do to avoid such conflicts.
>> >>
>> >> Key assumptions:
>> >>
>> >> 1. Owen's URI definition will always be ambiguous. There will
>> >> always exist a property p such that neither p nor its negation are
>> >> entailed by the URI definition.
>> >>
>> >> 2. Owen cannot be expected to forever refine his URI definition by
>> >> adding disambiguation at the request of every RDF author who uses his
>> >> URIs. At some point, Owen will reach the point of saying "that's all
>> >> the disambiguation you get". (This is the point at which the example
>> >> that I gave begins.)
>> >>
>> >>
>> >>
>> >>>
>> >>>> Unless some way around this dilemma is found, it seems unreasonably
>> >>>> judgemental to accuse Arthur of misusing owl:sameAs in this case,
>> >>>>
>> >>>
>> >>> Possibly, yes, but not because...
>> >>>
>> >>>> since he didn't assert anything that was inconsistent with Owen's
>> >>>> URI definition
>> >>>>
>> >>>
>> >>> Consistency is not the point.
>> >>> If I make completely unfounded
>> >>> assertions about a topic that you have introduced, then the fact
>> >>> they might be logically consistent with what you have said is
>> >>> neither here nor there. What matters is whether I have the authority
>> >>> to make the assertions I do, or whether I am lying, fabricating or
>> >>> simply fantasizing using Owen's vocabulary.
>> >>>
>> >>
>> >> Can you translate that into more objective technical terms? What
>> >> exactly does "unfounded" mean? And what do you mean by "authority"?
>> >> What objective technical criteria are you suggesting? And why is it
>> >> relevant to the question of whether Arthur misused owl:sameAs, given
>> >> that the RDF Semantics is explicitly agnostic about interpretations?
>> >>
>> >> David Booth
>> >>
>> >>
>>
>> --
>> Phillip Lord,                           Phone: +44 (0) 191 222 7827
>> Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
>> School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
>> Room 914 Claremont Tower,               skype: russet_apples
>> Newcastle University,                   twitter: phillord
>> NE1 7RU
>>
>>

--
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,               skype: russet_apples
Newcastle University,                   twitter: phillord
NE1 7RU
Received on Tuesday, 9 April 2013 15:32:13 UTC
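
The G1/G2/G3 discussion quoted in this message turns on whether sets of owl:sameAs and owl:differentFrom assertions stay satisfiable when graphs are merged. A minimal sketch of such a check follows, assuming only those two predicates matter and treating owl:sameAs as an equivalence relation. The function names (is_consistent, report) and the graph G3_VARIANT are illustrative assumptions and do not come from the thread. Run on the three graphs exactly as they are quoted here, the check finds the full merge satisfiable (:x and :y co-refer, and both differ from :z). The pattern the thread describes, pairwise consistent but jointly inconsistent, shows up with the hypothetical variant in which the third graph asserts :y owl:sameAs :z.

```python
from itertools import combinations

SAME, DIFF = "owl:sameAs", "owl:differentFrom"


def is_consistent(triples):
    """Return True if the sameAs/differentFrom triples are jointly satisfiable.

    Only these two predicates are examined; any other triple is ignored.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge every pair of names asserted to co-refer.
    for s, p, o in triples:
        if p == SAME:
            parent[find(s)] = find(o)

    # A differentFrom between two names in the same class is a contradiction.
    return all(find(s) != find(o) for s, p, o in triples if p == DIFF)


def report(graphs):
    """Return (all pairs consistent, full merge consistent)."""
    pairwise = all(is_consistent(a + b) for a, b in combinations(graphs, 2))
    merged = is_consistent([t for g in graphs for t in g])
    return pairwise, merged


# The three graphs exactly as quoted in the thread.
G1 = [(":x", SAME, ":y")]
G2 = [(":x", DIFF, ":z")]
G3 = [(":y", DIFF, ":z")]

# Hypothetical variant (an assumption, not from the thread).
G3_VARIANT = [(":y", SAME, ":z")]

print(report([G1, G2, G3]))          # (True, True)
print(report([G1, G2, G3_VARIANT]))  # (True, False)
```

Because the check looks only at equality and disequality between names, it says nothing about who was entitled to assert what; it reports only whether a satisfying interpretation exists for a given merge.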