- From: <tim.glover@bt.com>
- Date: Mon, 3 Nov 2008 11:27:10 -0000
- To: <graybeal@mbari.org>, <uschold@gmail.com>
- Cc: <semantic-web@w3.org>, <aldo.gangemi@gmail.com>, <cshankey@reinvent.com>, <pmika@yahoo-inc.com>, <ora.lassila@nokia.com>, <jeff.z.pan@abdn.ac.uk>, <timbl@csail.mit.edu>, <Frank.van.Harmelen@cs.vu.nl>, <sean.bechhofer@manchester.ac.uk>
- Message-ID: <AEF15555D64C494CA393778177A3A171054E5EB3@E03MVC1-UKBR.domain1.systemhost.net>
I agree with Michael Lang, who says that the community (or architect) should decide how words are used in an ontology, and should agree on changes. Common sense suggests to me that the reason for changing the semantics of a word is to correct an error, in which case the same word should be used, and existing systems will be freed from the error. I cannot think of a good reason for changing the meaning of a word in the context of an ontology otherwise.

But I think it's important to recognise that in most real systems there are different levels of semantics.

- Firstly, there are some "keywords" in OWL, whose semantics is defined by W3C and implemented by reasoning engine builders.
- Secondly, there will be some words that are not defined as part of "OWL" but which are recognised as "keywords" by particular software systems ("if x is a member of VIRAL_INFECTIONS do y"). In these cases the software and the ontology are strongly bound together.
- Thirdly, there will be aspects of the ontology which are "data driven", in that they are handled in a general way by the software ("find the broader terms of a term").
- Fourthly, there may also be a distinction between "A box" and "T box" words.

Moreover, a word may be significant to the software in one system, but handled in a "data driven" way by a different system. It seems to me it is not at all clear cut to decide the best way to modify these ontologies.

________________________________
From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On Behalf Of John Graybeal
Sent: 03 November 2008 04:25
To: Michael F Uschold
Cc: semantic-web@w3.org; aldo.gangemi@gmail.com; Conor Shankey; Peter Mika; Ora Lassila; Pan, Dr Jeff Z.; Tim Berners-Lee; Frank van Harmelen; sean.bechhofer@manchester.ac.uk
Subject: Re: URIs and Unique IDs

Michael,

I will try to be clearer -- your confusion was my fault, sorry. Appreciate very much your comments (and charitable interpretations, n.b. 'strong intuition' :->!).
On Nov 1, 2008, at 9:33 AM, Michael F Uschold wrote:

Agreed. I assume you mean the vocabulary is the ontology? Are we assuming OWL ontologies here? If not, then what do you mean by a vocabulary?

Yes, I used 'vocabulary' to reflect what our customers have, but an OWL ontology is what we will generate.

B. A vocabulary contains all the terms within it, not just the terms that changed in that version

Here is my folksy perspective behind the model (more justifications near the end): If I say to a user "Here is a vocabulary dated X", the user will assume that all the terms come with that vocabulary, and the terms of that vocabulary are also dated X. So can I build a working semantic approach that accepts this assumption?

So in the SKOS example, the new SKOS vocabulary/ontology would contain the terms that do not change URIs as well as terms with new versions with new URIs.

No, sorry, I was sloppy and used 'terms' and 'URIs' interchangeably. Here is the easy part: You can assume that if anything in the specification of a term changes while its string of characters remains the same, I will insist on a new URI for that term (the version string will do nicely to discriminate). And if the string of characters for a term changes, that will be a new URI too.

I am using 'term' to mean 'a string of characters that likely, but not necessarily, means something to a human'. So codes and opaque terms are OK. For most ontologies we'll create, terms will be words and word phrases.

Anticipating your later comments, we concluded (you won't like this at all):

1. a URI is a suitable UID
2. a term can be part of a suitable URI

The first is argued elsewhere by others. Re the second: Since what I really wanted to do was give people a way to say "here's what this string of characters means", it doesn't bother me that the same string of characters may mean something else later -- I need the UID not for the _concept_, but for the unique string of characters.
That will always be the string of characters I want that UID to refer to. So making the UID a URL that embeds the string of characters was acceptable. I think I understand the concerns about non-opaque and non-persistent URLs, and believe that those costs are relatively low compared to the resulting early adopter benefits of this approach. [1]

This issue arises because of the conflation of URIs, UIDs and human-readable IDs. Until these are de-conflated, probably this principle is the right one. It will be unnecessary after de-conflation.

I am unconvinced de-conflation can happen, at least in our lifetimes, which is why I made some of those horrible linkages above.

D. It must be possible to 'look up' the current meaning of a term, as well as specifically request any past meanings by their URI

I read this as saying that a term like 'broader' in SKOS could have multiple URIs for multiple versions. If this is what you mean, then I absolutely agree with this.

Yes, this is what I mean, but keep in mind my previous conflation.

Agreed. You seem to be proposing the idea of some kind of object (perhaps with a URI) that corresponds to the core term, and that its various meanings (related versions) are linked to the core term. This may be a workable idea. Can this be done with the current semantic web infrastructure?

Oh, I sure hope so. (Well, new relationships may be needed. Not an expert here.) We are doing it a shade outside of the 'strict infrastructure', if there is such a thing -- our server will try to be smart about the relationships between vocabulary versions (well, it has to be, to make sure the version relationships are maintained).

bb. Any significant definitional or semantic change to a term should really create a new term, not just evolve the word we were already using (what was SKOS thinking?)

This is an interesting question with more than one reasonable position. I think there are at least two cases:

1. there was a bona fide conceptual error, and everyone agrees that the old meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also wish to use the older versions.

For 1, you do NOT want to change the name of the term; it was and is the right term. But you DO want to change its UID because it is a different thing. For 2, you probably want to introduce a new term with a new name and a new UID. You could have the name of the transitive version of broader be called broaderT and the non-transitive one be called broader.

Yes to all the above, well put.

You should be able to change the name w/o changing the UID.

Well, OK, maybe. Not for my own vocabularies, because those are trying to define strings, not concepts. As you can see, I am hung up on which thing someone has in mind when they say the name -- is it the concept behind the name, or the name itself? I find it a lot easier to consider the name the resource of interest, and if someday my 'inflammable' is redefined to mean flammable, then my ontology will be exactly as wrong as all the books that used the 'old' definition of the word. (At least until I redefine the word. Sure hope everyone is using timestamps. :->)

cc. Created relationships to 'most current' URIs persist even as the semantics of that resource may change; this potentially introduces a time quality to inferences done with these resources (e.g., "Today's New York Times has an article on election polls" may be a true statement today, but false next week.)

Those who choose to use the 'most current' term will get what they pay for. You might be able to have programmatic or infrastructural capability which can return the 'most current' version of a given core term. There might be a URI/UID for the core term, and that is what would be accessed. There, a directive would be given that says please return the most recent version of that item.
This is a promising idea that could probably keep everyone happy.

ee. Both the provided service, and ontology engines in general, must be able to relate terms to their semantically identical historical counterparts

When every version of every term has its own UID, then this becomes feasible, though it may also be an expensive overhead.

ff. The service should be able to quickly identify/present to its users each change in semantic meaning for a term.

Yes, and an application should also be able to subscribe to the core UID for a concept to be notified of any changes, so it can keep up to date automatically in the case where the most up-to-date version is wanted; otherwise people can look into new versions on a case by case basis.

Yes to all the above, and to the 'timestamps may be expensive' also. I am worried about expense, but suspect I won't be able to tell for a while how resource-intensive this will be, and whether optimization will take care of it, and whether I still will be paid to "keep this problem solved." But I plan as if I will...

There may be some clear cut cases where you can tell which things are static vs. dynamic. However IMHO, it is likely that a lot (perhaps most) cases will be dependent on the needs of the application, and the same concept may be dynamic in some applications and static in others.

Maybe. If I declare the static concept is forever unvarying by definition, I don't think it would be strategic for an application to assume otherwise.

I am less sanguine about this for predicates -- it seems like you're allowing replacing the engine while the car is running.

I don't follow this analogy.

I can imagine a future scenario where this is advantageous for predicates, but it seems really inappropriate at this stage.

You have a strong intuition that I'm not able to grasp. Can you articulate why with an example?

OK, my example uses 'sea surface temperature' as a subject, and 'sameAs' as a predicate.
If, over time, the concept associated with 'sea surface temperature' evolves from "any measurement of any body of sea water within a meter or so of the ocean's surface" to "an informal reference to the concept of temperature near the ocean surface (deprecated as a reference to a particular measurement)", the tools I have written may produce some less-than-ideal inferences if they assume the new definition applies to old data, or vice-versa. Even if the new definition in 100 years is "measurement of the temperature of the foam we keep on top of the ocean to keep it cool", some inferences could be faulty, but the engine won't break down.

But if I've originally used 'sameAs' in mappings to mean that two concepts are analogous in certain defined ways (maybe a faulty original practice, but go with it), and then the term is redefined by general consensus to mean "refers to the exact same resource", I have some really broken results, because a key piece right in the middle of my infrastructure has changed.

If you try to change important parts in a car while it's moving, bad things can happen, even if the new part is every bit as good as the old part. If we try to change the meaning of core terms used in semantic inferencing, then all the tools and things are likely to behave oddly during the change, if not afterwards as well.

As to the multiple URIs for a single concept problem that was introduced in (aa) above, I have both a justification and a backup plan. The justification is that the meaning of terms and their definitions is inferred in a context, and changes to the context (the rest of the vocabulary) can affect the implicit meaning, or usage, of a term that nominally wasn't changed.

This is true, and the reason why terms/words in WordNet belong to multiple synsets. Each synset has a unique meaning, and in the OWL dataset, each synset has its own URI. So I don't find your argument convincing.
Multiple contexts show different uses of a term, so each use should get a different UID, not the same one.

This is a different context. Example below.

So even if I haven't changed the explicit definition of a term in a new vocabulary release, it is meaningful to consider this term a new resource, and give it a new URI, to reflect its new context.

Maybe the WordNet example is a red herring. In any event, can you provide a clear example of how an application would find it helpful to have whole new sets of URIs minted for identical things?

Here's a simple example, before giving you a detailed domain-specific example: Let's say I review a vocabulary and change 80% of the definitions. But the remaining definitions are deemed good and remain unchanged. By virtue of being part of a heavily reviewed vocabulary, these remaining original terms have gained credibility -- they are more reviewed and more trusted than they were before that version was created.

For a domain example, let's go back to sea surface temperature. 5 years ago, it meant something like "any measurement of any body of sea water within a meter or so of the ocean's surface". More recently, data managers realized that wasn't specific enough. So 5 new terms were created to precisely delineate the different kinds of sea surface temperature.

Now, if I get a set of data that uses some of these new terms to label variables, and also has an item labelled 'sea surface temperature', I can infer that the use of the broader variable meant that no more specific description could be provided. Whereas in data from 5 years ago, I might replace the general term in many cases, by looking at other metadata to learn the more specific term. With the existence of the new terms, the old term has new connotations.

Here is one example where it is clearly a bad thing. The application is ontology-driven at a deep level. It makes use of the resources in the coding/creation of application functionality.
It also loads and makes use of data using the ontology.

T1: The application loads the ontology using the original terms.
T2: The application loads data expressed using the original terms.
T3: All new URIs are minted, when only a few have changed semantics, and there is no indication of which ones have new semantics and which have the same semantics.

Well, this is bad but not unmanageable. Presumably a query of the 'before' and 'after' resources for those two concepts would reveal whether or not there are differences. Or, presumably you can query the ontologies to get that info, even if you can't query the terms themselves. (Hmm, in today's semantic web a lot of times you don't have the original ontology versions either, do you? But that would be another thing that breaks the system to some degree: you don't have any ability to validate previous inferences or see what it was like when the relationships were created, so you can't validate them independently. Sigh....)

But in any case, I accept the challenge here and say again "it only works if the new URIs can say whether they are the same semantics as a previous version." Otherwise, I agree it's a bad thing.

T4: A new dataset is created which uses the new URIs.
T5: The application loads the new data.
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data from the old URIs when it should return data from the new dataset as well.

This is clearly a bad thing. Your proposal has to argue advantages that offset the disadvantage here, in order for me to buy into it.

One mitigation of disadvantages is obtained if most of the users map to the 'most recent version' (core concept) of the term, not specific versions. I suspect this will be likely.
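The T1-T7 scenario above, and the condition that "the new URIs can say whether they are the same semantics as a previous version", can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `example.org` URIs, the record layout, and the `same_semantics_as` mapping are all hypothetical, and no particular triple store or reasoner is implied.

```python
# Hypothetical versioned URIs for one term; host and path layout are assumptions.
OLD_SST = "http://example.org/vocab/v1/sea_surface_temperature"
NEW_SST = "http://example.org/vocab/v2/sea_surface_temperature"

# T2 and T4/T5: data labelled with the old URI, then data labelled with the new one.
records = [
    {"id": "old-1", "variable": OLD_SST},
    {"id": "new-1", "variable": NEW_SST},
]

# Hypothetical link published with the new version: a new URI points back to the
# earlier URI whose explicit definition is unchanged.
same_semantics_as = {NEW_SST: OLD_SST}


def naive_query(records, uri):
    """T6/T7: filter on the exact URI only, so data under other versions is missed."""
    return [r["id"] for r in records if r["variable"] == uri]


def equivalent_uris(uri, links):
    """Collect every URI connected to `uri` by same-semantics links, in either direction."""
    group = {uri}
    changed = True
    while changed:
        changed = False
        for newer, older in links.items():
            # Exactly one end of the link is already in the group: pull in the other end.
            if (newer in group) != (older in group):
                group.update((newer, older))
                changed = True
    return group


def version_aware_query(records, uri, links):
    """Filter on the URI plus all versions declared to have the same semantics."""
    wanted = equivalent_uris(uri, links)
    return [r["id"] for r in records if r["variable"] in wanted]
```

With these definitions, `naive_query(records, OLD_SST)` finds only the record labelled with the old URI, while `version_aware_query(records, OLD_SST, same_semantics_as)` also finds the record labelled with the new one. If no such link is published, the second query degrades to the first, which is the "bad thing" case.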
Of course, it is also very important to say this new resource has the same definition and semantics as another, previous resource, preferably pointing back to the original instance with that definition/semantics.

This creates an unnecessary burden and seems to contradict your point that something in a different context will have different semantics. If it has different semantics, then why point back to something with identical semantics?

An excellent point. (I'm busted!) Apparently I differentiate between explicit meanings, which one finds in the term's resource description, and implicit meanings, which one finds in larger context. The version relationship primitives have to be understood to refer to the explicit meanings only. When the definition changes explicitly, that's a URI change that no longer can be considered exactly the same concept.

I still can't see any advantages for creating multiple copies of exactly the same thing. Have I missed something?

The practical advantage is the one introduced at the top -- I can consider and implement the vocabulary as a unit, carrying all of its components along with it. Conceptually/abstractly I suspect this may be the right way to think of a vocabulary. But more practically, this gives me a trivial way to generate URIs for those terms, a trivial way to capture the contents of each new version of the ontology (otherwise I have to analyze every term to decide if it is different, right?), a trivial way to explain to the user what the URI for each term will look like, and a way to tell from the term URI which vocabulary it's a part of (not that I'd ever do that to an opaque URI...). But of course, I realize I have to go do some of these things later, in any reasonable version of the system...I just don't have to do them *instantly*....

I imagine we will have to create a relationship for our own use that has this meaning for now.

We probably will need some new infrastructural primitives, to relate versions to each other.
Just so.

This is a practical solution which would probably be pretty easy when URIs are de-conflated with UIDs. Though proliferation of URIs for the same thing should be reduced whenever possible. See another thread I started on a similar topic by googling ["proliferation of URIs" uschold]

Excellent, I looked at the summary post and I see things with your level of concern, perhaps more than the responders. (Though I liked Tim's quote: "So multiple URIs for the same thing is life, a constant tradeoff, but life is, on balance good.") I would be a relatively small scale offender for a while, but a bad example.

I will leave it there, too long a post for sure.

John

[1] Our URI creation scheme is described at http://marinemetadata.org/apguides/ontprovidersguide/ontguideconstructinguris , with other details in that web neighborhood.
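For readers who want something concrete, here is a minimal Python sketch of the two ideas discussed in this message: a term URI that embeds the vocabulary, a version string, and the term itself, and a lookup that returns the 'most current' version of a term. It is not the scheme documented at [1]; the `example.org` base, the path layout, the `demo` vocabulary name, and the `Registry` class are assumptions made up for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/ont"  # hypothetical host, not an actual ontology server


def term_uri(vocabulary, version, term):
    """Build a URI embedding the vocabulary name, a version string, and the term."""
    return f"{BASE}/{quote(vocabulary)}/{quote(version)}/{quote(term)}"


class Registry:
    """Toy in-memory record of which versions of a vocabulary contain which terms."""

    def __init__(self):
        # (vocabulary, term) -> list of version strings, in publication order
        self._versions = {}

    def publish(self, vocabulary, version, terms):
        """Register a whole vocabulary version; every term gets a URI for that version."""
        for term in terms:
            self._versions.setdefault((vocabulary, term), []).append(version)

    def all_versions(self, vocabulary, term):
        """Return the URIs of every published version of a term, oldest first."""
        return [term_uri(vocabulary, v, term)
                for v in self._versions[(vocabulary, term)]]

    def resolve_current(self, vocabulary, term):
        """Return the URI of the most recently published version of a term."""
        return self.all_versions(vocabulary, term)[-1]


registry = Registry()
registry.publish("demo", "20080101", ["sea_surface_temperature"])
registry.publish("demo", "20081103", ["sea_surface_temperature", "foam_temperature"])
print(registry.resolve_current("demo", "sea_surface_temperature"))
```

Because each release is published as a unit, every term in it gets a new URI whether or not its definition changed, which is the trade-off debated above. The sketch does not record whether two versions share the same explicit definition; that would need the separate version relationship discussed earlier.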
Received on Monday, 3 November 2008 11:28:10 UTC