Re: URIs and Unique IDs from John Graybeal on 2008-10-30 (semantic-web@w3.org from October 2008)

From: John Graybeal <graybeal@mbari.org>
Date: Thu, 30 Oct 2008 12:08:59 -0700
To: Michael F Uschold <uschold@gmail.com>
Cc: semantic-web@w3.org, aldo.gangemi@gmail.com, "Conor Shankey" <cshankey@reinvent.com>, "Peter Mika" <pmika@yahoo-inc.com>, "Ora Lassila" <ora.lassila@nokia.com>, "Pan, Dr Jeff Z." <jeff.z.pan@abdn.ac.uk>, "Tim Berners-Lee" <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>, sean.bechhofer@manchester.ac.uk
Message-Id: <BA831A8E-B99F-4DC2-ACE7-7CE3BEFB6A06@mbari.org>
We are trying to release a community semantic service (later this  
month!) that "does the right thing" in this arena. So I strongly agree  
with the tenor of this message. Except I am trying to imagine what  
implementation should happen in the _present_ for our service to be an  
exemplar.  I am sorry for the long post, but if it is mostly valid,  
hopefully it can advance the discussion.

We have provisionally settled on the following principles for this  
service (which is intended to store domain vocabularies and terms,  
keep track of their versions, and let people make relations between  
them).  I realize the focus of the original post was on URIs of the  
relations, but I think semantics of any terms are also important to  
consider, and probably apply to the relations.

Principles
   A. *Any* change to a vocabulary, including to any of its terms (and
their semantics) or to its metadata, means the vocabulary must get a new
version = new URI
   B. A vocabulary contains all the terms within it, not just the  
terms that changed in that version
   C. The nominally opaque URIs must be fairly self-consistent in  
their presentation, or people in the non-semantic community will  
misunderstand them (or rebel against using them)
   D. It must be possible to 'look up' the current meaning of a term,  
as well as specifically request any past meanings by their URI
   E. It must be possible to choose (i.e., map to, or identify) either  
a specific (versioned) meaning, or a 'most current' meaning, for a  
given concept
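
To make principles A, B, D, and E concrete, here is a minimal Python
sketch. It is only an illustration under assumptions: the base URI, the
"/<version>/<term>" and "/current/<term>" layouts, and the names
VocabularyStore, publish, term_uri, and lookup are all hypothetical and
are not the service's actual design.

```python
from dataclasses import dataclass, field
from typing import Optional

BASE = "http://example.org/vocab"  # hypothetical base URI, for illustration only


@dataclass
class VocabularyStore:
    name: str
    # One full snapshot (term -> definition) per published version (principle B).
    versions: list = field(default_factory=list)

    def publish(self, terms: dict) -> str:
        """Principle A: any change is published as a new version with a new URI."""
        self.versions.append(dict(terms))
        return f"{BASE}/{self.name}/{len(self.versions)}"

    def term_uri(self, term: str, version: Optional[int] = None) -> str:
        """Principle E: pick a versioned URI or the 'most current' URI."""
        slot = "current" if version is None else str(version)
        return f"{BASE}/{self.name}/{slot}/{term}"

    def lookup(self, uri: str) -> str:
        """Principle D: resolve a versioned or 'most current' term URI."""
        prefix = f"{BASE}/{self.name}/"
        if not uri.startswith(prefix):
            raise KeyError(uri)
        slot, term = uri[len(prefix):].split("/", 1)
        index = len(self.versions) - 1 if slot == "current" else int(slot) - 1
        return self.versions[index][term]
```

With this layout, a lookup through the "current" URI follows the latest
release, while a lookup through a versioned URI keeps returning the
definition that was published under that version.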

From these principles I've concluded:
   aa. A new vocabulary version results in new term versions (= new  
URIs) for all the terms as well (even if their semantics haven't  
changed, sorry -- see below for further thoughts on this)
   bb. Any significant definitional or semantic change to a term  
should really create a new term, not just evolve the word we were  
already using (what was SKOS thinking?)
   cc. Created relationships to 'most current' URIs persist even as  
the semantics of that resource may change; this potentially introduces  
a time quality to inferences done with these resources (e.g., "Today's  
New York Times has an article on election polls" may be a true statement  
today, but false next week.)  Those who choose to use the 'most  
current' term will get what they pay for.
   dd. Any created relationship that uses a 'most current' URI, should  
be timestamped to allow review of the historical state of the members  
of the triple (but note that this is strictly for understanding, since  
the selection of the 'most current' URI as the referenced concept  
explicitly permits changes to happen in that resource)
   ee. Both the provided service, and ontology engines in general,  
must be able to relate terms to their semantically identical  
historical counterparts
   ff. The service should be able to quickly identify/present to its  
users each change in semantic meaning for a term.
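
Conclusion (dd) can be sketched as follows: a relationship made against
a 'most current' URI carries a timestamp, and the timestamp is used
later to find which vocabulary version was in force when the
relationship was made. This is a sketch under assumptions; the Relation
record, the version_in_force helper, and the idea of keeping a sorted
list of publication times are hypothetical, not part of the service.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str               # may be a 'most current' URI
    asserted_at: datetime  # when the relationship was created (conclusion dd)


def version_in_force(published: list, when: datetime) -> Optional[int]:
    """Return the 1-based version number that was current at `when`.

    `published` is a list of publication datetimes sorted in ascending
    order, one per vocabulary version. Returns None if `when` precedes
    the first publication.
    """
    count = bisect_right(published, when)  # versions published at or before `when`
    return count if count > 0 else None
```

The timestamp does not freeze the meaning of the relationship; it only
lets a reviewer reconstruct what the 'most current' URI meant at the
time.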

So two easy conclusions:

Yes, it is terrible for the semantics of a (nominally static) concept  
to change, and that concept's URI to remain the same. That breaks  
everything, as near as I can tell.

In the case of a subject/object term, it is clearly acceptable for the  
semantics of a _dynamic_ concept to change without changing the URI.

I am less sanguine about this for predicates -- it seems like you're  
allowing replacing the engine while the car is running. I can imagine  
a future scenario where this is advantageous for predicates, but it  
seems really inappropriate at this stage.

As to the multiple URIs for a single concept problem that was  
introduced in (aa) above, I have both a justification and a backup  
plan.  The justification is that the meaning of terms and their  
definitions is inferred in a context, and changes to the context (the  
rest of the vocabulary) can affect the implicit meaning, or usage, of  
a term that nominally wasn't changed. So even if I haven't changed the  
explicit definition of a term in a new vocabulary release, it is  
meaningful to consider this term a new resource, and give it a new  
URI, to reflect its new context.

Of course, it is also very important to say this new resource has the  
same definition and semantics as another, previous resource,  
preferably pointing back to the original instance with that definition/ 
semantics. The service described in (ee) needs this capability. (But I  
think sameAs doesn't apply here, as the two URIs actually reflect two  
different resources, which are definitionally and semantically equal,  
but live in different contexts.) I imagine we will have to create a  
relationship for our own use that has this meaning for now.
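
A custom relationship of this kind could be used to walk from a term in
a later version back to the original instance with the same definition.
The sketch below is an assumption-laden illustration: the predicate URI
and the name original_instance are hypothetical, and triples are modeled
as plain (subject, predicate, object) tuples rather than with any RDF
library.

```python
# Hypothetical predicate: "same definition and semantics, different context".
SAME_DEFINITION_AS = "http://example.org/rel/sameDefinitionAs"


def original_instance(uri: str, triples: list) -> str:
    """Follow SAME_DEFINITION_AS links back to the earliest resource.

    `triples` is a list of (subject, predicate, object) tuples. A URI
    with no outgoing link is treated as the original instance.
    """
    links = {s: o for s, p, o in triples if p == SAME_DEFINITION_AS}
    seen = {uri}
    while uri in links:
        uri = links[uri]
        if uri in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {uri}")
        seen.add(uri)
    return uri
```

Unlike sameAs, the link here only says that two distinct resources
share a definition; it does not merge them into one resource.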

If you just can't stand all those URIs that have the same semantics,  
and you told me I had to use the original URI that had that meaning, I  
would say 'ok' -- then, to meet principle (C), I would create URIs  
that dereference to the original URI, so that when people get confused  
and use the (wrong, non-existent) URI that corresponds to that term in  
the current version of the vocabulary, at least I could respond with  
useful information.
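
A minimal sketch of that backup plan, assuming a simple lookup table:
URIs that were never minted for the current version, but that people
are likely to guess, are kept as aliases pointing at the original URI.
The function name resolve, the three outcome labels, and the two tables
are hypothetical; a real service might answer an alias with an HTTP
redirect instead.

```python
from typing import Optional, Tuple


def resolve(requested: str, minted: set, aliases: dict) -> Tuple[str, Optional[str]]:
    """Decide how to answer a request for a term URI.

    `minted` holds URIs that really exist; `aliases` maps guessable but
    unminted URIs to the original URI that carries the same meaning.
    """
    if requested in minted:
        return ("ok", requested)
    if requested in aliases:
        # Point the confused user at the original resource.
        return ("redirect", aliases[requested])
    return ("not_found", None)
```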

(Yes, I know this emphasizes why URIs should be opaque, and I'm afraid  
in this respect I am consciously doing the 'wrong' thing by making my  
URI algorithm all too obvious. The value added by a semantic URI is  
just too compelling, for the success of the project and semantic  
adoption in general.)

John



On Oct 30, 2008, at 2:14 AM, Michael F Uschold wrote:

> I'm resending this message to the semantic web discussion group for  
> the record.
>
> On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold  
> <uschold@gmail.com> wrote:
> Currently there is no accepted practice on how/whether to migrate to  
> new URIs when a new version of an ontology is published. This is  
> largely due to the fact that there is no good technology for  
> managing versioning, and the W3C consciously (and probably sensibly)  
> decided not to address the issue.  Versioning information is meant  
> to be placed on a version annotation.
>
> However the current situation is like the wild West, and everyone  
> will be doing different things, resulting in a mess.
>
> Wordnet published a new version and minted all new URIs even though  
> many or most of the entries were semantically identical.
> The SKOS working group is currently considering the pros and cons of  
> various options. One is to adopt all new URIs in a new namespace,  
> just like Wordnet. Another is to keep the exact same name space, and  
> change the semantics of a small number of terms while keeping the  
> same URI. A third is to keep the same URI for the unchanged terms,  
> and mint new URIs for the terms with different semantics.
>
> This is a problem because they have no guidelines; they are  
> basically stumbling along in the dark.
>
> I believe that this is an urgent matter that needs attention to  
> prevent a nightmare from unfolding.
>
> In the current state of semantic web use, it may not matter too much  
> which option the SKOS team chooses. This is mainly because relatively  
> few applications will be impacted, which may be due to the fact that  
> the applications are not driven by the ontologies.
>
> However, when usage of ontologies and ontology-driven applications  
> becomes more mainstream, the differences could be profound. Given  
> that this issue is intimately tied up with versioning, and that we  
> have no good solutions yet, do we continue to throw our hands up and  
> punt? Absolutely not, it is essential that a good precedent is set  
> ASAP that is based on sound principles.
>
> Here is how.
>
> We should imagine a future where ontology versioning is handled  
> properly and do things that are going to make things easy to migrate  
> to that future. We don't know how the versioning black box will  
> work, but we should be able to make some clear and definitive  
> statements about WHAT it does.
>
> For example, in the future, ontology-driven applications will be  
> fairly mainstream. URIs are used as unique identifiers. When  
> applications are driven from ontologies, then they will break if you  
> change the semantics in mid-stream.  Imagine an application that  
> relied on the semantics of broader as it was originally specified  
> with transitivity.  They loaded data that was created using that  
> semantics. Then the SKOS spec changes and broader is no longer  
> transitive. New datasets are created according to this new meaning.  
> The application loads more data. It needs to know which data is  
> subject to transitive closure and which is not. This is impossible,  
> if the same SKOS URI is used for versions with different semantics.   
> They are different beasts, and thus MUST have different URIs.
>
> Similarly, if SKOS mints a whole new namespace and changes all the  
> URIs, the application also has a problem. It has datasets with the  
> old URI and datasets with the new URIs. This means that the datasets  
> will not be linked like they should, they will treat the two  
> different URIs for the same thing as being different.  If one wanted  
> to go into OWL-Full, one can use owl:sameAs, but this is not very  
> practical.  The only reasonable solution is to have the same URI for  
> things with the same semantics.
>
> Thus, any ontology versioning system of the future will rely on these  
> two principles:
> 1. If the semantics of a term changes, then it needs to have a new  
> unique ID.
> 2. If the semantics of a term does NOT change, then it should  
> maintain the same ID in any future versions.
>
> If either of these two guidelines is broken, then the  
> ontology-driven applications of the future will break too.
>
> These maxims hold without exception for any standards that are  
> formally released as standards.
> A question arises if we need to hold to the same standards for  
> standards like SKOS which was never formally blessed.
>
> The practical difficulties will be the same whether the standard is  
> blessed or not. It only really depends on whether the standard is a  
> de facto standard, or whether it is getting significant use. If users  
> build things and ontology producers break things through  
> carelessness, this will hinder semantic web technology adoption.
>
> Another question is what to do if the original standard is believed  
> to be incorrect, and the new one is the fixed one. Can one then keep  
> the same URI?
> Again, the answer should be informed by the impact on applications.  
> The same problems will occur if you change the semantics and keep  
> the same URI even if you are fixing a mistake.  The URI with the  
> wrong semantics must keep its original unique ID.
>
> Michael Uschold
>


John

--------------
John Graybeal   <mailto:graybeal@mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org
Received on Thursday, 30 October 2008 19:09:57 UTC