Re: URIs and Unique IDs

On Thu, Oct 30, 2008 at 4:14 AM, Michael F Uschold <uschold@gmail.com> wrote:
>> Currently there is no accepted practice on how/whether to migrate to new
>> URIs when a new version of an ontology is published.

I should point out that within the Open Biomedical Ontologies there is
an explicit policy of *not* changing URIs as new versions of the
ontology are released. For one thing, that would be impractical: some
of them are updated daily. Rather, there is a policy on deprecation:
terms that are deprecated are marked as such and kept in the ontology
so as not to leave dangling pointers.
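
To make the deprecation idea concrete, here is a minimal sketch in
Python. It is not how any OBO tool is implemented; the Term class, the
example.org URIs, and the optional replaced_by pointer are all
assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Term:
    uri: str                           # stable identifier, never changed or reused
    label: str
    deprecated: bool = False
    replaced_by: Optional[str] = None  # hypothetical pointer to a successor term


def deprecate(ontology: Dict[str, Term], uri: str,
              replaced_by: Optional[str] = None) -> None:
    """Mark a term as deprecated but keep it in the ontology."""
    term = ontology[uri]
    term.deprecated = True
    term.replaced_by = replaced_by


def resolve(ontology: Dict[str, Term], uri: str) -> Term:
    """Follow replaced_by links from a deprecated term to a live one, if any."""
    seen = set()
    term = ontology[uri]
    while term.deprecated and term.replaced_by and term.replaced_by not in seen:
        seen.add(term.uri)
        term = ontology[term.replaced_by]
    return term


# Hypothetical URIs, for illustration only.
KEPT = "http://example.org/EX_0000001"
OLD = "http://example.org/EX_0000002"

ontology = {
    KEPT: Term(KEPT, "cell"),
    OLD: Term(OLD, "cell (duplicate entry)"),
}

# A new release deprecates the duplicate instead of deleting it.
deprecate(ontology, OLD, replaced_by=KEPT)
```

Data that still points at the deprecated URI keeps resolving, which is
the whole point: nothing dangles, and a consumer can choose to follow
the pointer to the live term.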

>> This is largely due to
>> the fact that there is no good technology for managing versioning, and the
>> W3C consciously (and probably sensibly) decided not to address the issue.
>> Versioning information is meant to be placed on a version annotation.
>>
>> However the current situation is like the wild West, and everyone will be
>> doing different things, resulting in a mess.
>>
>> Wordnet published a new version and minted all new URIs even though many
>> or most of the entries were semantically identical.
>> The SKOS working group is currently considering the pros and cons of
>> various options. One is to adopt all new URIs in a new namespace, just like
>> Wordnet. Another is to keep the exact same name space, and change the
>> semantics of a small number of terms while keeping the same URI. A third is
>> to keep the same URI for the unchanged terms, and mint new URIs for the
>> terms with different semantics.

The tricky part is defining "different semantics". Read very
literally, *any* change changes the semantics, and operations that
change the axioms associated with the term certainly do. However, I
think that the focus should be on denotation. That is, when a term is
created there is an intention that it denote some entity/entities. In
my view, as long as that denotation is intended to be stable, the term
"means" the same thing, regardless of other changes in the ontology.
In practice, changes these days often do not change the intended
denotation. Rather, ontologies are improved, refactored, and
elaborated so that the formal semantics better reflect the authors'
intention.

So I would spend some time thinking about what *you* mean by
"different semantics", as all the rest of your discussion depends on
this.

That doesn't mean that there shouldn't be some way to track the
history of a term. However, I would have that happen by annotation,
change notes, etc rather than creating a new URI to replace the old
one.
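
As a rough sketch of what tracking history by annotation could look
like (Python, with a made-up Term class, ChangeNote record, and
example.org URI; none of this is an existing API):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeNote:
    version: str
    note: str


@dataclass
class Term:
    uri: str
    definition: str
    history: List[ChangeNote] = field(default_factory=list)

    def revise(self, version: str, new_definition: str, note: str) -> None:
        """Update the definition in place and record why; the URI is untouched."""
        self.history.append(
            ChangeNote(version, f"{note} (was: {self.definition!r})")
        )
        self.definition = new_definition


# Hypothetical term, for illustration only.
term = Term("http://example.org/EX_0000042", "A part of a cell.")
uri_before = term.uri

# The denotation is intended to be the same; only the wording is refined.
term.revise("2008-11-09", "A membrane-bounded part of a cell.",
            "Tightened definition")
```

The history lives on the term itself, so anyone holding the URI can see
what changed and when, without the identifier ever moving.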

>>
>> This is a problem because they have no guidelines, they are basically
>> stumbling along in the dark.
>>
>> I believe that this is an urgent matter that needs attention to prevent a
>> nightmare from unfolding.
>>
>> In the current state of semantic web use, it may not matter too much which
>> option the SKOS team chooses. This is mainly because relatively few
>> applications will be impacted, which may be due to the fact that the
>> applications are not driven by the ontologies.
>>
>> However, when usage of ontologies and ontology-driven applications becomes
>> more mainstream, the differences could be profound. Given that this issue is
>> intimately tied up with versioning, and that we have no good solutions yet,
>> do we continue to throw our hands up and punt? Absolutely not, it is
>> essential that a good precedent is set ASAP that is based on sound
>> principles.
>>
>> Here is how.
>>
>> We should imagine a future where ontology versioning is handled properly
>> and do things that are going to make things easy to migrate to that future.
>> We don't know how the versioning black box will work, but we should be able
>> to make some clear and definitive statements about WHAT it does.
>>
>> For example, in the future, ontology-driven applications will be fairly
>> mainstream. URIs are used as unique identifiers. When applications are
>> driven from ontologies, then they will break if you change the semantics in
>> mid-stream. Imagine an application that relied on the semantics of broader
>> as it was originally specified, with transitivity. It loaded data that was
>> created using that semantics. Then the SKOS spec changes and broader is no
>> longer transitive.

This is a good example of why using names in URIs is a bad idea. If
the URIs were opaque numeric ids you could have simply changed the
label on the old "broader" to "broader transitive" and moved on.
Because SKOS didn't do this, it created problems for itself.

The OBO ontologies are moving towards *all* URIs being based on numeric
ids for this reason (until recently only classes had been named that
way).
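
A small sketch of the opaque-id idea, in Python. The namespace, the
mint() helper, and the id format are assumptions for illustration, not
the actual SKOS or OBO scheme.

```python
from typing import Dict, List, Tuple

BASE = "http://example.org/vocab/"  # hypothetical namespace


def mint(labels: Dict[str, str], label: str) -> str:
    """Create a new opaque numeric URI and attach a human-readable label.

    Numbering by len(labels) is safe here only because ids are never
    deleted (deprecated terms stay in the table).
    """
    uri = f"{BASE}{len(labels) + 1:07d}"
    labels[uri] = label
    return uri


labels: Dict[str, str] = {}

# The original property, specified as transitive.
old_broader = mint(labels, "broader")

# Existing data refers to the property by its opaque URI, not its name.
data: List[Tuple[str, str, str]] = [("ex:cats", old_broader, "ex:mammals")]

# Later the name is wanted for a non-transitive property: relabel the
# old one and mint a fresh id for the new meaning.
labels[old_broader] = "broader transitive"
new_broader = mint(labels, "broader")
```

The old data still points at the same URI with the same meaning; only
the display label moved. The name "broader" is free for the new
property because it was never baked into an identifier.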

-Alan

>> New datasets are created according to this new meaning.
>> The application loads more data. It needs to know which data is subject to
>> transitive closure and which is not. This is impossible, if the same SKOS
>> URI is used for versions with different semantics.  They are different
>> beasts, and thus MUST have different URIs.
>>
>> Similarly, if SKOS mints a whole new namespace and changes all the URIs,
>> the application also has a problem. It has datasets with the old URIs and
>> datasets with the new URIs. This means that the datasets will not be linked
>> as they should be: the application will treat the two different URIs for the
>> same thing as being different. If one wanted to go into OWL-Full, one could
>> use owl:sameAs, but this is not very practical. The only reasonable solution
>> is to have the same URI for things with the same semantics.
>>
>> Thus, any ontology versioning system of the future will rely on these two
>> principles:
>> 1. If the semantics of a term changes, then it needs to have a new unique
>> ID.
>> 2. If the semantics of a term does NOT change, then it should maintain the
>> same ID in any future versions.
>>
>> If either of these two guidelines is broken, then so will the
>> ontology-driven applications of the future.
>>
>> These maxims hold without exception for any standards that are formally
>> released as standards.
>> A question arises whether we need to hold to the same standards for standards
>> like SKOS, which was never formally blessed.
>>
>> The practical difficulties will be the same whether the standard is
>> blessed or not. It only really depends on whether the standard is a de facto
>> standard, or whether it is getting significant use. If users build things and
>> ontology producers break things through carelessness, this will hinder
>> semantic web technology adoption.
>>
>> Another question is what to do if the original standard is believed to be
>> incorrect, and the new one is the fixed one. Can one then keep the same URI?
>> Again, the answer should be informed by the impact on applications. The
>> same problems will occur if you change the semantics and keep the same URI
>> even if you are fixing a mistake.  The URI with the wrong semantics must
>> keep its original unique ID.
>>
>> Michael Uschold

Received on Sunday, 9 November 2008 16:51:34 UTC