Re: URIs and Unique IDs from Michael F Uschold on 2008-11-01 (semantic-web@w3.org from November 2008)

From: Michael F Uschold <uschold@gmail.com>
Date: Sat, 1 Nov 2008 17:33:41 +0100
To: "John Graybeal" <graybeal@mbari.org>
Cc: semantic-web@w3.org, aldo.gangemi@gmail.com, "Conor Shankey" <cshankey@reinvent.com>, "Peter Mika" <pmika@yahoo-inc.com>, "Ora Lassila" <ora.lassila@nokia.com>, "Pan, Dr Jeff Z." <jeff.z.pan@abdn.ac.uk>, "Tim Berners-Lee" <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>, sean.bechhofer@manchester.ac.uk
Message-ID: <406b38b50811010933o5ce40ceencfac16457e9f7d25@mail.gmail.com>
Comments inline.

On Thu, Oct 30, 2008 at 8:08 PM, John Graybeal <graybeal@mbari.org> wrote:

> We are trying to release a community semantic service (later this month!)
> that "does the right thing" in this arena.
>

Excellent, glad to learn this.


> So I strongly agree with the tenor of this message. Except I am trying to
> imagine what implementation should happen in the _present_ for our service
> to be an exemplar.  I am sorry for the long post, but if it is mostly valid,
> hopefully it can advance the discussion.
> We have provisionally settled on the following principles for this service
> (which is intended to store domain vocabularies and terms, keep track of
> their versions, and let people make relations between them).  I realize the
> focus of the original post was on URIs of the relations, but I think
> semantics of any terms are also important to consider, and probably apply to
> the relations.
>
> Principles
>   A. *Any* change to a vocabulary, including to any of its terms (and their
> semantics), metadata, means the vocabulary must get a new version = new URI
>

Agreed.  I assume you mean the vocabulary is the ontology? Are we assuming
OWL ontologies here? If not, then what do you mean by a vocabulary?


>   B. A vocabulary contains all the terms within it, not just the terms that
> changed in that version
>

So in the SKOS example, the new SKOS vocabulary/ontology would contain the
terms that do not change URIs as well as terms with new versions with new
URIs.


>   C. The nominally opaque URIs must be fairly self-consistent in their
> presentation, or people in the non-semantic community will misunderstand
> them (or rebel against using them)
>

This issue arises because of the conflation of URIs, UIDs and human-readable
IDs. Until these are de-conflated, probably this principle is the right one.
It will be unnecessary after de-conflation.


>   D. It must be possible to 'look up' the current meaning of a term, as
> well as specifically request any past meanings by their URI
>

I read this that a term like 'broader' in SKOS could have multiple URIs for
multiple versions.  If this is what you mean, then I absolutely agree with
this.  If this is not what you mean, then what is the difference between the
'current meaning of a term' and any other meaning of that term?


>   E. It must be possible to choose (i.e., map to, or identify) either a
> specific (versioned) meaning, or a 'most current' meaning, for a given
> concept
>

Agreed. You seem to be proposing some kind of object (perhaps
with a URI) that corresponds to the core term, with its various
meanings represented as versions linked to that core term. This may be a
workable idea. Can this be done with the current semantic web
infrastructure?


>
> From these principles I've concluded
>   aa. A new vocabulary version results in new term versions (= new URIs)
> for all the terms as well (even if their semantics haven't changed, sorry --
> see below for further thoughts on this)
>

I definitely disagree on this, even after reading your material below.


>   bb. Any significant definitional or semantic change to a term should
> really create a new term, not just evolve the word we were already using
> (what was SKOS thinking?)
>

This is an interesting question with more than one reasonable position. I
think there are at least two cases:
1. there was a bona fide conceptual error, and everyone agrees that the old
meaning was the wrong one and it is not wanted.
2. there is a new alternative, that works in some cases, and some may also
wish to use the older versions.

For 1, you do NOT want to change the name of the term, which was and is the
right term.  But you DO want to change its UID because it is a different thing.
For 2, you probably want to introduce a new term with a new name and a new
UID. You could have the name of the transitive version of broader be called
broaderT and the non-transitive one be called broader. You should be able to
change the name w/o changing the UID.
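
A minimal Python sketch of why the two variants need distinct identifiers: an
application can decide which statements to close transitively only if the
predicate identifier tells it. The `ex:` identifiers and the sample data are
hypothetical, not real SKOS URIs.

```python
# Hypothetical identifiers for the two variants discussed above.
BROADER = "ex:broader"      # non-transitive variant
BROADER_T = "ex:broaderT"   # transitive variant

# The application knows which predicates are transitive by identifier alone.
TRANSITIVE_PREDICATES = {BROADER_T}


def infer(triples):
    """Return the triples plus the transitive closure for transitive predicates only."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1) in list(result):
            if p1 not in TRANSITIVE_PREDICATES:
                continue
            for (s2, p2, o2) in list(result):
                if p2 == p1 and s2 == o1 and (s1, p1, o2) not in result:
                    result.add((s1, p1, o2))
                    changed = True
    return result


data = {
    ("ex:poodle", BROADER_T, "ex:dog"),
    ("ex:dog", BROADER_T, "ex:mammal"),
    ("ex:sedan", BROADER, "ex:car"),
    ("ex:car", BROADER, "ex:vehicle"),
}
inferred = infer(data)
```

Only the `ex:broaderT` chain gains an inferred statement here. If both variants
shared one identifier, the application could not tell which chain to close.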


>   cc. Created relationships to 'most current' URIs persist even as the
> semantics of that resource may change; this potentially introduces a time
> quality to inferences done with these resources (e.g., "Today's New York
> Times has an article on election polls" may be true statement today, but
> false next week.)  Those who choose to use the 'most current' term will get
> what they pay for.
>

You might be able to have programmatic or infrastructural capability which
can return the 'most current' version of a given core term. There might be a
URI/UID for the core term, and that is what would be accessed. There, a
directive would be given that says please return the most recent version
of that item. This is a promising idea that could probably keep everyone
happy.
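
As a rough sketch of that idea, assuming a core term that simply keeps an
ordered list of its versions: the class name, the `ex:` identifiers, and the
version-naming scheme below are all hypothetical, and nothing like this exists
in the current infrastructure as far as this thread establishes.

```python
from dataclasses import dataclass, field


@dataclass
class CoreTerm:
    """A core term with its own UID and an ordered history of versions."""
    core_uid: str
    versions: list = field(default_factory=list)  # oldest first

    def add_version(self, version_uid, definition):
        # Each version gets its own UID; the core UID never changes.
        self.versions.append((version_uid, definition))

    def most_current(self):
        # The "please return the most recent version" directive.
        return self.versions[-1]

    def lookup(self, version_uid):
        # Applications pinned to an old version can still resolve it.
        for uid, definition in self.versions:
            if uid == version_uid:
                return definition
        raise KeyError(version_uid)


broader = CoreTerm("ex:broader")
broader.add_version("ex:broader/v1", "transitive")
broader.add_version("ex:broader/v2", "non-transitive")
```

An application that wants stability refers to `ex:broader/v1`; one that wants
whatever is latest asks the core term for `most_current()`.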



>   dd. Any created relationship that uses a 'most current' URI, should be
> timestamped to allow review of the historical state of the members of the
> triple (but note that this is strictly for understanding, since the
> selection of the 'most current' URI as the referenced concept explicitly
> permits changes to happen in that resource)
>

Timestamping is useful, but could be expensive.


>   ee. Both the provided service, and ontology engines in general, must be
> able to relate terms to their semantically identical historical
> counterparts
>

When every version of every term has its own UID, then this becomes
feasible, though it may also be an expensive overhead.


>   ff. The service should be able to quickly identify/present to its users
> each change in semantic meaning for a term.
>


Yes, and an application should also be able to subscribe to the core UID for
a concept to be notified of any changes so it can keep up to date
automatically in the case where the most up-to-date version is wanted, and
otherwise people can look into new versions on a case by case basis.
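
The subscription idea could be sketched as a simple publish/subscribe registry
keyed by core UID. This is only an illustration under assumed names
(`ChangeNotifier`, the `ex:` identifiers); a real service would need
persistence and some delivery mechanism, which is left out here.

```python
from collections import defaultdict


class ChangeNotifier:
    """Notify subscribers whenever a new version of a core term is published."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, core_uid, callback):
        self._subscribers[core_uid].append(callback)

    def publish_version(self, core_uid, version_uid):
        # Only subscribers of this core UID hear about the new version.
        for callback in self._subscribers[core_uid]:
            callback(core_uid, version_uid)


received = []
notifier = ChangeNotifier()
notifier.subscribe("ex:broader", lambda core, version: received.append((core, version)))
notifier.publish_version("ex:broader", "ex:broader/v2")
notifier.publish_version("ex:narrower", "ex:narrower/v2")  # nobody subscribed
```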


>
> So two easy conclusions:
>
> Yes, it is terrible for the semantics of a (nominally static) concept to
> change, and that concept's URI to remain the same. That breaks everything,
> as near as I can tell.
>

Agreed.


>
> In the case of a subject/object term, it is clearly acceptable for the
> semantics of a _dynamic_ concept to change without changing the URI.
>

Well the core URI/UID can stay the same, but each version needs to have its
own UID so applications that want to use old versions don't break.

There may be some clear cut cases where you can tell which things are static
vs. dynamic. However, IMHO it is likely that many (perhaps most) cases will
be dependent on the needs of the application, and the same concept may be
dynamic in some applications and static in others.



>
> I am less sanguine about this for predicates -- it seems like you're
> allowing replacing the engine while the car is running.
>

I don't follow this analogy.


> I can imagine a future scenario where this is advantageous for predicates,
> but it seems really inappropriate at this stage.
>

You have a strong intuition that I'm not able to grasp.  Can you articulate
why with an example?


>
> As to the multiple URIs for a single concept problem that was introduced in
> (aa) above, I have both a justification and a backup plan.  The
> justification is that the meaning of terms and their definitions is inferred
> in a context, and changes to the context (the rest of the vocabulary) can
> affect the implicit meaning, or usage, of a term that nominally wasn't
> changed.
>

This is true, and it is the reason why terms/words in WordNet belong to
multiple synsets. Each synset has a unique meaning, and in the OWL dataset, each
synset has its own URI. So I don't find your argument convincing.  Multiple
contexts show different uses of a term, so each use should get a different
UID, not the same one.


> So even if I haven't changed the explicit definition of a term in a new
> vocabulary release, it is meaningful to consider this term a new resource,
> and give it a new URI, to reflect its new context.
>

Maybe the WordNet example is a red herring.  In any event, can you provide
a clear example of how an application would find it helpful to have whole
new sets of URIs minted for identical things?

Here is one example where it is clearly a bad thing.
The application is ontology-driven at a deep level. It makes use of the
resources in the coding/creation of application functionality. It also loads
and makes use of data using the ontology.
T1: application loads ontology using original terms.
T2: application loads data expressed using the original terms
T3: all new URIs are minted, when only a few have changed semantics, and
there is no indication of which ones have new semantics and which have the
same semantics.
T4: A new dataset is created which uses the new URIs
T5: The application loads the new data
T6: The application poses a query which uses the old URIs to filter data.
T7: The new URIs do not match the old ones, so the query only returns data
from the old URIs when it should return data from the new dataset as well.

This is clearly a bad thing.   Your proposal has to argue advantages that
offset the disadvantage here, in order for me to buy into it.
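
The T1-T7 scenario can be reproduced in a few lines of Python. The
`example.org` URIs and the toy triples are made up for illustration; the point
is only that an exact-match filter on the old URI silently misses data written
with a re-minted URI, even though the semantics are identical.

```python
# Hypothetical URIs: the same predicate, re-minted in a new namespace at T3.
OLD_BROADER = "http://example.org/v1#broader"
NEW_BROADER = "http://example.org/v2#broader"

old_data = [("a", OLD_BROADER, "b")]   # T2: data using the original terms
new_data = [("c", NEW_BROADER, "d")]   # T4: data using the new URIs
store = old_data + new_data            # T5: application loads both


def query(triples, predicate):
    """Return the triples whose predicate matches exactly (T6)."""
    return [t for t in triples if t[1] == predicate]


# T7: only the old data comes back, although both statements mean the same.
hits = query(store, OLD_BROADER)

# If the unchanged term had kept its URI, the same query would find everything.
stable_store = [("a", OLD_BROADER, "b"), ("c", OLD_BROADER, "d")]
stable_hits = query(stable_store, OLD_BROADER)
```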



> Of course, it is also very important to say this new resource has the same
> definition and semantics as another, previous resource, preferably pointing
> back to the original instance with that definition/semantics.
>

This creates an unnecessary burden and seems to contradict your point that
something in a different context will have different semantics. If it has
different semantics, then why point back to something with identical
semantics?


> The service described in (ee) needs this capability. But I think sameAs
> doesn't apply here, as the two URIs actually reflect two different
> resources, which are definitionally and semantically equal, but live in a
> different context.)
>

I still can't see any advantages for creating multiple copies of exactly the
same thing.
Have I missed something?


> I imagine we will have to create a relationship for our own use that has
> this meaning for now.
>

We probably will need some new infrastructural primitives, to relate
versions to each other.


>
> If you just can't stand all those URIs that have the same semantics, and
> you told me I had to use the original URI that had that meaning, I would say
> 'ok' -- then, to meet principle (C), I would create URIs that dereference to
> the original URI, so that when people get confused and use the (wrong,
> non-existent) URI that corresponds to that term in the current version of
> the vocabulary, at least I could respond with useful information.
>

This is a practical solution which would probably be pretty easy when URIs
are de-conflated with UIDs. Though proliferation of URIs for the same thing
should be reduced whenever possible.

See another thread I started on similar topic by googling
["proliferation of URIs" uschold]


>
> (Yes, I know this emphasizes why URIs should be opaque, and I'm afraid in
> this respect I am consciously doing the 'wrong' thing by making my URI
> algorithm all too obvious. The value added by a semantic URI is just too
> compelling, for the success of the project and semantic adoption in
> general.)
>
> John
>
>
>
> On Oct 30, 2008, at 2:14 AM, Michael F Uschold wrote:
>
> I'm resending this message to the semantic web discussion group for the
> record.
>
> On Wed, Oct 29, 2008 at 3:53 PM, Michael F Uschold <uschold@gmail.com>
> wrote:
>
>> Currently there is no accepted practice on how/whether to migrate to new
>> URIs when a new version of an ontology is published. This is largely due to
>> the fact that there is no good technology for managing versioning, and the
>> W3C consciously (and probably sensibly) decided not to address the issue.
>> Versioning information is meant to be placed on a version annotation.
>>
>> However the current situation is like the wild West, and everyone will be
>> doing different things, resulting in a mess.
>>
>> Wordnet published a new version and minted all new URIs even though many
>> or most of the entries were semantically identical.
>> The SKOS working group is currently considering the pros and cons of
>> various options. One is to adopt all new URIs in a new namespace, just like
>> Wordnet. Another is to keep the exact same name space, and change the
>> semantics of a small number of terms while keeping the same URI. A third is
>> to keep the same URI for the unchanged terms, and mint new URIs for the
>> terms with different semantics.
>>
>> This is a problem because they have no guidelines, they are basically
>> stumbling along in the dark.
>>
>> I believe that this is an urgent matter that needs attention to prevent a
>> nightmare from unfolding.
>>
>> In the current state of semantic web use, it may not matter too much which
>> option the SKOS team chooses. This is mainly because relatively few
>> applications will be impacted, which may be due to the fact that the
>> applications are not driven by the ontologies.
>>
>> However, when usage of ontologies and ontology-driven applications becomes
>> more mainstream, the differences could be profound. Given that this issue is
>> intimately tied up with versioning, and that we have no good solutions yet,
>> do we continue to throw our hands up and punt? Absolutely not, it is
>> essential that a good precedent is set ASAP that is based on sound
>> principles.
>>
>> Here is how.
>>
>> We should imagine a future where ontology versioning is handled properly
>> and do things that are going to make things easy to migrate to that future.
>> We don't know how the versioning black box will work, but we should be able
>> to make some clear and definitive statements about WHAT it does.
>>
>> For example, in the future, ontology-driven applications will be fairly
>> mainstream. URIs are used as unique identifiers. When applications are
>> driven from ontologies, then they will break if you change the semantics in
>> mid-stream.  Imagine an application that relied on the semantics of broader
>> as it was originally specified with transitivity.  They loaded data that was
>> created using that semantics. Then the SKOS spec changes and broader is no
>> longer transitive. New datasets are created according to this new meaning.
>> The application loads more data. It needs to know which data is subject to
>> transitive closure and which is not. This is impossible, if the same SKOS
>> URI is used for versions with different semantics.  They are different
>> beasts, and thus MUST have different URIs.
>>
>> Similarly, if SKOS mints a whole new namespace and changes all the URIs,
>> the application also has a problem. It has datasets with the old URI and
>> datasets with the new URIs. This means that the datasets will not be linked
>> like they should, they will treat the two different URIs for the same thing
>> as being different.  If one wanted to go into OWL-Full, one can use
>> owl:sameAs, but this is not very practical.  The only reasonable solution is
>> to have the same URI for things with the same semantics.
>>
>> Thus, any ontology versioning system of the future will rely on these two
>> principles:
>> 1. If the semantics of a term changes, then it needs to have a new unique
>> ID.
>> 2. If the semantics of a term does NOT change, then it should maintain the
>> same ID in any future versions.
>>
>> If either of these two guidelines is broken, then so will the
>> ontology-driven applications of the future.
>>
>> These maxims hold without exception for any standards that are formally
>> released as standards.
>> A question arises if we need to hold to the same standards for standards
>> like SKOS which was never formally blessed.
>>
>> The practical difficulties will be the same whether the standard is
>> blessed or not. It only really depends on whether the standard is a de facto
>> standard, or whether it is getting significant use. If users build things and
>> ontology producers break things through carelessness, this will hinder
>> semantic web technology adoption.
>>
>> Another question is what to do if the original standard is believed to be
>> incorrect, and the new one is the fixed one. Can one then keep the same URI?
>> Again, the answer should be informed by the impact on applications. The
>> same problems will occur if you change the semantics and keep the same URI
>> even if you are fixing a mistake.  The URI with the wrong semantics must
>> keep its original unique ID.
>>
>> Michael Uschold
>>
>
>
>
> John
>
> --------------
> John Graybeal   <mailto:graybeal@mbari.org>  --
> 831-775-1956
> Monterey Bay Aquarium Research Institute
> Marine Metadata Interoperability Project: http://marinemetadata.org
>
>
Received on Saturday, 1 November 2008 16:34:20 UTC