Re: URIs and Unique IDs from Michael F Uschold on 2008-11-07 (semantic-web@w3.org from November 2008)

From: Michael F Uschold <uschold@gmail.com>
Date: Fri, 7 Nov 2008 01:28:01 +0100
To: "Michael Lang" <michaelalang@gmail.com>
Cc: "Peter Ansell" <ansell.peter@gmail.com>, cshankey@reinvent.com, "John Graybeal" <graybeal@mbari.org>, semantic-web@w3.org, "aldo gangemi" <aldo.gangemi@gmail.com>, "Peter Mika" <pmika@yahoo-inc.com>, "Ora Lassila" <ora.lassila@nokia.com>, "Dr Jeff Z. Pan" <jeff.z.pan@abdn.ac.uk>, "Tim Berners-Lee" <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>, "sean bechhofer" <sean.bechhofer@manchester.ac.uk>, "Michael Lang(Jr.)" <michaelallenlang@gmail.com>
Message-ID: <406b38b50811061628h543e23eaq9589e667432498ad@mail.gmail.com>
Michael Lang says:
This has developed into an interesting exchange. It began with the problem
that URIs had changed in the new version of Wordnet and has evolved into a
discussion of versioning and change management.

Actually, the two issues are highly intertwined.  The issue arose with
versioning of Wordnet and SKOS.  The original post was about how to deal
with URIs when versions change.

What this discussion has made clear to me is the existence of two very
different kinds of use cases and cultures: those that are Web 2.0-ish, open
and dynamic, and may never be used for mission-critical systems; and more
closed systems that may use the exact same ontologies and datasets, but that
need to be protected from careless version mismanagement.

It will be great if we can tease out whether the two worlds can co-exist and
use one set of versioning guidelines for both, or whether there are
fundamental incompatibilities in requirements that will mean no uniform set
of guidelines.

Michael

On Thu, Nov 6, 2008 at 11:38 PM, Michael Lang <michaelalang@gmail.com> wrote:

> This has developed into an interesting exchange. It began with the problem
> that URIs had changed in the new version of Wordnet and has evolved into a
> discussion of versioning and change management.
>
> If an ontology is being used as a component of an application or to "model
> drive" an application, then obviously versioning and change management are
> big deals, and the community developing the ontology should have a
> governance model that deals with the issue.
>
> If an ontology is being used by a community to build a description of a
> domain and accumulate facts about the domain, then the concept of versions
> is obsolete. The ontology by definition changes over time and becomes more
> valuable the more it changes. If, as in the case with Wordnet that started
> this thread, some person uses the community built ontology to "model drive"
> an application, problems are bound to develop.
>
> An ontology built for one purpose cannot be used and should not be used for
> some other purpose unless one accepts the governance of the community
> building the ontology. It is critical for all communities publishing
> ontologies to be explicit about their governance so people understand the
> ground rules for using the ontologies. If there is no explicit governance
> model elucidated by the community, use it at your own risk.
>
> Ontologies are blessed with an embarrassing multitude of real use cases. It
> is not reasonable to expect some "given" ground rules for any particular
> ontology. A governance model will state the ground rules and intentions of
> the community so that we will know how it can be used within external
> applications.
>
> michael lang
> revelytix
>
>
> On Tue, Nov 4, 2008 at 4:11 PM, Peter Ansell <ansell.peter@gmail.com> wrote:
>
>> Hi Conor,
>>
>> I don't disagree that versioning will be important in specific scenarios,
>> but I disagree that web-based distributed reasoning will ever be used to
>> drive mission critical systems, simply because of the authority distinction.
>> People rely on the web for arbitrary services like email and social
>> networking, but they don't rely on external websites as data sources for
>> their corporate software, or if they do their shareholders should be
>> entitled to know that their risk model is very diverse. Skipping from
>> data-only sources to programmatic direction services requires another set of
>> risk profiles about what could go wrong that no one in the company could
>> respond to.
>>
>> Of the scenarios that you describe I am only familiar with the bio issue.
>> I should hope that people never set up systems that rely on specific protein
>> related assertions by scientists to be the absolute reality. One place that
>> reality departs from scientific knowledge is in the individual patient, and
>> at minimum any medical systems should be trained on ranges of data and be
>> offered as suggestions at best. I agree that the trained models however
>> should be versioned, but that is easy as they are under the control of the
>> organisation who is performing the diagnosis (ie, providing the application
>> and data combination) and they can implement the change without extensive
>> general community consultation.
>>
>> If someone decides to tear up the model then they should do it on test
>> servers, as with any other non-production system in the history of
>> programming :) . They can do it on the test server because everything they
>> directly rely on is under their control, as they process the external data
>> and verify it in a testing phase before utilising it. The brittleness of the
>> application rules won't be as big a factor if the resources are under the
>> group's control and are merely based on external assertions for source data.
>> Because we are focusing on getting things stable at the application level it
>> is not a hassle to contain the versions, as there will only ever be two (or
>> three) around in production use that are the most current. Any others will
>> be supported in limited ways like any other outdated computer system. If it
>> gets to the stage where the data and ontologies are distributed as a package
>> then organisations could hypothetically be able to utilise the system with
>> its inherent bugs even if the company (foolishly) decided to migrate future
>> product offerings to a new incompatible version using the current URI|UID
>> combinations, but this doesn't require a large degree of versioning
>> expertise as it is a switchover point where you need expert human advice for
>> the data migration stage anyway and the outdated version will not be
>> utilised after the switchover.
>>
>> The main bit I was disagreeing with in my email was with the emphasis on
>> the distributed web having to be versioned to support particular
>> applications when it is clear that the applications will always be brittle,
>> and hence they should rely on data that has been verified and is
>> contextually static, at least in representation format, and under the
>> control of the organisation running the application. Web URIs are great for
>> resolution and distributing the burden, but if you are financially reliant
>> on the resource then you should be allowed to have it locally cached for
>> stability so that you can manage the versioning internally. This applies to
>> both data and ontologies as they both have places where they can break a
>> semantic rule directed program. Keeping completely up to date with external
>> affairs doesn't seem useful given the potential for breakages at so many
>> levels. (To name one breakage point I would refer to a certain permanent
>> redirection service being overused and effectively able to completely break
>> programs which rely on live data when it goes down. And that has nothing to
>> do with versioning, but it still highlights the fallacy of live data programs
>> ever being reliable.)
>>
>> Cheers,
>>
>> Peter
>>
>> ----- "Conor Shankey" <cshankey@reinvent.com> wrote:
>>
>> > From: "Conor Shankey" <cshankey@reinvent.com>
>> > To: "Michael Lang(Jr.)" <michaelallenlang@gmail.com>
>> > Cc: "Peter Ansell" <ansell.peter@gmail.com>, "John Graybeal" <graybeal@mbari.org>,
>> > "Michael F Uschold" <uschold@gmail.com>, semantic-web@w3.org,
>> > "aldo gangemi" <aldo.gangemi@gmail.com>, "Peter Mika" <pmika@yahoo-inc.com>,
>> > "Ora Lassila" <ora.lassila@nokia.com>, "Dr Jeff Z. Pan" <jeff.z.pan@abdn.ac.uk>,
>> > "Tim Berners-Lee" <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>,
>> > "sean bechhofer" <sean.bechhofer@manchester.ac.uk>, michaelalang@gmail.com
>> > Sent: Wednesday, 5 November, 2008 4:20:02 AM GMT +10:00 Brisbane
>> > Subject: Re: URIs and Unique IDs
>> >
>> > I strongly disagree that versioning will not be important. I suspect
>> > that it will become the most profound and challenging problem to
>> > tackle if we are to scale the application of semantic technology.
>> > Change management is less critical in the short term for those
>> > concerned with the linguistic notion of semantics. However, if you are
>> > concerned with leveraging semantic models to drive/support high value
>> > proposition mission critical systems, change management becomes a
>> > serious concern. Versioning and change management become a show
>> > stopper if you are going even further and intend to create full
>> > computational semantic systems where the algorithms and data/object
>> > models of software systems are replaced by semantic models. In each
>> > one of these three areas the level of trust and dependencies on the
>> > asserted semantics will become critical.
>> >
>> > Here are a few examples:
>> >
>> > 1. Trust semantic models or ontologies to support operational/mission
>> > systems such as:
>> > a. Equipment, system maintenance applications
>> > - a knowledge modeler/ontologist asserts that a General Electric
>> > A877623 is a subclass of a Turbo Prop Engine and then in a later
>> > version realizes their mistake: it is a subclass of another
>> > system. The difference affects the scheduling of maintenance for
>> > aircraft.
>> > - a similar model asserts that a system should be overhauled if a
>> > certain condition occurs
>> > b. Operational policies and compliance applications
>> > - a knowledge modeler asserts that a person who approves a credit
>> > rating cannot approve a loan but in a later version of the compliance
>> > ontology realizes that the semantics need to be far more
>> > sophisticated. The difference affects the ability of the compliance
>> > system to prevent or permit fraud.
>> > c. Medical / Bio applications
>> > - A biomedical ontologist asserts that one protein up-regulates a
>> > gene. Another subject matter expert asserts that the same protein
>> > down-regulates a gene. Another researcher realizes that it is important to
>> > tear down the model and express the context of the scenario to capture
>> > the conflict. The difference affects the ability of a medical
>> > diagnostic system.
>> > d. Intelligence systems
>> > - The model of a social / economic network for terrorists in one model
>> > needs to be advanced so as not to create millions of false positives.
>> > e. Any other system that dreams of integrating vast amounts of subject
>> > matter expertise and organizing into something more sophisticated and
>> > operational than just a categorization system, dictionary or primitive
>> > taxonomy.
>> >
>> > 2. Simple ontologies/semantic models, but with massive adoption
>> > a. In one popular social networking ontology the class Person is used
>> > by millions of people. Later it becomes critical to redefine the class
>> > as a subclass of Social Contact in order to differentiate from the
>> > animal or physical notion of Person in another widely used ontology.
>> >
>> > 3. In the longer term vision, semantic technology to drive model driven /
>> > ontology driven software systems
>> > a. Declarative, rich semantic models that explicitly describe the
>> > behavior of parts or every aspect of a functional software system.
>> > b. Models that explicitly express the compatibility semantics between
>> > one software system and another so that software systems actually
>> > understand their purpose and functionality.
>> >
>> > Systems that are more concerned with the NLP or the linguistic notion
>> > of "semantics" are currently a little bit more resilient to change
>> > management because their applications tend to use statistics or
>> > approximation to create value. Example applications would be sense
>> > disambiguation for advertising, entity extraction, etc. For these
>> > systems machine learning can help us cope with a lot of
>> > inconsistencies in semantic models. However, as these systems
>> > become more mission critical, the rationalization and harmonization
>> > of semantics between various ontologies will start to become a serious
>> > economic issue. Using the right version of various semantic models
>> > (such as Wordnet, DBPedia, etc.) will become a very challenging and
>> > painful problem. This latter area is a significant concern and area of
>> > effort/management right now.
>> >
>> > The power of semantics can permit us to formally express and share the
>> > semantics of things explicitly or implicitly. This can ultimately help
>> > to actually get a grip on the ugly world of change management.
>> > However, in the short term it will open a Pandora's box of power and
>> > change management problems.
>> >
>> > Conor
>> >
>> > Conor Shankey
>> > CTO
>> > Reinvent, Inc - Vancouver.com
>> > www.Reinvent.com
>> > www.Vancouver.com
>> >
>> > Michael Lang(Jr.) wrote:
>> >
>> > Peter,
>> >
>> >
>> > I agree 100% with your assessment. In the semantic web world, I
>> > believe that versioning will not be very important. I think a major
>> > benefit of using semantic web technologies is that you can build an
>> > application that will adapt to changes in the semantics of a word as
>> > the semantics change in the real world.
>> >
>> >
>> > But, as you said, there may be cases where, at a significant point in
>> > time, a community would like to version its vocabulary. The goal of
>> > this discussion is simply to develop some guidelines for versioning,
>> > when it is necessary, that will make the transition from a past
>> > version of a vocabulary to a new one as easy, accurate, and flexible
>> > as possible for the users of a vocabulary.
>> >
>> >
>> > Mike Lang
>> >
>> >
>> > On Tue, Nov 4, 2008 at 1:41 AM, Peter Ansell <ansell.peter@gmail.com> wrote:
>> >
>> >
>> >
>> >
>> > ----- "John Graybeal" < graybeal@mbari.org > wrote:
>> >
>> > > On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
>> > >
>> > > > I strongly believe (and it seems that you and John agree) that if a
>> > > > UID for a concept changes, the old version must have some way of
>> > > > pointing to the new version.
>> > >
>> > > Funny, I would have said this the other way around (new points back to
>> > > old, then the system services can provide the old -> new capability --
>> > > or is this what you are saying too?). I have this notion that *any*
>> > > change to a static resource's specifications -- definition, metadata,
>> > > semantics -- makes a new resource (this lets me compare resource_new
>> > > to resource_old and see the difference between them unambiguously).
>> > >
>> > > With this vision, the resource can't change once it is created, even
>> > > to point to a new resource (you see the problem). Is this vision just
>> > > plain wrong, per the consensus?
>> >
>> > Should we really focus on a "ya just never know, do ya" philosophy
>> > that hurts the majority of casual users more than it helps the
>> > specialised users? If you make up a system where you require that
>> > people manually migrate all their past statements in order to use the
>> > system in a month's time then you won't be looked upon too favourably.
>> > And if you give them the choice to mass migrate their statements then
>> > what is the point if they always select "migrate all to most current
>> > versions"?
>> >
>> > This is a very radical discussion that I don't think fits the majority
>> > of use cases that the semantic web will be applied to, as it is
>> > decidedly anti Web-2.0 where there is a constant evolution and links
>> > are relative, not static as in Web-1.0. If you really face it, meaning
>> > migrates, and the particular structure at a given instant in time
>> > isn't as relevant as the improvement in meaning anyway. If rules in
>> > the semantic web are completely reliant on data structures and unable
>> > to recognise the overall meaning that people gradually migrate towards
>> > then they are always going to be brittle, whether people are perfectly
>> > pedantic about UIDs and/or URIs or whether they end up referencing
>> > everything with relative addresses which don't focus on particular
>> > representations at particular points in time.
>> >
>> > It isn't bad to version information at significant points in time, but
>> > the archaic once-published-always-published-never-modified culture
>> > doesn't fit with electronic technologies IMO.
>> >
>> > (Just a few thoughts :) )
>> >
>> > Cheers,
>> >
>> > Peter
>> >
>> >
>> >
>> > --
>> > Revelytix, Inc.
>> >
>> > phone: 410-584-0009 (office)
>> > 443-928-3782 (cell)
>> > skype: michael.allen.lang.jr
>> > aim: MikeJrRevelytix
>>
>
>
Received on Friday, 7 November 2008 00:53:05 UTC