Re: URIs and Unique IDs from Peter Ansell on 2008-11-04 (semantic-web@w3.org from November 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Wed, 5 Nov 2008 07:11:01 +1000 (EST)
To: cshankey@reinvent.com
Cc: John Graybeal <graybeal@mbari.org>, Michael F Uschold <uschold@gmail.com>, semantic-web@w3.org, aldo gangemi <aldo.gangemi@gmail.com>, Peter Mika <pmika@yahoo-inc.com>, Ora Lassila <ora.lassila@nokia.com>, "Dr Jeff Z. Pan" <jeff.z.pan@abdn.ac.uk>, Tim Berners-Lee <timbl@csail.mit.edu>, Frank van Harmelen <Frank.van.Harmelen@cs.vu.nl>, sean bechhofer <sean.bechhofer@manchester.ac.uk>, michaelalang@gmail.com, "Michael Lang(Jr.)" <michaelallenlang@gmail.com>
Message-ID: <3216695.61225833059031.JavaMail.peter@Macintosh-2.local>
Hi Conor,

I don't disagree that versioning will be important in specific scenarios, but I disagree that web-based distributed reasoning will ever be used to drive mission-critical systems, simply because of the authority distinction. People rely on the web for arbitrary services like email and social networking, but they don't rely on external websites as data sources for their corporate software, or if they do, their shareholders should be entitled to know that their risk model is very diverse. Skipping from data-only sources to programmatic direction services requires another set of risk profiles covering what could go wrong that no one in the company could respond to.

Of the scenarios that you describe, I am only familiar with the bio issue. I should hope that people never set up systems that rely on specific protein-related assertions by scientists to be the absolute reality. One place that reality departs from scientific knowledge is in the individual patient, so any medical system should at minimum be trained on ranges of data, and its output should be offered as suggestions at best. I agree, however, that the trained models should be versioned, but that is easy as they are under the control of the organisation that is performing the diagnosis (i.e., providing the application and data combination), and that organisation can implement the change without extensive general community consultation.

If someone decides to tear up the model then they should do it on test servers, as with any other non-production system in the history of programming :). They can do it on the test server because everything they directly rely on is under their control, as they process the external data and verify it in a testing phase before utilising it. The brittleness of the application rules won't be as big a factor if the resources are under the group's control and are merely based on external assertions for source data. Because we are focusing on getting things stable at the application level, it is not a hassle to contain the versions, as there will only ever be two (or three) of the most current ones in production use. Any others will be supported in limited ways, like any other outdated computer system. If it gets to the stage where the data and ontologies are distributed as a package, then organisations could hypothetically keep utilising the system with its inherent bugs even if the company (foolishly) decided to migrate future product offerings to a new incompatible version using the current URI|UID combinations. This doesn't require a large degree of versioning expertise, as it is a switchover point where you need expert human advice for the data migration stage anyway, and the outdated version will not be utilised after the switchover.

The main bit I was disagreeing with in my email was the emphasis on the distributed web having to be versioned to support particular applications when it is clear that the applications will always be brittle, and hence they should rely on data that has been verified and is contextually static, at least in representation format, and under the control of the organisation running the application. Web URIs are great for resolution and distributing the burden, but if you are financially reliant on the resource then you should be allowed to have it locally cached for stability so that you can manage the versioning internally. This applies to both data and ontologies, as they both have places where they can break a program directed by semantic rules. Keeping completely up to date with external affairs doesn't seem useful given the potential for breakages at so many levels. (To name one breakage point, I would refer to a certain permanent redirection service being overused and effectively able to completely break programs which rely on live data when it goes down. And that has nothing to do with versioning, but it still highlights the fallacy of live data programs ever being reliable.)
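
As a rough illustration of the local caching point, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than part of any existing tool: the class name PinnedCache, the idea of labelling snapshots with a content hash, the example.org URI, and the fetch function (passed in, so the sketch does not depend on a particular HTTP library). Production code reads only the pinned snapshot, and a newer copy is used only after someone explicitly promotes it, for instance once it has passed on a test server.

```python
import hashlib
from typing import Callable, Dict, List


class PinnedCache:
    """Local snapshots of external resources; production reads only the pinned one."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # hypothetical retrieval function supplied by the caller
        self._snapshots: Dict[str, Dict[str, bytes]] = {}  # uri -> {version: content}
        self._pinned: Dict[str, str] = {}  # uri -> version used in production

    def refresh(self, uri: str) -> str:
        """Fetch the live resource, store it as a snapshot, and return its version id."""
        content = self._fetch(uri)  # may raise; existing snapshots are left untouched
        version = hashlib.sha256(content).hexdigest()[:12]
        self._snapshots.setdefault(uri, {})[version] = content
        # Only the very first snapshot is pinned automatically.
        self._pinned.setdefault(uri, version)
        return version

    def promote(self, uri: str, version: str) -> None:
        """Switch production to a snapshot that has already been verified."""
        if version not in self._snapshots.get(uri, {}):
            raise KeyError(f"no snapshot {version!r} for {uri!r}")
        self._pinned[uri] = version

    def read(self, uri: str) -> bytes:
        """Return the pinned snapshot without touching the network."""
        return self._snapshots[uri][self._pinned[uri]]

    def versions(self, uri: str) -> List[str]:
        """List the locally held versions, oldest first."""
        return list(self._snapshots.get(uri, {}))


# Usage: the "live" dict stands in for the external web.
URI = "http://example.org/ontology"
live = {URI: b"Person subClassOf Agent"}
cache = PinnedCache(lambda uri: live[uri])

v1 = cache.refresh(URI)
live[URI] = b"Person subClassOf SocialContact"  # upstream changes its mind
v2 = cache.refresh(URI)

assert cache.read(URI) == b"Person subClassOf Agent"  # production is unaffected
cache.promote(URI, v2)  # explicit switchover after testing
assert cache.read(URI) == b"Person subClassOf SocialContact"
```

With this arrangement an upstream change, or an outage of a redirection service, only affects refresh(); reads keep working from the pinned local copy, and the number of versions kept around is whatever the organisation chooses to retain.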

Cheers,

Peter

----- "Conor Shankey" <cshankey@reinvent.com> wrote:

> From: "Conor Shankey" <cshankey@reinvent.com>
> To: "Michael Lang(Jr.)" <michaelallenlang@gmail.com>
> Cc: "Peter Ansell" <ansell.peter@gmail.com>, "John Graybeal" <graybeal@mbari.org>, "Michael F Uschold"
> <uschold@gmail.com>, semantic-web@w3.org, "aldo gangemi" <aldo.gangemi@gmail.com>, "Peter Mika" <pmika@yahoo-inc.com>,
> "Ora Lassila" <ora.lassila@nokia.com>, "Dr Jeff Z. Pan" <jeff.z.pan@abdn.ac.uk>, "Tim Berners-Lee"
> <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>, "sean bechhofer"
> <sean.bechhofer@manchester.ac.uk>, michaelalang@gmail.com
> Sent: Wednesday, 5 November, 2008 4:20:02 AM GMT +10:00 Brisbane
> Subject: Re: URIs and Unique IDs
>
> I strongly disagree that versioning will not be important. I suspect
> that it will become the most profound and challenging problem to
> tackle if we are to scale the application of semantic technology.
> Change management is less critical in the short term for those
> concerned with the linguistic notion of semantics. However, if you are
> concerned with leveraging semantic models to drive/support high value
> proposition mission critical systems, change management becomes a
> serious concern. Versioning and change management become a show
> stopper if you are going even further and intend to create full
> computational semantic systems where the algorithms and data/object
> models of software systems are replaced by semantic models. In each
> one of these three areas the level of trust and dependencies on the
> asserted semantics will become critical.
> 
> Here are a few examples:
> 
> 1. Trust semantic models or ontologies to support operational/mission
> systems such as:
> a. Equipment, system maintenance applications
> - a knowledge modeler/ontologist asserts that a General Electric
> A877623 is a subclass of a Turbo Prop Engine and then in a later
> version realizes their mistake: it is a subclass of another
> system. The difference affects the scheduling of maintenance for
> aircraft.
> - a similar model asserts that a system should be overhauled if a
> certain condition occurs
> b. Operational policies and compliance applications
> - a knowledge modeler asserts that a person who approves a credit
> rating cannot approve a loan but in a later version of the compliance
> ontology realizes that the semantics need to be far more
> sophisticated. The difference affects the ability of the compliance
> system to prevent or permit fraud.
> c. Medical / Bio applications
> - A biomedical ontologist asserts that one protein up-regulates a
> gene. Another subject matter expert asserts that the same protein
> down-regulates a gene. Another researcher realizes that it is important to
> tear down the model and express the context of the scenario to capture
> the conflict. The difference affects the ability of a medical
> diagnostic system.
> d. Intelligence systems
> - The model of a social / economic network for terrorists in one model
> needs to be advanced so as not to create millions of false positives.
> e. Any other system that dreams of integrating vast amounts of subject
> matter expertise and organizing into something more sophisticated and
> operational than just a categorization system, dictionary or primitive
> taxonomy.
> 
> 2. Simple ontologies/semantic models, but with massive adoption
> a. In one popular social networking ontology the class Person is used
> by millions of people. Later it becomes critical to redefine the class
> as a subclass of Social Contact in order to differentiate from the
> animal or physical notion of Person in another widely used ontology.
> 
> 3. In the longer-term vision, semantic technology drives model-driven /
> ontology-driven software systems
> a. Declarative, rich semantic models that explicitly describe the
> behavior of parts or every aspect of a functional software system.
> b. Models that explicitly express the compatibility semantics between
> one software system and another so that software systems actually
> understand their purpose and functionality.
> 
> Systems that are more concerned with the NLP or the linguistic notion
> of "semantics" are currently a little bit more resilient to change
> management because their applications tend to use statistics or
> approximation to create value. Example applications would be sense
> disambiguation for advertising, entity extraction, etc. For these
> systems machine learning can help us cope with a lot of
> inconsistencies in semantic models. However, as these systems
> become more mission critical, the rationalization and harmonization
> of semantics between various ontologies will start to become a serious
> economic issue. Using the right version of various semantic models
> (such as Wordnet, DBPedia, etc.) will become a very challenging and
> painful problem. This latter area is a significant concern and area of
> effort/management right now.
> 
> The power of semantics can permit us to formally express and share the
> semantics of things explicitly or implicitly. This can ultimately help
> to actually get a grip on the ugly world of change management.
> However, in the short term it will open a Pandora's box of power and
> change management problems.
> 
> Conor
> 
> Conor Shankey
> CTO
> Reinvent, Inc - Vancouver.com
> www.Reinvent.com
> www.Vancouver.com
> 
> Michael Lang(Jr.) wrote:
> 
> Peter,
> 
> 
> I agree 100% with your assessment. In the semantic web world, I
> believe that versioning will not be very important. I think a major
> benefit of using semantic web technologies is that you can build an
> application that will adapt to changes in the semantics of a word as
> the semantics change in the real world.
> 
> 
> But, as you said, there may be cases where, at a significant point in
> time, a community would like to version its vocabulary. The goal of
> this discussion is simply to develop some guidelines for versioning,
> when it is necessary, that will make the transition from a past
> version of a vocabulary to a new one as easy, accurate, and flexible
> as possible for the users of a vocabulary.
> 
> 
> Mike Lang
> 
> 
> On Tue, Nov 4, 2008 at 1:41 AM, Peter Ansell <ansell.peter@gmail.com> wrote:
> 
> 
> 
> 
> ----- "John Graybeal" < graybeal@mbari.org > wrote:
> 
> > On Nov 3, 2008, at 10:48 AM, Michael Lang(Jr.) wrote:
> >
> > > I strongly believe (and it seems that you and John agree) that if a
> > > UID for a concept changes, the old version must have some way of
> > > pointing to the new version.
> >
> > Funny, I would have said this the other way around (new points back to
> > old, then the system services can provide the old -> new capability --
> > or is this what you are saying too?). I have this notion that *any*
> > change to a static resource's specifications -- definition, metadata,
> > semantics -- makes a new resource (this lets me compare resource_new
> > to resource_old and see the difference between them unambiguously).
> >
> > With this vision, the resource can't change once it is created, even
> > to point to a new resource (you see the problem). Is this vision just
> > plain wrong, per the consensus?
> 
> Should we really focus on a "ya just never know, do ya" philosophy
> that hurts the majority of casual users more than it helps the
> specialised users? If you make up a system where you require that
> people manually migrate all their past statements in order to use the
> system in a month's time then you won't be looked upon too favourably.
> And if you give them the choice to mass migrate their statements then
> what is the point if they always select "migrate all to most current
> versions"?
> 
> This is a very radical discussion that I don't think fits the majority
> of use cases that the semantic web will be applied to, as it is
> decidedly anti Web-2.0 where there is a constant evolution and links
> are relative, not static as in Web-1.0. If you really face it, meaning
> migrates, and the particular structure at a given instant in time
> isn't as relevant as the improvement in meaning anyway. If rules in
> the semantic web are completely reliant on data structures and unable
> to recognise the overall meaning that people gradually migrate towards
> then they are always going to be brittle, whether people are perfectly
> pedantic about UIDs and/or URIs or whether they end up referencing
> everything with relative addresses which don't focus on particular
> representations at particular points in time.
> 
> It isn't bad to version information at significant points in time, but
> the archaic once-published-always-published-never-modified culture
> doesn't fit with electronic technologies IMO.
> 
> (Just a few thoughts :) )
> 
> Cheers,
> 
> Peter
> 
> 
> 
> --
> Revelytix, Inc.
> 
> phone: 410-584-0009 (office)
> 443-928-3782 (cell)
> skype: michael.allen.lang.jr
> aim: MikeJrRevelytix
Received on Tuesday, 4 November 2008 22:59:28 UTC