Re: URIs and Unique IDs from John Graybeal on 2008-11-27 (semantic-web@w3.org from November 2008)

From: John Graybeal <graybeal@mbari.org>
Date: Wed, 26 Nov 2008 16:45:33 -0800
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: "Peter Ansell" <ansell.peter@gmail.com>, semantic-web@w3.org, "aldo gangemi" <aldo.gangemi@gmail.com>, "Conor Shankey" <cshankey@reinvent.com>, "Peter Mika" <pmika@yahoo-inc.com>, "Ora Lassila" <ora.lassila@nokia.com>, "Dr Jeff Z. Pan" <jeff.z.pan@abdn.ac.uk>, "Tim Berners-Lee" <timbl@csail.mit.edu>, "Frank van Harmelen" <Frank.van.Harmelen@cs.vu.nl>, "sean bechhofer" <sean.bechhofer@manchester.ac.uk>, obo-format@lists.sourceforge.net, "Michael F Uschold" <uschold@gmail.com>
Message-Id: <05EDB0EB-C422-40CA-933F-2F6B4E48B191@mbari.org>

Correct in every respect but the 'you' part. :->  I would reframe the  
situation as follows: I am working with people who define the terms  
used for their data sets. Those people are (a) unfamiliar with  
ontologies, and (b) unaware of 'meaning creep' even as they are  
creating it via their revised definitions.  I'm not sure if these  
details affect your assessment, but this application, as described  
below, seems very different to me than "I want to supervise the  
creation of a community ontology for a domain."

Rather, I want to use ontological methods to let community members use  
or refer to (relatively) unambiguous terms -- terms that they have  
defined, possibly with a large community, and possibly that they have  
mapped to other concepts. They are not building rich domain  
ontologies, merely capturing their local or community understanding.  
So I'm thinking my choices are either: (a) assign them a new opaque  
URI assuming the new concept is different, or (b) give them a  
versioned URI in case the new concept is different.
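
To make the two choices concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the namespace is a made-up example.org address, and the function names and URI layouts are hypothetical, not anything the project has adopted.

```python
import itertools

# Hypothetical namespace for illustration only; not a real project URI.
BASE = "http://example.org/vocab"

_counter = itertools.count(1)


def mint_opaque_uri() -> str:
    """Choice (a): each new or revised concept gets a fresh, meaningless id."""
    return f"{BASE}/id/{next(_counter):07d}"


def mint_versioned_uri(term: str, version: int) -> str:
    """Choice (b): the term fragment stays stable; only the version changes."""
    fragment = term.strip().lower().replace(" ", "_")
    return f"{BASE}/v{version}/{fragment}"


# Two successive definitions of the same term of convenience.
a1 = mint_opaque_uri()
a2 = mint_opaque_uri()
b1 = mint_versioned_uri("sea surface temperature", 1)
b2 = mint_versioned_uri("sea surface temperature", 2)
```

Under (a) nothing in the two URIs says they are related, so that link has to be recorded separately. Under (b) the shared fragment makes the relation visible, at the cost of putting a human-readable label into the identifier.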

The resource we are working with, as I see it, is this term of  
convenience (in this example, 'sea surface temperature'). They will  
define the term differently at different times. But the fact that
their understanding of the concept has changed due to underlying
technologies is very important, even though to them the concept is
'unchanged'. (Note that often the person updating the definition is
not in a position to decide whether the change is in fact significant.)
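
As a rough sketch of what tracking those redefinitions might look like, the Python below keeps a dated revision history for one term and looks up which definition was in force when a data value was labeled. The three definitions are the ones discussed in this thread; the dates, the HISTORY structure, and the function name are invented for illustration.

```python
from bisect import bisect_right
from datetime import date
from typing import Tuple

# Hypothetical revision history for 'sea surface temperature':
# (date the definition took effect, definition text). Dates are invented.
HISTORY = [
    (date(1950, 1, 1), "collected somewhere near the surface and brought back on board"),
    (date(1980, 1, 1), "measured in situ 1 meter below the surface"),
    (date(2000, 1, 1), "measured by a satellite"),
]


def definition_in_force(labeled_on: date) -> Tuple[int, str]:
    """Return (version number, definition) in effect on the given date."""
    effective_dates = [d for d, _ in HISTORY]
    # Count how many definitions had taken effect on or before labeled_on.
    idx = bisect_right(effective_dates, labeled_on)
    if idx == 0:
        raise LookupError(f"no definition recorded on or before {labeled_on}")
    return idx, HISTORY[idx - 1][1]


version, text = definition_in_force(date(1985, 6, 1))
```

The lookup does not judge whether a revision was significant. It only preserves enough information for someone else to decide that later.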

I picked this example from a relatively stable science domain, but of  
course science domains in flux have faster meaning creep, as  
understanding evolves. Whether these actual changes are managed by  
changing opaque URIs, or changing version URIs while keeping a stable  
term fragment, should not impact how the semantic technologies can be  
applied. Or so I hope.

John

On Nov 26, 2008, at 2:51 PM, Alan Ruttenberg wrote:

> On Wed, Nov 26, 2008 at 5:17 PM, John Graybeal <graybeal@mbari.org> wrote:
>
>> In our research community, on a regular basis you hear how important
>> it is to know, as precisely as possible, the meaning of the
>> parameters in a historical data collection. Whether or not "sea
>> surface temperature" meant the temperature of water "collected
>> somewhere near the surface and brought back on board", "measured in
>> situ 1 meter below the surface", or "measured by a satellite" can
>> significantly impact the temperature trend of a global ocean
>> temperature analysis.
>>
>> Before you jump:  I appreciate that fundamentally these can be 3
>> different concepts. My observation is that the people defining the
>> terms don't always appreciate that; and simply letting a concept
>> evolve, without tracking or versioning the evolution, will obviously
>> produce analyses in the future that say "We don't know which version
>> of the concept they had in mind when they labeled this data value."
>> Tracking the necessary information to answer questions like that is
>> a minimal requirement for supporting historical data analyses for
>> environmental science.  For me, that's a decisive argument for
>> versioning.
>
> I see this as an argument for better modeling, not versioning. But
> first let me see if I understand the scenario.
>
> You want to define sea surface temperature. There are a number of
> methods for doing so. You are proposing to have a single class
> (relation?) "sea surface temperature" that is versioned as follows:
>
> "sea surface temperature"
>  v1: temperature of water "collected somewhere near the surface and
> brought back on board"
>  v2: measured in situ 1 meter below the surface
>  v3: measured by a satellite
>
> Your presumption is that for a while people will use v1, then they
> will use v2 then they will use v3 and therefore you will know what
> they mean in each case.
>
> Do I understand this correctly?
>
> -Alan
>
>>
>> John
>>
>> On Nov 9, 2008, at 10:41 PM, Alan Ruttenberg wrote:
>>
>>> On Sun, Nov 9, 2008 at 9:18 PM, Peter Ansell <ansell.peter@gmail.com> wrote:
>>>>
>>>> ----- "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:
>>>>>
>>>>> The OBO ontologies are moving towards *all* URI being numeric id
>>>>> based for this reason (until recently it had only been classes
>>>>> that were named that way).
>>>>
>>>> How will people using OBO ever be sure that they aren't going to
>>>> use a term thinking it doesn't have far-reaching consequences,
>>>> like the broader->broaderTransitive difference, and find out in
>>>> future that it has changed and influenced their results in some
>>>> way, when someone could reasonably have determined that the nature
>>>> of the term had changed and it needed a new number/name/URI/UID? I
>>>> do recognise that whenever any property attached to a term changes
>>>> that technically there could be a difference in the results of
>>>> some application utilising the data, but reverting to saying that
>>>> things just migrate on the spot always isn't a suitable solution
>>>> either IMO.
>>>
>>> Nobody can be sure of anything. However their policy has been
>>> arrived at over many years of practice of arguably the most
>>> successful collaboratively built ontology in history. If I had to
>>> make a wager, I wouldn't bet against the solution they've come up
>>> with without a really good case for it.
>>>
>>> <snip>
>>> Bottom line is that there is a decent amount of experience that
>>> leads to a conclusion of being very hesitant before changing ids.
>>> If you have some experience to share that demonstrates otherwise
>>> I'm very interested in hearing the specifics. I think we could do
>>> with more case studies and fewer first principles here.
>>>
>>> Regards,
>>>
>>> -Alan
>>>
>>
>>
>>

John

--------------
John Graybeal   <mailto:graybeal@mbari.org>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org

Received on Thursday, 27 November 2008 02:05:00 UTC