
Re: blog: semantic dissonance in uniprot

From: Oliver Ruebenacker <curoli@gmail.com>
Date: Tue, 24 Mar 2009 12:03:39 -0400
Message-ID: <5639badd0903240903x4c605740yc705ef87a5b12062@mail.gmail.com>
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: Michel_Dumontier <Michel_Dumontier@carleton.ca>, David Booth <david@dbooth.org>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
     Hello Bijan, Phillip, All,

  I understand that what you are referring to are real problems, but I
wonder why the most accurate way to characterize them is as "confusion
between a thing and its record".

  It sounds as if there were a person who would one day exclaim "Oh my
God! I always thought this was a reference to a thing! But no, it is a
reference to a record of the thing! Now I'm doomed!"

  Does this ever happen?

  Is it possible that referring to records instead of things is not
the result of confusion, but rather of cost-benefit considerations:
records are cheap, while identification is costly and open-ended?
What is it that cannot be achieved by having better records instead?

  And what does it take to identify something? We may have thought we
knew what a couch is, until we realize that we have no consensus over
whether the pillows are part of the couch or not, and that it would be
more accurate to distinguish between bare couches (without pillows)
and fully featured couches (with pillows). How far are we going to go?

     Take care

On Tue, Mar 24, 2009 at 8:10 AM, Phillip Lord
<phillip.lord@newcastle.ac.uk> wrote:
> Oliver Ruebenacker <curoli@gmail.com> writes:
>> 2009/3/23 Michel_Dumontier <Michel_Dumontier@carleton.ca>:
>>> I do not think this would be a wise "simplification".  This is only a
>>> simplification from one perspective: because it avoids having to mint
>>> and maintain pairs of URIs instead of a single URI.  But the downstream
>>> cost is that it creates an ambiguity (or "URI collision")
>>> http://www.w3.org/TR/webarch/#URI-collision
>>> that may cause trouble and be difficult to untangle later as the data is
>>> used in more and more ways.  For example, if any of the same predicates
>>> need to be used on both the record and the molecular entity, they will
>>> become hopelessly confused.  Also, if disjointness assertions are
>>> included, then this overloading may cause logical contradictions.
>>   Can anyone name a real-world example where confusion between an
>> entity and its record was an issue?
> Yes, sure. All proteins have a Uniprot ID (conflating protein and
> Uniprot records). Then we integrate this with drugbank; this represents
> many things, including proteins which are not in Uniprot, or represents
> several proteins where Uniprot has one. Consider insulin, for instance.
> We now have a problem because not all proteins have a Uniprot ID.
> The flip side is that if you always say
> Protein Record --> contains knowledge about --> protein
> it's much more complicated. You are making your data model more
> difficult to work with all of the time, to cope with edge cases which
> occur only some of the time.
> There's no way around this; either way it's a compromise and what is
> good in one context may not be good in another.
> Phil
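To make the trade-off in the quoted messages concrete, here is a minimal sketch contrasting the two modelling options: one URI standing for both protein and record, versus separate URIs linked by "contains knowledge about". It assumes triples are plain Python tuples rather than a real RDF store, and every `ex:` name (including the disjointness axiom between `ex:Protein` and `ex:Record`) is hypothetical, chosen only for illustration.

```python
# Minimal sketch. Assumptions: triples are plain (subject, predicate, object)
# tuples, not an RDF library; all ex: names are hypothetical.

RDF_TYPE = "rdf:type"
# Hypothetical disjointness axiom: nothing is both a protein and a record.
DISJOINT = [("ex:Protein", "ex:Record")]


def types_of(graph, subject):
    """Classes asserted for `subject` via rdf:type."""
    return {o for s, p, o in graph if s == subject and p == RDF_TYPE}


def disjointness_violations(graph, disjoint_pairs):
    """Subjects typed with both classes of some declared-disjoint pair."""
    violations = set()
    for subject in {s for s, _, _ in graph}:
        classes = types_of(graph, subject)
        if any(a in classes and b in classes for a, b in disjoint_pairs):
            violations.add(subject)
    return violations


def described_entities(graph, record):
    """Entities a record is about: the extra hop of the two-URI model."""
    return {o for s, p, o in graph
            if s == record and p == "ex:containsKnowledgeAbout"}


# Option 1: one URI stands for both the protein and its database record.
conflated = [
    ("ex:item-123", RDF_TYPE, "ex:Protein"),
    ("ex:item-123", RDF_TYPE, "ex:Record"),
]

# Option 2: separate URIs, linked as in
# Protein Record --> contains knowledge about --> protein
separate = [
    ("ex:record-123", RDF_TYPE, "ex:Record"),
    ("ex:protein-123", RDF_TYPE, "ex:Protein"),
    ("ex:record-123", "ex:containsKnowledgeAbout", "ex:protein-123"),
]

print(disjointness_violations(conflated, DISJOINT))   # {'ex:item-123'}
print(disjointness_violations(separate, DISJOINT))    # set()
print(described_entities(separate, "ex:record-123"))  # {'ex:protein-123'}
```

In this sketch the conflated model trips the disjointness check, which is the kind of logical contradiction Michel's message warns about. The two-URI model stays consistent, but every query about the protein must first follow `ex:containsKnowledgeAbout` from the record, which is the extra complexity Phil describes.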

Oliver Ruebenacker, Computational Cell Biologist
BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
Center for Cell Analysis and Modeling
Received on Tuesday, 24 March 2009 16:04:14 UTC
