Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1) from Alan Ruttenberg on 2007-07-20 (public-semweb-lifesci@w3.org from July 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Fri, 20 Jul 2007 01:15:43 -0400
To: Eric Jain <Eric.Jain@isb-sib.ch>
Cc: Chris Mungall <cjm@fruitfly.org>, Bijan Parsia <bparsia@cs.man.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, Darren Natale <dan5@georgetown.edu>
Message-Id: <C66E7A1D-77A6-4EF0-9385-BD3A97D84F74@gmail.com>

On Jul 19, 2007, at 4:16 AM, Eric Jain wrote:

> Alan Ruttenberg wrote:
>> In that case, I would recommend  that it is unwise to use Uniprot  
>> ids as identifiers of protein classes on the semantic web. Doing  
>> so would encourage exactly the kind of ambiguity that we need to  
>> avoid in order to write statements that will not confuse semantic  
>> web agents (including people).
>
> The question you need to ask yourself here is whether there really  
> are such things as specific proteins, or if this is always just a  
> useful abstraction (and often a fuzzy one at that, if it wants to  
> make sense for biologists).

There's something odd about this statement, so let me try to rephrase
it in a way that I hope makes my thinking clearer.
I consider a specific protein to be an instance of a molecule: a very
tiny piece of stuff composed of a bunch of atoms bound together. So
yes, I really believe that there are such things as specific proteins.

Then there are protein classes, each of which picks out some set of
those instances. Protein classes can be defined in a variety of ways,
and some of those definitions will overlap, so that a single protein
might be an instance of more than one class.

When you say "specific proteins", I think you are actually asking
whether there is one "true", disjoint and covering set of classes into
which each protein can be placed. My answer to that question is that
I don't know, and I'm not sure whether I care. What I really care
about is being able to specify which sets of things I am making
general statements about, having a way to evaluate whether or not *I*
believe those statements to be true or consistent with other
statements, and then encoding them in such a way that my computer can
help me work with a large number of such statements to make progress
on some scientific problem.

> UniProt has a different idea on what exactly the protein-related  
> entities are than e.g. EMBL_CDS, and others have different ideas,  
> too. Even if you came up with your own protein database that is  
> more suitable for Semantic Web applications because it has better  
> explicit definitions than UniProt manages to have at the moment, I  
> could argue that what you have in the end are nothing but "records,  
> too...

You could argue that, but I'm not sure it would be very illuminating.
The difference between the UniProt records and the records that I want
to use is that they are used by different sorts of computer programs.
In one case the computer can evaluate the contents of a record so as to
check consistency, compute entailments, and so on. In the other, it
cannot.

If I have the source code for some program, I can certainly say that
it can be considered a string. But calling it a string doesn't capture
the fact that it can be interpreted in a certain way to control a
computation, and I would give that entity a different name and type
from those I would give to a string that could not be interpreted in
that way.

Everything we manipulate on computers is, in some sense, a record or a
string of bits. However, saying that doesn't really capture what we
need in order to understand the consequences of what we and the
computers are doing. When I digitally sign a contract and later breach
it, I can be sued. Don't you think that whatever is associated with
those particular bits is a different sort of thing from whatever is
associated with the bits behind
http://www.miniclip.com/games/sling/en/ ?

-Alan

Received on Friday, 20 July 2007 05:16:14 UTC