Re: Ambiguous names. was: Re: URL +1, LSID -1 from Matthias Samwald on 2007-07-17 (public-semweb-lifesci@w3.org from July 2007)

From: Matthias Samwald <samwald@gmx.at>
Date: Mon, 16 Jul 2007 19:05:52 -0500
To: <public-semweb-lifesci@w3.org>
Message-ID: <200771619552.218937@cqueberel>

> It would be more satisfying for us to know intentionally what we
> mean by "protein". It would be good to have a clear set of
> definitions. But, ultimately, I think it would be mistaken. If we
> have the ability to express "the class of protein molecules defined
> by the swissprot record OPSD_HUMAN", then I think we have all we
> need.

As you might know, this is what was done for the Banff demo. We did it this way because the current version of Uniprot is based on abstract database entries. However, if we were able to create a new Uniprot including Semantic Web representations from scratch, would we still want to take the detour of describing a class of proteins by referring to this abstraction? Why not just extract the biological information from the Uniprot record and use it to create direct biological statements about the class of proteins itself?

OWL is very open towards incomplete information. If all we know about the protein is the sequence of amino acids, then this is what we add to the protein class through a 'some-values-from, necessary' property restriction (and not 'necessary and sufficient', since we are still unsure whether this information alone is enough to DEFINE the protein class). If we know that proteins of this class can have some polymorphisms, we can enumerate the different possible sequences as best we can. If we are unable to enumerate all of them at the moment, or are unsure about something, we just leave it out and maybe add it later.
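
To make the 'some-values-from, necessary' restriction concrete, here is a minimal sketch in Python that prints such a class description as Turtle. Everything in the ex: namespace (the class ex:OPSD_HUMAN_Protein, the property ex:hasSequence and the two sequence classes) is made up for illustration and is not taken from Uniprot or the Banff demo. The only point is the shape: rdfs:subClassOf gives a necessary condition, where owl:equivalentClass would make it necessary and sufficient, and known polymorphisms become a union of the sequences we can enumerate so far.

```python
# Sketch only: the ex: namespace, ex:hasSequence and all class names are hypothetical.

PREFIXES = """@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/protein#> .
"""


def necessary_restriction(protein_class, prop, sequence_classes):
    """Return Turtle saying that every member of protein_class has at least one
    value of prop from the known sequence classes (necessary, not sufficient)."""
    if not sequence_classes:
        raise ValueError("need at least one known sequence class")
    if len(sequence_classes) == 1:
        filler = sequence_classes[0]
    else:
        # Known polymorphisms: any one of the enumerated sequences will do.
        filler = "[ a owl:Class ; owl:unionOf ( " + " ".join(sequence_classes) + " ) ]"
    return (
        f"{protein_class} a owl:Class ;\n"
        f"    rdfs:subClassOf [\n"  # subClassOf: a necessary condition only
        f"        a owl:Restriction ;\n"
        f"        owl:onProperty {prop} ;\n"
        f"        owl:someValuesFrom {filler}\n"
        f"    ] .\n"
    )


turtle = PREFIXES + "\n" + necessary_restriction(
    "ex:OPSD_HUMAN_Protein",
    "ex:hasSequence",
    ["ex:OPSD_HUMAN_CanonicalSequence", "ex:OPSD_HUMAN_VariantSequence"],
)
print(turtle)
```

If another variant turns up later, it is simply appended to the list and the description regenerated; nothing else about the class has to change.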

Of course, I am not telling you anything new, but I just want to point out again that describing biological entities instead of abstract database records is much easier than some seem to think.

-- Matthias

Received on Monday, 16 July 2007 17:06:06 UTC