Re: Less strong equivalences (was Re: blog: semantic dissonance in uniprot)

Pat,
Basically I'm in agreement with all of your points, but I need to correct some
misinterpretations you made of my comments...

On Thu, Mar 26, 2009 at 3:13 AM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Mar 25, 2009, at 5:27 PM, eric neumann wrote:
>
> Several different issues here.
>
>
>
> On Wed, Mar 25, 2009 at 5:47 PM, Bijan Parsia <bparsia@cs.manchester.ac.uk
> > wrote:
>
>> Eric,
>>
>> Thanks for the use case!
>>
>> On 25 Mar 2009, at 21:31, eric neumann wrote:
>>
> <snip>
>
>>
>>
>>  This is the kind of "similar" used in most internal genomic/compound
>>> systems...
>>>
>>> <http://myOrg.com/sw/mxid/PHLP0005>  :isIdentifiedwith  <
>>> http://www.uniprot.org/uniprot/P16233>
>>>
>>
>> Can you explicate this a bit more for me? I.e., could you present what you
>> expect this to do or not do?
>>
>
> Certainly... I want to look up what myOrg knows about a uniprot protein, but
> since they do their own internal data-keeping on things like "druggability"
> which aren't included (yet) in uniprot, I need to make sure my extra data is
> mapped to the public protein object.
>
> Does this help you?
>
>
> It doesn't help me. We need to have a 'semantic' answer. What kinds of
> thing are being talked about here? What do the URIs refer to? (Records or
> chemicals?) Because the use of sameAs depends on the answer to this question
> very crucially.
>

In my company I have a ProteinDictionary table populated with all 'known
human proteins' (this is the conceptual part that is easy for all
biologists, but is causing some confusion in the thread); each entry is
identified (sameAs?) with a protein in Uniprot (as well as a protein in
NCBI-Entrez).

In ProteinDictionary I include a lot of additional data (not found
in Uniprot) on which antibodies exist for that protein (structure).
Therefore, the records "refer" to the same protein, but do not have
identical properties. My company has more knowledge about the protein, but it
is not common to everyone; a case of Open World assumptions...

Is that clearer?
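
To make the question concrete, here is a minimal sketch in plain Python (no
RDF library) of what asserting sameAs between my entry and the Uniprot URI
would imply: every property stated on either name ends up attached to one
thing. The two URIs are the ones from earlier in the thread; the property
names and values are invented for illustration only.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# Illustrative triples. Property names and values are hypothetical.
triples = [
    ("http://myOrg.com/sw/mxid/PHLP0005", SAME_AS,
     "http://www.uniprot.org/uniprot/P16233"),
    ("http://myOrg.com/sw/mxid/PHLP0005", "myorg:druggability", "high"),
    ("http://www.uniprot.org/uniprot/P16233", "ex:publicAnnotation", "some value"),
]


def find(parent, x):
    """Return the representative name of the identity set containing x."""
    parent.setdefault(x, x)
    while parent[x] != x:
        x = parent[x]
    return x


def merge_same_as(triples):
    """Collect all non-sameAs statements under one representative per entity."""
    parent = {}
    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(parent, s), find(parent, o)
            parent[root_s] = root_o  # two names, one thing
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(parent, s)].add((p, o))
    return dict(merged)


merged = merge_same_as(triples)
```

With sameAs asserted, `merged` holds a single entity carrying both the
internal and the public statements, which is what I want only if both URIs
really do name the protein rather than the records.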

>
> (Of course, in a SW world this could have all been done with internal
> triples added to the uniprot URI locally...)
>
>
>>
>>  It really isn't probabilistic anymore since the scientists have all
>>> agreed and defined their entry based on some of the info from the public
>>> entity; for most situations it is an 'exact mapping' to the referred
>>> molecules.
>>>
>>
>> Is it that most, but not all of the time, you can treat it as sameAs but
>> sometimes you don't want to?
>
>
> Well, the question we ask of experts like you is: should we or should we
> not use owl:sameAs for exact mappings to entities with different records?
>
>
> If your URIs are referring to the entities, then use sameAs when you are
> sure you are talking about the same entity, no matter what your records say
> about it. If they are referring to the records, then I would guess that
> sameAs would be true only when two URIs resolve to the same resource using
> GET.
>

We all agree we are referring to the protein in question, but the Uniprot
and Entrez URIs may carry different (hopefully consistent, up to open-world
assumptions) information.
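
As a rough illustration of "consistent up to open-world assumptions", the
sketch below compares several records about one protein and reports only
properties on which two records assert different values. A property that a
record simply does not mention is treated as unknown, not as a disagreement.
The record contents are invented, and the sketch assumes every property is
single-valued, which will not hold for real data.

```python
# Hypothetical records about the same protein; keys and values are invented.
uniprot_record = {"organism": "Homo sapiens", "length": 100}
entrez_record = {"organism": "Homo sapiens"}
myorg_record = {"organism": "Homo sapiens", "antibody_available": True}


def conflicts(*records):
    """Return {property: set of values} where records assert different values.

    Absence of a property in a record never counts as a conflict
    (open-world reading). Assumes each property is single-valued.
    """
    seen = {}
    clashing = {}
    for record in records:
        for prop, value in record.items():
            if prop in seen and seen[prop] != value:
                clashing.setdefault(prop, {seen[prop]}).add(value)
            seen.setdefault(prop, value)
    return clashing


no_clash = conflicts(uniprot_record, entrez_record, myorg_record)
clash = conflicts({"length": 100}, {"length": 120})
```

Here `no_clash` is empty even though the three records differ in what they
mention, while `clash` flags `length` because two records state different
values for it.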

>
>
>>
>>
>>  I agree owl:sameAs was not intended for this kind of relation, but it is
>>> extremely common, and a specialized relation for this would be very much
>>> desired. : )
>>>
>>
>> We need to make me understand the relation :)
>
>
> There are other "identity" or "similar" relations
>
>
> Braaagh! Semantic alarm!  Identity is NOT similarity. Identity really does
> mean being EXACTLY the same thing. If A similarTo B, then we are talking
> about two things which are similar. If A sameAs B, then we are talking about
> ONE THING which happens to have two names.
>

I did not intend to equate "identity" and "similar"; they usually come up
as a bundle in chem and bio discussions like:
Does Person A have this exact sequence variant V for Gene G, or something
similar but distinct, or is their gene allele completely rearranged
(radically altered)?

> in mol biology:
>
> - homolog (symmetric) ; similar function in different species
> - paralog (symmetric, sub-property of homolog )  ; similar origin
> duplication in same species
> - ortholog (symmetric; sub-property of homolog)  ; similar function in
> different species
>
>
> None of these are identity.
>

Agreed; my intent was to show forms of similarity, not identity.


> (also Ohnology and Xenology, see
> http://en.wikipedia.org/wiki/Homology_(biology))
> - variant of (a non-subsumptive form of specialization within genes)
> - modified form of  (a non-subsumptive form of specialization for protein
> gene products), includes splice variants (see
> http://www.affymetrix.com/community/publications/affymetrix/tmsplice/index.affx
> )
> - similar chem structures (symmetric for compounds)
>
>
> None of these are identity.
>

Again, agreed...

>
> One way to use identity here is to try to map the original things to a
> 'sort' or 'similarity class' or 'similarity type' or <choose your own
> buzzword>, and then use identity reasoning on these 'types'. So [ A
> similar-to B] is glossed as [(similarity-type A) sameAs (similarity-type B)]
>  but this only takes you so far: you still get transitivity, for example, so
> notions like 'very close' don't work this way. Still, it might be one way to
> approach the issue.
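
A small sketch of the transitivity point above, assuming a made-up numeric
feature and a hypothetical tolerance: "very close" holds pairwise between
neighbours but not between the endpoints, yet once similarity is glossed as
identity of similarity types, all three values land in the same type.

```python
def very_close(a, b, tol=1.0):
    """Pairwise closeness: true when the values differ by at most tol."""
    return abs(a - b) <= tol


def similarity_types(values, tol=1.0):
    """Group one-dimensional values into types by chaining very_close links.

    Chaining makes the grouping transitive, which is what identity
    reasoning over the resulting types would require.
    """
    classes = []
    for v in sorted(values):
        if classes and very_close(classes[-1][-1], v, tol):
            classes[-1].append(v)
        else:
            classes.append([v])
    return classes


a, b, c = 0.0, 0.8, 1.6  # invented feature values
types = similarity_types([a, b, c])
```

Here `very_close(a, b)` and `very_close(b, c)` both hold while
`very_close(a, c)` does not, yet `types` contains a single class holding all
three values.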
>
>
>
> ... I'm sure there are dozens more.
>
>
>>
>>  Remember also, even though these URIs may be of instances in terms of
>>> records,
>>>
>>
>> instances of what?
>
>
> For a "collective grouping" of similar instances of (physical) molecules...
> d-glucose is 'a' specific molecular structure, but there are over 10^22
> glucose molecules in a teaspoon of dextrose sweetener.... Not the usual OWL
> concept of "instance of class Molecule" is it?
>
>
> This is just a basic ontology issue. You need to distinguish a particular
> molecule from a molecular 'pattern' from a class of isomers, etc., but you
> can't expect OWL to do all this kind of work for you automatically.
>

Certainly, but how best should we apply OWL so that this can be well
represented? Dare we promote meta-classing at this point? I'd rather use OWL
to accurately represent "a Molecule Class means this...., and an instance
means that ...." whether it's structure patterns, property groupings, or
mind-conceptual objects ("I can create a specific and novel chemical with
this structure and these properties").


...If this discussion is beginning to settle onto a commonly agreed set of
principles, I'd like to suggest we capture it and circulate it for comment,
perhaps through HCLS.

cheers,
-Eric


> Defining 'glucose' as a Class just pushes the definition of Molecule up to
> become more akin to a meta-Class...
>
>
> Right, exactly. Classes weren't meant to carry this kind of conceptual
> load. You will just have to do some real ontologizing, my friend :-)
>
>
>
>>
>>  the molecule referenced is not really "a specific single molecule" found
>>> in nature (conceptually possible, but never thought of this way in my
>>> experience). In fact, this is almost always the case in molecular biology
>>> (genes, genomes, SNPs, proteins, etc), while when dealing with macro-humans,
>>> we can refer to an exact instance in the real world.
>>>
>>
>> We cannot?
>
>
> No one in pharma is interested in mapping URIs to an individual exact,
> physical molecule; IP is always around the chemical structure (which IS
> unique) rather than the molecule.
>
>
> Good: you have a clear ontology and a clear identity criterion for sameAs.
> You are talking about chemical structures. I'd suggest, if you really want
> to talk about molecules, having properties has_chemical_structure (domain:
> molecule; range: chemstruct) and is_a_molecule_of as its inverse. Don't use
> the class structure for Avogadro.
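
One possible reading of this suggestion, sketched in Python rather than OWL:
the chemical structure is the thing with a clear identity criterion, and
individual molecules point at it through has_chemical_structure. The class
names, the labels, and the list-based inverse lookup are assumptions made for
illustration; they are not an OWL encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemStruct:
    """A chemical structure; equality is by structure name (assumed key)."""
    name: str


@dataclass(frozen=True)
class Molecule:
    """A particular physical molecule, linked to its structure."""
    label: str
    has_chemical_structure: ChemStruct


def is_a_molecule_of(structure, molecules):
    """Inverse of has_chemical_structure: molecules having this structure."""
    return [m for m in molecules if m.has_chemical_structure == structure]


glucose = ChemStruct("d-glucose")
sample = [Molecule("m1", glucose), Molecule("m2", glucose)]
found = is_a_molecule_of(ChemStruct("d-glucose"), sample)
```

Two separately constructed `ChemStruct("d-glucose")` values compare equal, so
identity lives on the structure, while `m1` and `m2` remain distinct
molecules that share it.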
>
> Pat Hayes
>
>
>>
>>  Perhaps we really need a set of basic relations (and meta classing?) for
>>> this scale of scientific phenomena to keep it distinct from organism
>>> examples in clinical studies and experiments...
>>>
>>
>> I suspect there's more weight on "exemplar" than I know how to give at the
>> moment :)
>
>
> Well, try keeping a URI tracking a single molecule-- there's no business
> value in that! ; )
>
> Eric
>
>
>>
>>
>> Cheers,
>> Bijan.
>>
>
>

Received on Thursday, 26 March 2009 16:29:30 UTC