Re: Advancing translational research with the Semantic Web from William Bug on 2007-05-18 (public-semweb-lifesci@w3.org from May 2007)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Fri, 18 May 2007 02:36:03 -0400
To: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-Id: <CD14CFA1-4C27-4870-9995-5D3EE435B16E@DrexelMed.edu>
Alan does an excellent job summing up how the issues discussed in  
this thread will ultimately need to be brought to bear in asserting  
the details of biological reality in such a way that algorithms will  
be able to assist us in reliably inferring new, MEANINGFUL relations.

As he states, there are ways in which a protein can end up in a  
specific tissue other than being expressed by the cells in that tissue.

In fact, I'd go one step further and say there will be times when the
combination of asserted and inferred relations will need to represent
the location of an instance of protein X - and the process(es) via
which it became located there - at multiple levels of resolution -
e.g., in an instance of a specific tissue A, in an instance of
specific cells in that tissue, in an instance of a specific
sub-cellular compartment in those cells, and in a particular
mereotopological relation to instances of other protein classes in
that sub-cellular compartment.
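
To make the "multiple levels of resolution" point concrete, here is a
minimal sketch in plain Python, assuming a single asserted location and
a chain of part_of assertions between instances. Every identifier
(protein_X_1, compartment_1, PART_OF, LOCATED_IN, locations) is
hypothetical and is not taken from OBO or any other ontology, and the
mereotopological relations to other proteins are left out entirely.

```python
# Hypothetical instance-level assertions; all identifiers are illustrative.
PART_OF = {
    "compartment_1": "cell_1",   # a sub-cellular compartment instance in a cell instance
    "cell_1": "tissue_A_1",      # a cell instance in an instance of tissue A
}

# The only asserted location: the finest level of resolution.
LOCATED_IN = {"protein_X_1": "compartment_1"}


def locations(entity):
    """Return the asserted location of `entity`, followed by every coarser
    location inferred by walking up the part_of chain."""
    found = []
    place = LOCATED_IN.get(entity)
    # Stop at the top of the chain, or if the data contains a part_of cycle.
    while place is not None and place not in found:
        found.append(place)
        place = PART_OF.get(place)
    return found


levels = locations("protein_X_1")
```

Here only the compartment-level location is asserted; the cell-level
and tissue-level locations are inferred, which is the asserted/inferred
split I have in mind.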

There will also be applications where we'll need to represent the
process(es) by which an instance of protein X ended up in a specific
location and the process(es) in which it participated along the way
(and at its final destination), and to express how the instances of
the objects participating in the instances of those processes evolved
through time.

I know this may seem overly complex, but you could pick up virtually
any research article reporting a novel finding in biomedical science
- from the behavior of some set of organisms in an ecosystem to the
behavior of some set of atoms in a GC/Mass Spec device - and find
that seeming complexity dealt with as a commonplace.

If we expect the application of formal semantic informatic techniques  
to yield the manner of novelty that has accrued through use of linear  
pattern discovery techniques in the biomolecular informatics  
community (e.g., sequence homologies, hydrophobicity profiles, gene  
finding, algorithmic probe set construction, restriction fragment re- 
assembly, etc.), we'll need to encapsulate this manner of complexity  
in our representations of biological reality.

Documenting associated provenance information for the statements -
both the asserted and the inferred statements - is obviously a  
critical part of this process (as has been stated often by many on  
this list - and has been pursued in systems such as SWAN and others)  
- both to accommodate the required disagreement amongst authorities,  
as well as to classify the statements in order to perform further  
analysis - e.g., in examining the binding of ligands to receptors,  
there will be situations where one will want to restrict the  
inferencing/analysis to those statements derived from ligand-receptor  
interactions that lead to functional consequences and for which there  
is corroborating evidence from a functional assay - in other words,  
not just statements such as "an instance of ligand X bound to an
instance of receptor Y", but "an instance of ligand X bound to an
instance of receptor Y leading to consequence Z" (e.g., increased
intracellular Ca++, activation of Protein Kinase A, more frequent
opening of I.K.A ion channels, etc.), where the evidence = some
functional assay for consequence Z.
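
As a rough illustration of that kind of restriction, here is a minimal
sketch in plain Python, assuming each statement carries its provenance
as ordinary fields. The Statement class, its field names, the
"bound_to" predicate and the "functional_assay" evidence label are all
hypothetical; they are not drawn from SWAN or any existing vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A single statement plus its (hypothetical) provenance fields."""
    subject: str
    predicate: str
    obj: str
    consequence: Optional[str] = None    # functional consequence Z, if any
    evidence_type: Optional[str] = None  # kind of assay backing the statement
    inferred: bool = False               # False = asserted, True = inferred


def functionally_corroborated(statements):
    """Keep binding statements that name a consequence and are backed
    by a functional assay."""
    return [
        s for s in statements
        if s.predicate == "bound_to"
        and s.consequence is not None
        and s.evidence_type == "functional_assay"
    ]


statements = [
    # Plain binding, no consequence recorded: excluded.
    Statement("ligand_X_1", "bound_to", "receptor_Y_1"),
    # Binding with a consequence and a functional assay: kept.
    Statement("ligand_X_2", "bound_to", "receptor_Y_2",
              consequence="increased intracellular Ca++",
              evidence_type="functional_assay"),
    # Consequence claimed, but the evidence is not a functional assay: excluded.
    Statement("ligand_X_3", "bound_to", "receptor_Y_3",
              consequence="activation of Protein Kinase A",
              evidence_type="binding_assay"),
]

kept = functionally_corroborated(statements)
```

The same filter applies unchanged to asserted and inferred statements,
since the inferred flag is just another provenance field one could
restrict on.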

One might also want to restrict one's analysis to statements made
about instances in public data repositories (as opposed to statements
derived from instances in literature databases) to determine
whether the inferable statements match those in the literature based
on analysis of the same collection of experimental results.
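
A minimal sketch of that comparison in plain Python, assuming each
statement is reduced to a (subject, predicate, object, source kind)
tuple; the source labels "repository" and "literature" and the function
name compare_by_source are hypothetical.

```python
def compare_by_source(statements):
    """Split statements by source kind and compare the two sets of
    (subject, predicate, object) triples."""
    repository = {s[:3] for s in statements if s[3] == "repository"}
    literature = {s[:3] for s in statements if s[3] == "literature"}
    return {
        "agree": repository & literature,
        "repository_only": repository - literature,
        "literature_only": literature - repository,
    }


example = [
    ("protein_X", "located_in", "tissue_A", "repository"),
    ("protein_X", "located_in", "tissue_A", "literature"),
    ("protein_X", "located_in", "tissue_B", "repository"),
    ("protein_W", "located_in", "tissue_A", "literature"),
]

report = compare_by_source(example)
```

The "repository_only" and "literature_only" sets are where one would
start looking for disagreement between the two sources.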

Cheers,
Bill

On May 17, 2007, at 11:07 PM, Alan Ruttenberg wrote:

>
> On May 17, 2007, at 6:34 PM, Eric Jain wrote:
>
>> There does indeed seem to be an existing has_participant  
>> predicate, but is there also a "protein expression process" class?  
>> This would seem rather contrived, from a biologist's (if not an
>> ontologist's) point of view (all we want to say, after all, is that
>> the protein can be found in some tissue)!
>
> If you want to say that the protein is found in some tissue, that's  
> what should be said. However, in your email you wrote that the  
> protein is expressed in the tissue. They are not the same, and I  
> think that in our semweb representations we should take care to not  
> confuse them, though in language they are easily interchanged and  
> we still (often) understand what each other is talking about.
>
> If it is known to be found in the tissue I would make the subclass
> be the subclass of the protein each instance of which is located
> in some instance of the tissue. No processes involved at all.
>
>> Using widely used concepts and predicates is no doubt a good  
>> thing. But if you can instead make do with core RDF features,  
>> that's even better -- not everyone uses OBO, no matter how  
>> "foundational" it may be :-)
>
> I don't think we can make do with core RDF features, if we want to
> have agents that make reasonable inferences based on what they are
> told. RDF is just too weak to do much of anything in this
> direction. OTOH, if the RDF is always going to be interpreted by a
> human - essentially you are using RDF as an opaque (from a machine
> agent point of view) syntax - then there is no problem. I guess I am
> hoping for my machines to help me more than that.
>
>> Note that the reification "design pattern" allows you to add  
>> attribution information on statements that you did not at first  
>> think would ever need such information, without breaking the data  
>> model.
>
> As long as those statements are single triples. It gets more  
> involved when statements are more than a single triple, as they  
> often will be.
>
> -Alan
>



Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Friday, 18 May 2007 06:33:45 UTC