Re: BioRDF Telcon

Hi Michael,

Our use case is considered a pilot project for exploring how to use the 
Semantic Web to represent some of the information about microarray 
experiments, including co-expressed/differentially expressed genes and 
the context in which such genes are identified (as described in papers). 
While keeping things simple and well defined (based on a limited number 
of examples), we hope to demonstrate how such information/knowledge on 
the Semantic Web can help researchers locate microarray datasets more 
easily. For example, users may be interested in collecting (from 
different databases) raw datasets belonging to different microarray 
experiments that use a particular microarray platform (e.g., Affymetrix) 
to study Alzheimer's Disease (AD) in particular neural cell types and 
brain regions. Such a collection may help researchers (biostatisticians) 
perform meta-analysis to identify biomarkers for a given stage of AD, 
for example. Also, please see my responses below.
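
To make the dataset-collection example a bit more concrete, here is a 
rough Python sketch of the kind of lookup we have in mind. It is not our 
RDF representation; the record fields, dataset IDs, and database names 
are all made up for illustration, and a real version would query the 
metadata on the Semantic Web instead of an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata for one raw microarray dataset (illustrative only)."""
    dataset_id: str
    source_db: str
    platform: str
    disease: str
    cell_type: str
    brain_region: str


def find_datasets(records, *, platform, disease, cell_types=None, brain_regions=None):
    """Return records matching platform and disease, optionally
    restricted to the given cell types and brain regions."""
    matches = []
    for rec in records:
        if rec.platform.lower() != platform.lower():
            continue
        if rec.disease.lower() != disease.lower():
            continue
        if cell_types is not None and rec.cell_type not in cell_types:
            continue
        if brain_regions is not None and rec.brain_region not in brain_regions:
            continue
        matches.append(rec)
    return matches


# Made-up records standing in for metadata gathered from different databases.
RECORDS = [
    DatasetRecord("ds-001", "db-A", "Affymetrix", "Alzheimer's Disease", "neuron", "hippocampus"),
    DatasetRecord("ds-002", "db-B", "Affymetrix", "Alzheimer's Disease", "neuron", "entorhinal cortex"),
    DatasetRecord("ds-003", "db-A", "Agilent", "Alzheimer's Disease", "neuron", "hippocampus"),
    DatasetRecord("ds-004", "db-B", "Affymetrix", "Parkinson's Disease", "neuron", "substantia nigra"),
    DatasetRecord("ds-005", "db-A", "Affymetrix", "Alzheimer's Disease", "astrocyte", "hippocampus"),
]

hits = find_datasets(
    RECORDS,
    platform="Affymetrix",
    disease="Alzheimer's Disease",
    cell_types={"neuron"},
    brain_regions={"hippocampus", "entorhinal cortex"},
)
for rec in hits:
    print(rec.dataset_id, rec.source_db, rec.brain_region)
```

The resulting collection (here, datasets drawn from two different 
databases) is what a biostatistician would then take into a 
meta-analysis.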

Michael Liebman wrote:

>I usually monitor this group and don't contribute but seeing the recent
>exchanges about gene expression, I feel a need to put things into a better
>perspective than the one currently being shared
>

I'm glad you're contributing.

>My experience comes from many years overseeing bioinformatics (and gene
>expression, proteomics and clinical data) at Wyeth, Roche, UPenn and with a
>DOD center at Windber.
>There appear to be several issues that are not being realistically addressed
>in the current discussion
>
Thanks for the introduction.

>1. there is significant experimental variability across individual studies,
>published or not, because of variation in tissue/cell
>handling/storage/preparation, experimental variability in the experiment and
>significant variability in the data analysis, i.e. experimental
>reproducibility inter-lab is poor and even intra-lab can be a major challenge
>

I agree with you on the challenge of variability inherent in microarray 
and other high-throughput technologies.

>2. the measurement that is usually referred to as "up or down gene
>expression/regulation" refers to the comparison between 2 experiments
>(sample under 2 different conditions) but typically does not adequately
>correct for individual experimental variability other than "simple" scaling.
>We have shown that this is inadequate.
>
Yes, that’s why more sophisticated normalization methods have been 
developed to address some of the variability issues.

>3. leaving the interpretation to the author is significantly limited as it
>tends to reflect the bias of the author to "observe/confirm" what they are
>looking for in many of these studies, i.e. a biostatistician will tell you
>that these experiments are extremely under-powered to reveal the true
>statistically significant results they would like to achieve
>
We're not agreeing or disagreeing with the authors. We just want to 
capture the information as described in the paper. We'll let others 
judge the validity of the results presented in the paper.

>4. human nature looks to favor the "big differences" as being most
>significant; unfortunately nature doesn't work this way. Many of the largest
>differences are not functionally relevant but reflect the fact that
>biological control of these specific genes may not be critical to function
>and so large variability can be observed and should not be interpreted, all
>of the time, as being most significant.  In fact, we have developed
>analytical methods to look at large libraries of gene expression studies and
>evaluate the overall stability/variability of individual genes (and probes)
>to establish a significance in difference between states based on how much
>variation should be expected vs how much is observed, especially in genes
>that show extremely small levels of expression overall and which would not
>be considered by typical approaches to data analysis
>
It sounds like your group is developing new methodologies to tackle the 
problem of determining the significance of gene expression.

>Sorry to interrupt the exchange but I believe that it is critical, when
>considering the development of systems to represent, store, exchange, model
>data, that an understanding of the specifics and uniqueness of the underlying
>data and analytical approaches must be considered beyond simple statistics.
>
No problem. Thanks for the input.

Best,

-Kei

>Michael
>
>Michael N. Liebman, PhD
>President/Managing Director
>Strategic Medicine, Inc
>231 Deepdale Drive
>Kennett Square, PA 19348
>
> (814) 659 5450 mobile
>
>m.liebman@strategicmedicine.com
>www.strategicmedicine.com 
>
>-----Original Message-----
>From: public-semweb-lifesci-request@w3.org
>[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of mdmiller
>Sent: Wednesday, May 26, 2010 1:47 PM
>To: Kei Cheung
>Cc: HCLS
>Subject: Re: BioRDF Telcon
>
>hi kei,
>
>>Just want to clarify that what I meant was that it might be beyond the 
>>scope of our use case to accurately, comprehensively, and precisely define
>>what gene expression really means given the degree of complexity involved.
>
>exactly, i believe we can trust the authors of the gene expression papers 
>and the journals themselves for this
>
>cheers,
>michael
>
>----- Original Message ----- 
>From: "Kei Cheung" <kei.cheung@yale.edu>
>To: "mdmiller" <mdmiller53@comcast.net>
>Cc: "HCLS" <public-semweb-lifesci@w3.org>
>Sent: Wednesday, May 26, 2010 7:23 AM
>Subject: Re: BioRDF Telcon
>
>
>  
>
>>Hi Michael,
>>
>>mdmiller wrote:
>>    
>>
>>>hi kei,
>>>
>>>      
>>>
>>>>What do we mean by differentially expressed genes? One definition is 
>>>>that differentially expressed genes are genes with significantly 
>>>>different expression in two samples/conditions/experimental 
>>>>factors/dimensions (e.g., treated vs. untreated, disease vs. normal, 
>>>>time point 1 vs. time point 2) of microarray experiments.
>>>>        
>>>>
>>>yes, this was my meaning.
>>>
>>>this is to differentiate between a gene that is always expressed under 
>>>normal conditions because it is part of an essential pathway that is 
>>>always running, that gene is only interesting if its expression level 
>>>changes--similarly for a normally unexpressed gene.
>>>      
>>>
>>Thanks for confirming. A consensus definition (even if it's broad) is 
>>important to our gene list representation. There are a variety of methods 
>>(e.g., statistical tests) that can be used to identify a list of 
>>differentially expressed genes in two different groups. That's Scott's 
>>point about the importance of capturing, as part of the gene list context, 
>>what methods have been used for detecting differentially expressed genes. 
>>I hope the use case can help convince the community of the need for a 
>>common vocabulary for describing such methods.
>>    
>>
>>>>How to measure or infer gene expression (e.g., from mRNA) is a whole 
>>>>complex question that may be beyond the scope of our use case.
>>>>        
>>>>
>>>yes, which i think was scott's point in his reply.  in fact, for the 
>>>BioRDF use case, initially at least, it is probably sufficient that the 
>>>authors of the paper state that a gene is part of the significant gene 
>>>list.
>>>      
>>>
>>Just want to clarify that what I meant was that it might be beyond the 
>>scope of our use case to accurately, comprehensively, and precisely define
>>what gene expression really means given the degree of complexity involved.
>>
>>Cheers,
>>
>>-Kei
>>    
>>
>>>cheers,
>>>michael
>>>
>>>
>>>----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>>>To: "mdmiller" <mdmiller53@comcast.net>
>>>Cc: "M. Scott Marshall" <marshall@science.uva.nl>; "HCLS" 
>>><public-semweb-lifesci@w3.org>
>>>Sent: Tuesday, May 25, 2010 8:52 PM
>>>Subject: Re: BioRDF Telcon
>>>
>>>
>>>      
>>>
>>>>Hi Michael et al,
>>>>
>>>>What do we mean by differentially expressed genes? One definition is 
>>>>that differentially expressed genes are genes with significantly 
>>>>different expression in two samples/conditions/experimental 
>>>>factors/dimensions (e.g., treated vs. untreated, disease vs. normal, 
>>>>time point 1 vs. time point 2) of microarray experiments.
>>>>
>>>>How to measure or infer gene expression (e.g., from mRNA) is a whole 
>>>>complex question that may be beyond the scope of our use case.
>>>>
>>>>Cheers,
>>>>
>>>>-Kei
>>>>
>>>>mdmiller wrote:
>>>>
>>>>        
>>>>
>>>>>hi scott,
>>>>>
>>>>>i think you, jim and lena are doing a great job moving the technical 
>>>>>aspect of this work forward.  i'm looking forward to seeing the end 
>>>>>results.
>>>>>
>>>>>cheers,
>>>>>michael
>>>>>
>>>>>----- Original Message ----- From: "M. Scott Marshall" 
>>>>><marshall@science.uva.nl>
>>>>>To: "mdmiller" <mdmiller53@comcast.net>
>>>>>Cc: "Kei Cheung" <kei.cheung@yale.edu>; "HCLS" 
>>>>><public-semweb-lifesci@w3.org>
>>>>>Sent: Tuesday, May 25, 2010 10:21 AM
>>>>>Subject: Re: BioRDF Telcon
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>Hi Michael,
>>>>>>
>>>>>>Thanks for the clarification. I also explained those concepts during
>>>>>>the BioRDF teleconference but it is difficult for the scribe to
>>>>>>capture such details accurately from a phone conversation. Just
>>>>>>knowing that a gene has changed (either up or down) already gives us
>>>>>>something to work with. Since we started with the microarray use case,
>>>>>>we have aimed to focus on the list of differentially expressed genes
>>>>>>as our entry point into related molecular information, phenotypes,
>>>>>>pathways, diseases, etc.
>>>>>>
>>>>>>In addition to the gene list and experimental factors, there is some
>>>>>>data provenance information that characterizes the origins of the gene
>>>>>>list, such as the type of significance analysis or technique that was
>>>>>>performed (ANOVA, LIMMA, ..) and p-value cutoff for the list discussed
>>>>>>in the associated article(s), software packages used (specific R
>>>>>>package from BioConductor, GeneSpring, NextBio, ..). It would be handy
>>>>>>if there was a common vocabulary for this type of information (URIs
>>>>>>for statistical techniques and software packages). I think that some
>>>>>>related resources have been described by myGrid/myExperiment. However,
>>>>>>lacking a complete vocabulary, it is still possible to make use of the
>>>>>>gene list without such a fine grained description of its provenance.
>>>>>>
>>>>>>Cheers,
>>>>>>Scott
>>>>>>
>>>>>>On Tue, May 25, 2010 at 9:35 AM, mdmiller <mdmiller53@comcast.net> 
>>>>>>wrote:
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>hi all,
>>>>>>>
>>>>>>>sorry i ended up not being able to make the call.
>>>>>>>
>>>>>>>"P value
>>>>>>>The probability (ranging from zero to one) that the results observed in a
>>>>>>>study could have occurred by chance if the null hypothesis was true. A P
>>>>>>>value of ≤ 0.05 is often used as a threshold to indicate statistical
>>>>>>>significance." (1)
>>>>>>>
>>>>>>>the exact meaning of p-value depends on what is being measured.
>>>>>>>
>>>>>>>also, sometimes it isn't so important that a gene is up or down regulated
>>>>>>>but whether its expression changes from up or down regulated over the
>>>>>>>experimental factors, e.g. if you increase the dose of the drug do the
>>>>>>>target genes go from non-expressed to up regulated.
>>>>>>>
>>>>>>>cheers,
>>>>>>>michael
>>>>>>>
>>>>>>>1) http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=antiepi&part=appendixes.app2
>>>>>>>
>>>>>>>----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>>>>>>>To: "HCLS" <public-semweb-lifesci@w3.org>
>>>>>>>Sent: Monday, May 24, 2010 11:40 AM
>>>>>>>Subject: Re: BioRDF Telcon
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>Today's minutes are available at:
>>>>>>>>
>>>>>>>>http://esw.w3.org/HCLSIG_BioRDF_Subgroup/Meetings/2010/05-24_Conference_Call
>>>>>>>>
>>>>>>>>Thanks to Matthias for scribing.
>>>>>>>>
>>>>>>>>Cheers,
>>>>>>>>
>>>>>>>>-Kei
>>>>>>>>
>>>>>>>>mdmiller wrote:
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>hi kei,
>>>>>>>>>
>>>>>>>>>look forward to joining the call,
>>>>>>>>>michael
>>>>>>>>>
>>>>>>>>>----- Original Message ----- From: "Kei Cheung" 
>>>>>>>>><kei.cheung@yale.edu>
>>>>>>>>>To: "mdmiller" <mdmiller53@comcast.net>; "HCLS"
>>>>>>>>><public-semweb-lifesci@w3.org>
>>>>>>>>>Sent: Saturday, May 22, 2010 12:10 PM
>>>>>>>>>Subject: Re: BioRDF Telcon
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>Hi Michael,
>>>>>>>>>>
>>>>>>>>>>Yes, May 24 was what I meant. It was a typo.
>>>>>>>>>>
>>>>>>>>>>Thanks,
>>>>>>>>>>
>>>>>>>>>>-Kei
>>>>>>>>>>
>>>>>>>>>>mdmiller wrote:
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>hi kei,
>>>>>>>>>>>
>>>>>>>>>>>do you mean monday (may 24)?
>>>>>>>>>>>
>>>>>>>>>>>cheers,
>>>>>>>>>>>michael
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message ----- From: "Kei Cheung" 
>>>>>>>>>>><kei.cheung@yale.edu>
>>>>>>>>>>>To: "JunZhao" <jun.zhao@zoo.ox.ac.uk>
>>>>>>>>>>>Cc: <public-semweb-lifesci@w3.org>
>>>>>>>>>>>Sent: Friday, May 21, 2010 2:28 PM
>>>>>>>>>>>Subject: Re: BioRDF Telcon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>>>Since there were only Jun and Scott who attended the last BioRDF call
>>>>>>>>>>>>(I was not able to attend due to some emergency meetings), we decided to
>>>>>>>>>>>>have the next BioRDF call on the coming Monday (May 21) at 11 am (EDT).
>>>>>>>>>>>>The agenda will be the same (see below).
>>>>>>>>>>>>
>>>>>>>>>>>>Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>>-Kei
>>>>>>>>>>>>
>>>>>>>>>>>>JunZhao wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>>>>>>This is a reminder that the next BioRDF telcon call will be held at 11
>>>>>>>>>>>>>am EDT (4 pm CET) on Monday, May 17 (see details below).
>>>>>>>>>>>>>
>>>>>>>>>>>>>Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>-Jun
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>== Conference Details ==
>>>>>>>>>>>>>* Date of Call: Monday, May 17, 2010
>>>>>>>>>>>>>* Time of Call: 11:00 am Eastern Time (4 pm CET)
>>>>>>>>>>>>>* Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>>>>>>>>>>>* Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>>>>>>>>>>>* Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>>>>>>>>>>>* Participant Access Code: 4257 ("HCLS")
>>>>>>>>>>>>>* IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC page for
>>>>>>>>>>>>>details, or see Web IRC), Quick Start: Use
>>>>>>>>>>>>>http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls for
>>>>>>>>>>>>>IRC access.
>>>>>>>>>>>>>* Duration: ~1 hour
>>>>>>>>>>>>>* Frequency: bi-weekly
>>>>>>>>>>>>>* Convener: Jun
>>>>>>>>>>>>>* Scribe: to-be-determined
>>>>>>>>>>>>>
>>>>>>>>>>>>>==Agenda==
>>>>>>>>>>>>>* Introduction
>>>>>>>>>>>>>* Gene list RDF representation
>>>>>>>>>>>>>* iPhone demo

Received on Thursday, 27 May 2010 03:52:54 UTC