RE: BioRDF Telcon

I usually monitor this group without contributing, but seeing the recent
exchanges about gene expression, I feel a need to put things into a
better perspective than the one currently being shared.

My experience comes from many years overseeing bioinformatics (including
gene expression, proteomics and clinical data) at Wyeth, Roche, UPenn and
with a DOD center at Windber. There appear to be several issues that are
not being realistically addressed in the current discussion:

1. There is significant experimental variability across individual
studies, published or not, because of variation in tissue/cell
handling/storage/preparation, variability in the experiment itself and
significant variability in the data analysis; i.e., experimental
reproducibility inter-lab is poor and even intra-lab can be a major
challenge.

2. The measurement that is usually referred to as "up or down gene
expression/regulation" refers to the comparison between 2 experiments
(a sample under 2 different conditions) but typically does not adequately
correct for individual experimental variability other than by "simple"
scaling. We have shown that this is inadequate.

3. Leaving the interpretation to the author is significantly limiting, as
it tends to reflect the bias of the author to "observe/confirm" what they
are looking for in many of these studies; i.e., a biostatistician will
tell you that these experiments are extremely under-powered to reveal the
truly statistically significant results they would like to achieve.

4. Human nature tends to favor the "big differences" as being most
significant. Unfortunately, nature doesn't work this way: many of the
largest differences are not functionally relevant but reflect the fact
that biological control of these specific genes may not be critical to
function, so large variability can be observed and should not always be
interpreted as being most significant. In fact, we have developed
analytical methods to look at large libraries of gene expression studies
and evaluate the overall stability/variability of individual genes (and
probes), to establish the significance of a difference between states
based on how much variation should be expected vs. how much is observed,
especially in genes that show extremely small levels of expression
overall and which would not be considered by typical approaches to data
analysis.
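
To make point 4 concrete, here is a minimal sketch in Python. It is an
illustration only and not the analytical methods mentioned above, which
are not described in this message. It assumes each gene has a reference
library of log2 expression values from many studies, and it divides an
observed difference between two states by the spread that would be
expected between two single measurements of that gene. The function names
and all of the numbers are hypothetical.

```python
import math
import statistics


def expected_sd(reference_log2_values):
    """Sample standard deviation of one gene's log2 expression
    across a reference library of studies (hypothetical input)."""
    return statistics.stdev(reference_log2_values)


def variability_scaled_difference(log2_a, log2_b, reference_log2_values):
    """Observed log2 difference between two states, divided by the
    spread expected between two independent single measurements
    (sd * sqrt(2)). Larger absolute values mean the change is large
    relative to how much this gene normally varies."""
    sd = expected_sd(reference_log2_values)
    return (log2_b - log2_a) / (sd * math.sqrt(2))


# Made-up reference libraries (log2 expression) for two genes.
stable_gene_library = [5.0, 5.1, 4.9, 5.0, 5.1, 4.9]       # tightly controlled
variable_gene_library = [4.0, 8.0, 6.0, 10.0, 2.0, 6.0]    # loosely controlled

# A small change in the stable gene and a large change in the variable gene.
stable_score = variability_scaled_difference(5.0, 5.6, stable_gene_library)
variable_score = variability_scaled_difference(4.0, 7.0, variable_gene_library)

print(f"stable gene:   log2 change 0.6, scaled score {stable_score:.2f}")
print(f"variable gene: log2 change 3.0, scaled score {variable_score:.2f}")
```

With these made-up numbers, the 0.6 log2 change in the stable gene scores
roughly 4.7, while the 3.0 log2 change in the variable gene scores about
0.75, so the "big difference" is the less remarkable of the two once
expected variation is taken into account.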

Sorry to interrupt the exchange, but I believe it is critical, when
considering the development of systems to represent, store, exchange and
model data, that the specifics and uniqueness of the underlying data and
analytical approaches be understood and considered, beyond simple
statistics.

Michael

Michael N. Liebman, PhD
President/Managing Director
Strategic Medicine, Inc
231 Deepdale Drive
Kennett Square, PA 19348

 (814) 659 5450 mobile

m.liebman@strategicmedicine.com
www.strategicmedicine.com 

-----Original Message-----
From: public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of mdmiller
Sent: Wednesday, May 26, 2010 1:47 PM
To: Kei Cheung
Cc: HCLS
Subject: Re: BioRDF Telcon

hi kei,

> Just want to clarify that what I meant was that it might be beyond the 
> scope of our use case to accurately, comprehensively, and precisely define
> what gene expression really means given the degree of complexity involved.

exactly, i believe we can trust the authors of the gene expression papers 
and the journals themselves for this

cheers,
michael

----- Original Message ----- 
From: "Kei Cheung" <kei.cheung@yale.edu>
To: "mdmiller" <mdmiller53@comcast.net>
Cc: "HCLS" <public-semweb-lifesci@w3.org>
Sent: Wednesday, May 26, 2010 7:23 AM
Subject: Re: BioRDF Telcon


> Hi Michael,
>
> mdmiller wrote:
>> hi kei,
>>
>>> What do we mean by differentially expressed genes? One definition is 
>>> that differentially expressed genes are genes with significantly 
>>> different expression in two samples/conditions/experimental 
>>> factors/dimensions (e.g., treated vs. untreated, disease vs. normal, 
>>> time point 1 vs. time point 2) of microarray experiments.
>>
>> yes, this was my meaning.
>>
>> this is to differentiate between a gene that is always expressed under 
>> normal conditions because it is part of an essential pathway that is 
>> always running, that gene is only interesting if its expression level 
>> changes--similarly for a normally unexpressed gene.
>
> Thanks for confirming. A consensus definition (even if it's broad) is 
> important to our gene list representation. There are a variety of methods 
> (e.g., statistical tests) that can be used to identify a list of 
> differentially expressed genes in two different groups. That's Scott's 
> point about the importance of capturing, as part of the gene list context, 
> what methods have been used for detecting differentially expressed genes. 
> I hope the use case can help convince the community of the need for and 
> use of a common vocabulary for describing such methods.
>>
>>> How to measure or infer gene expression (e.g., from mRNA) is a whole 
>>> complex question that may be beyond the scope of our use case.
>>
>> yes, which i think was scott's point in his reply.  in fact, for the 
>> BioRDF use case, initially at least, it is probably sufficient that the 
>> authors of the paper state that a gene is part of the significant gene 
>> list.
> Just want to clarify that what I meant was that it might be beyond the 
> scope of our use case to accurately, comprehensively, and precisely define
> what gene expression really means given the degree of complexity involved.
>
> Cheers,
>
> -Kei
>>
>> cheers,
>> michael
>>
>>
>> ----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>> To: "mdmiller" <mdmiller53@comcast.net>
>> Cc: "M. Scott Marshall" <marshall@science.uva.nl>; "HCLS" 
>> <public-semweb-lifesci@w3.org>
>> Sent: Tuesday, May 25, 2010 8:52 PM
>> Subject: Re: BioRDF Telcon
>>
>>
>>> Hi Michael et al,
>>>
>>> What do we mean by differentially expressed genes? One definition is 
>>> that differentially expressed genes are genes with significantly 
>>> different expression in two samples/conditions/experimental 
>>> factors/dimensions (e.g., treated vs. untreated, disease vs. normal, 
>>> time point 1 vs. time point 2) of microarray experiments.
>>>
>>> How to measure or infer gene expression (e.g., from mRNA) is a whole 
>>> complex question that may be beyond the scope of our use case.
>>>
>>> Cheers,
>>>
>>> -Kei
>>>
>>> mdmiller wrote:
>>>
>>>> hi scott,
>>>>
>>>> i think you, jim and lena are doing a great job moving the technical 
>>>> aspect of this work forward.  i'm looking forward to seeing the end 
>>>> results.
>>>>
>>>> cheers,
>>>> michael
>>>>
>>>> ----- Original Message ----- From: "M. Scott Marshall" 
>>>> <marshall@science.uva.nl>
>>>> To: "mdmiller" <mdmiller53@comcast.net>
>>>> Cc: "Kei Cheung" <kei.cheung@yale.edu>; "HCLS" 
>>>> <public-semweb-lifesci@w3.org>
>>>> Sent: Tuesday, May 25, 2010 10:21 AM
>>>> Subject: Re: BioRDF Telcon
>>>>
>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Thanks for the clarification. I also explained those concepts during
>>>>> the BioRDF teleconference but it is difficult for the scribe to
>>>>> capture such details accurately from a phone conversation. Just
>>>>> knowing that a gene has changed (either up or down) already gives us
>>>>> something to work with. Since we started with the microarray use case,
>>>>> we have aimed to focus on the list of differentially expressed genes
>>>>> as our entry point into related molecular information, phenotypes,
>>>>> pathways, diseases, etc.
>>>>>
>>>>> In addition to the gene list and experimental factors, there is some
>>>>> data provenance information that characterizes the origins of the gene
>>>>> list, such as the type of significance analysis or technique that was
>>>>> performed (ANOVA, LIMMA, ..) and p-value cutoff for the list discussed
>>>>> in the associated article(s), software packages used (specific R
>>>>> package from BioConductor, GeneSpring, NextBio, ..). It would be handy
>>>>> if there was a common vocabulary for this type of information (URIs
>>>>> for statistical techniques and software packages). I think that some
>>>>> related resources have been described by myGrid/myExperiment. However,
>>>>> lacking a complete vocabulary, it is still possible to make use of the
>>>>> gene list without such a fine grained description of its provenance.
>>>>>
>>>>> Cheers,
>>>>> Scott
>>>>>
>>>>> On Tue, May 25, 2010 at 9:35 AM, mdmiller <mdmiller53@comcast.net> 
>>>>> wrote:
>>>>>
>>>>>> hi all,
>>>>>>
>>>>>> sorry i ended up not being able to make the call.
>>>>>>
>>>>>> "P value
>>>>>> The probability (ranging from zero to one) that the results observed 
>>>>>> in a
>>>>>> study could have occurred by chance if the null hypothesis was true. 
>>>>>> A P
>>>>>> value of ≤ 0.05 is often used as a threshold to indicate statistical
>>>>>> significance." (1)
>>>>>>
>>>>>> the exact meaning of p-value depends on what is being measured.
>>>>>>
>>>>>> also, sometimes it isn't so important that a gene is up or down 
>>>>>> regulated
>>>>>> but whether its expression changes from up or down regulated over the
>>>>>> experimental factors, e.g. if you increase the dose of the drug do 
>>>>>> the
>>>>>> target genes go from non-expressed to up regulated.
>>>>>>
>>>>>> cheers,
>>>>>> michael
>>>>>>
>>>>>> 1)
>>>>>> http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=antiepi&part=appendixes.app2
>>>>>>
>>>>>> ----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>>>>>> To: "HCLS" <public-semweb-lifesci@w3.org>
>>>>>> Sent: Monday, May 24, 2010 11:40 AM
>>>>>> Subject: Re: BioRDF Telcon
>>>>>>
>>>>>>
>>>>>>> Today's minutes are available at:
>>>>>>>
>>>>>>> http://esw.w3.org/HCLSIG_BioRDF_Subgroup/Meetings/2010/05-24_Conference_Call
>>>>>>>
>>>>>>> Thanks to Matthias for scribing.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> -Kei
>>>>>>>
>>>>>>> mdmiller wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> hi kei,
>>>>>>>>
>>>>>>>> look forward to joining the call,
>>>>>>>> michael
>>>>>>>>
>>>>>>>> ----- Original Message ----- From: "Kei Cheung" 
>>>>>>>> <kei.cheung@yale.edu>
>>>>>>>> To: "mdmiller" <mdmiller53@comcast.net>; "HCLS"
>>>>>>>> <public-semweb-lifesci@w3.org>
>>>>>>>> Sent: Saturday, May 22, 2010 12:10 PM
>>>>>>>> Subject: Re: BioRDF Telcon
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi Michael,
>>>>>>>>>
>>>>>>>>> Yes, May 24 was what I meant. It was a typo.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -Kei
>>>>>>>>>
>>>>>>>>> mdmiller wrote:
>>>>>>>>>
>>>>>>>>>> hi kei,
>>>>>>>>>>
>>>>>>>>>> do you mean monday (may 24)?
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> michael
>>>>>>>>>>
>>>>>>>>>> ----- Original Message ----- From: "Kei Cheung" 
>>>>>>>>>> <kei.cheung@yale.edu>
>>>>>>>>>> To: "JunZhao" <jun.zhao@zoo.ox.ac.uk>
>>>>>>>>>> Cc: <public-semweb-lifesci@w3.org>
>>>>>>>>>> Sent: Friday, May 21, 2010 2:28 PM
>>>>>>>>>> Subject: Re: BioRDF Telcon
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Since there were only Jun and Scott who attended the last BioRDF
>>>>>>>>>>> call (I was not able to attend due to some emergency meetings), we
>>>>>>>>>>> decided to have the next BioRDF call on the coming Monday (May 21)
>>>>>>>>>>> at 11 am (EDT). The agenda will be the same (see below).
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> -Kei
>>>>>>>>>>>
>>>>>>>>>>> JunZhao wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is a reminder that the next BioRDF telcon call will be
>>>>>>>>>>>> held at 11 am EDT (4 pm CET) on Monday, May 17 (see details
>>>>>>>>>>>> below).
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> -Jun
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> == Conference Details ==
>>>>>>>>>>>> * Date of Call: Monday, May 17, 2010
>>>>>>>>>>>> * Time of Call: 11:00 am Eastern Time (4 pm CET)
>>>>>>>>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>>>>>>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>>>>>>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>>>>>>>>>> * Participant Access Code: 4257 ("HCLS")
>>>>>>>>>>>> * IRC Channel: irc.w3.org port 6665 channel #HCLS (see W3C IRC
>>>>>>>>>>>> page for details, or see Web IRC), Quick Start: Use
>>>>>>>>>>>> http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls
>>>>>>>>>>>> for IRC access.
>>>>>>>>>>>> * Duration: ~1 hour
>>>>>>>>>>>> * Frequency: bi-weekly
>>>>>>>>>>>> * Convener: Jun
>>>>>>>>>>>> * Scribe: to-be-determined
>>>>>>>>>>>>
>>>>>>>>>>>> ==Agenda==
>>>>>>>>>>>> * Introduction
>>>>>>>>>>>> * Gene list RDF representation
>>>>>>>>>>>> * iPhone demo

Received on Thursday, 27 May 2010 06:52:47 UTC