Re: Action Items from call today from Kei Cheung on 2009-10-30 (public-semweb-lifesci@w3.org from October 2009)

From: Kei Cheung <kei.cheung@yale.edu>
Date: Thu, 29 Oct 2009 22:24:38 -0400
To: mdmiller <mdmiller53@comcast.net>
CC: Helen Parkinson <parkinson@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Tony Burdett <tburdett@ebi.ac.uk>
Message-ID: <4AEA4E66.1070208@yale.edu>

Hi Michael,

Thanks for sharing your experience with your GWAS project. The thing 
with marker identification and prediction is that it involves a series 
of possibly iterative analyses. The resulting gene lists are really the 
products of these analyses. Therefore, it is important to capture the 
context of these gene lists. See you and others at the F2F.

Cheers,

-Kei
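
The point above about capturing the context of a gene list can be made
concrete with a small sketch. It is only an illustration, not the approach
agreed on in this thread: the namespace http://example.org/genelist/, the
property names (statisticalTest, arrayPlatform, derivedFrom, hasEntry, and
so on) and the class names are all hypothetical, and plain-string N-Triples
output is used only to avoid depending on an RDF library. Each gene list
carries its statistical test and array platform, plus an optional link to
the list it was derived from, so that a chain of iterative analyses stays
traceable.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical namespace for illustration only; not a real vocabulary.
EX = "http://example.org/genelist/"


@dataclass
class GeneEntry:
    symbol: str
    p_value: float
    fold_change: float


@dataclass
class GeneList:
    list_id: str               # assumed to be safe to use inside an IRI
    statistical_test: str      # e.g. "ANOVA"
    array_platform: str        # e.g. "Affymetrix U133A"
    derived_from: str = ""     # id of the previous analysis step, if any
    entries: List[GeneEntry] = field(default_factory=list)


def literal(value) -> str:
    """Render a value as a plain N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_ntriples(gl: GeneList) -> List[str]:
    """Emit one triple per line: list-level provenance first, then entries."""
    subject = f"<{EX}{gl.list_id}>"
    lines = [
        f"{subject} <{EX}statisticalTest> {literal(gl.statistical_test)} .",
        f"{subject} <{EX}arrayPlatform> {literal(gl.array_platform)} .",
    ]
    if gl.derived_from:
        # Link to the gene list produced by the preceding analysis step.
        lines.append(f"{subject} <{EX}derivedFrom> <{EX}{gl.derived_from}> .")
    for e in gl.entries:
        entry = f"<{EX}{gl.list_id}/{e.symbol}>"
        lines.append(f"{subject} <{EX}hasEntry> {entry} .")
        lines.append(f"{entry} <{EX}geneSymbol> {literal(e.symbol)} .")
        lines.append(f"{entry} <{EX}pValue> {literal(e.p_value)} .")
        lines.append(f"{entry} <{EX}foldChange> {literal(e.fold_change)} .")
    return lines


if __name__ == "__main__":
    # Placeholder values, invented purely to exercise the code.
    step2 = GeneList(
        list_id="step2",
        statistical_test="ANOVA",
        array_platform="Affymetrix U133A",
        derived_from="step1",
        entries=[GeneEntry("GENE1", 0.001, 2.5)],
    )
    print("\n".join(to_ntriples(step2)))
```

Keeping the provenance on the list itself, rather than on each gene, means
a later query can filter whole lists by test or platform before looking at
individual genes.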

mdmiller wrote:

> hi kei,
>
> yes, the proposal for specifying the dataset seemed to allow rich 
> annotation, one reason i liked it.
>
> the piece i find missing is this.  not being a biologist or 
> statistician i can't offer much help (my strength is software).
>
> once a large group of individuals have been part of a GWAS (perhaps as 
> part of a clinical trial) and a marker based on a gene expression 
> signature over n genes has been determined, presumably that would be 
> published as the set of n genes and some averaged measure of up 
> regulation or down regulation per gene based on averaging that group 
> of individuals and an outcome such as bad prognosis or good prognosis 
> associated with the marker.  now if a new individual has a gene 
> expression profile, how will the expression of this individual's genes 
> be compared against the marker to determine which group the individual 
> falls into?
>
> (there are other scenarios, such as multiple markers associated with 
> different outcomes but the above seems the simplest case.)
>
> look forward to meeting you and the others at the F2F next week.
>
> cheers,
> michael
>
> Michael Miller
> mdmiller53@comcast.net
>
> ----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
> To: "mdmiller" <mdmiller53@comcast.net>
> Cc: "Helen Parkinson" <parkinson@ebi.ac.uk>; "HCLS" 
> <public-semweb-lifesci@w3.org>; "Tony Burdett" <tburdett@ebi.ac.uk>
> Sent: Thursday, October 29, 2009 1:43 PM
> Subject: Re: Action Items from call today
>
>
>> Hi Michael et al,
>>
>> Thanks for pointing to this generic approach of RDF representation of 
>> datasets. A list of differentially expressed genes may be associated 
>> with values such as P-values, fold-change, gene symbols, etc. Also, 
>> it's important to capture metadata/provenance associated with the 
>> gene list. This may include the type of statistical test (e.g., 
>> ANOVA) and the array platform employed (e.g., Affymetrix U133A). This 
>> may be an interesting discussion topic at the F2F meeting.
>>
>> Cheers,
>>
>> -Kei
>>
>> mdmiller wrote:
>>
>>> hi all,
>>>
>>> on a different HCLS thread i saw this proposal from jeni tennison 
>>> for specifying a generic dataset, it might be a useful way to encode 
>>> a list of differentially expressed genes.  it looks like one could 
>>> do this encoding on the fly, so that the data itself at the source 
>>> could be in whatever format is natural.
>>>
>>> http://sw.joanneum.at/scovo/schema.html
>>>
>>> cheers,
>>> michael
>>>
>>> Michael Miller
>>> mdmiller53@comcast.net
>>>
>>> ----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>>> To: "Helen Parkinson" <parkinson@ebi.ac.uk>
>>> Cc: "HCLS" <public-semweb-lifesci@w3.org>; "Tony Burdett" 
>>> <tburdett@ebi.ac.uk>
>>> Sent: Wednesday, October 14, 2009 7:03 PM
>>> Subject: Re: Action Items from call today
>>>
>>>
>>>> Thanks, Helen.
>>>>
>>>> To make it more concrete, I've been thinking about some example 
>>>> queries that I hope can be answered by the RDF data once converted. 
>>>> I wonder if the following example queries can be answered:
>>>>
>>>> Retrieve a list of differentially expressed genes between different 
>>>> brain regions (e.g., hippocampus and entorhinal cortex) for 
>>>> normally aged human subjects.
>>>>
>>>> Retrieve a list of differentially expressed genes for the same 
>>>> brain region of normal human subjects and AD patients.
>>>>
>>>> Using these lists of genes one can issue (federated) queries to 
>>>> retrieve additional information about the genes for various types of 
>>>> analyses (e.g., GO term enrichment).
>>>>
>>>> Just a thought.
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>>
>>>>
>>>> Helen Parkinson wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> here are my action items from the call today
>>>>>
>>>>> 1. MAGE-TAB->RDF, Lena requested details.
>>>>>
>>>>> Code here: https://sourceforge.net/projects/limpopo/
>>>>>
>>>>> Java parser for MAGE-TAB developed by EBI, used by several groups. 
>>>>> Contact Tony Burdett tburdett@ebi.ac.uk for details. Tony 
>>>>> estimates a few days' work for a simple RDF dump. Lena, if you are 
>>>>> interested in working on this Java code, please contact Tony, as 
>>>>> he has already designed it with RDF export in mind.
>>>>>
>>>>> 2. MAGE-TAB->MAGE-ML  - code from Junmin Liu at UPenn
>>>>>
>>>>> https://sourceforge.net/projects/tab2mage/files/  - see mage2tab
>>>>> Pretty much all public MAGE-ML comes from AE and is available from 
>>>>> Arrayexpress ftp dirs as mage-tab already. Exceptions are 
>>>>> Rosetta's mage-ml importer, and non public data
>>>>>
>>>>>
>>>>> 3. EBI experimental factor ontology (EFO) slides, attached
>>>>> see also www.ebi.ac.uk/efo
>>>>>
>>>>> 4. Noted that an RDF dump of atlas data and triple store access 
>>>>> will be useful, we'll announce when these are available
>>>>>
>>>>> thanks
>>>>>
>>>>> Helen
Received on Friday, 30 October 2009 02:25:21 UTC