- From: mdmiller <mdmiller53@comcast.net>
- Date: Sat, 12 Dec 2009 07:47:41 -0800
- To: "Kei Cheung" <kei.cheung@yale.edu>
- Cc: "Jim McCusker" <james.mccusker@yale.edu>, "Helena Deus" <helenadeus@gmail.com>, "HCLS" <public-semweb-lifesci@w3.org>
hi all,
here is the link to the Molecular Signatures Database (MSigDB):
[1]: http://www.broadinstitute.org/gsea/msigdb/
cheers,
michael
----- Original Message -----
From: "mdmiller" <mdmiller53@comcast.net>
To: "Kei Cheung" <kei.cheung@yale.edu>
Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus"
<helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Thursday, December 10, 2009 6:49 AM
Subject: Re: BioRDF Telcon
> hi kei,
>
>> To me, ontologies can be used to facilitate integrated semantic queries
>> across experiments/datasets.
>
> yes, and this is starting to become a reality. this effort, along with
> other HCLS initiatives, is helping to pave the way.
>
>> While some of the protocols are standardized, the data protocols for
>> obtaining things like gene lists vary a lot. One of my questions is
>> whether such data analysis protocols can somehow be entered into MAGE-TAB.
>
> yes it can be, along with the gene list, but in practice this is not done
> by the submitter. after the Derived Array Data representing the
> normalized data, like CHP files, there can be one or more Protocol REF
> columns describing the analysis to obtain the gene list followed by a
> Derived Array Data Matrix File that is the gene list with its signature.
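The column layout described above can be sketched in Python (standard library only). The column headers are the MAGE-TAB SDRF ones named in the message; the sample, protocol and file names (patient-1, P-norm, P-genelist, sample1.CHP, genelist.txt) are made up for illustration, and the helper analysis_protocols is hypothetical rather than part of any MAGE-TAB tool.

```python
import csv
import io

# Hypothetical SDRF fragment; only the column headers come from MAGE-TAB.
SDRF_TEXT = (
    "Source Name\tProtocol REF\tDerived Array Data File\t"
    "Protocol REF\tDerived Array Data Matrix File\n"
    "patient-1\tP-norm\tsample1.CHP\tP-genelist\tgenelist.txt\n"
    "patient-2\tP-norm\tsample2.CHP\tP-genelist\tgenelist.txt\n"
)

def analysis_protocols(sdrf_text):
    """Map each matrix file to the Protocol REF values that sit between the
    'Derived Array Data File' column and the matrix file column."""
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    start = header.index("Derived Array Data File")
    end = header.index("Derived Array Data Matrix File")
    # 'Protocol REF' may appear several times, so select columns by position.
    ref_cols = [i for i in range(start + 1, end) if header[i] == "Protocol REF"]
    result = {}
    for row in body:
        result.setdefault(row[end], set()).update(row[i] for i in ref_cols)
    return result

print(analysis_protocols(SDRF_TEXT))
```

In this made-up fragment the only protocol between the CHP files and the gene list is P-genelist, so that is what the helper reports for genelist.txt; the normalization protocol to the left of the CHP column is deliberately ignored.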
>
> perhaps MIAME needs to be extended to state this. it's something i'll be
> bringing up with the MGED board. it's only now that having this in
> machine-readable form has become valuable. besides GeneSigDB, there is
> another effort, MSigDB [1], that is also curating gene lists. so the
> community is beginning to see the value of this.
>
>> At least for now, I don't think we need to convert the huge primary data
>> files (e.g., CEL file) into RDF. For the time being, we are more focused
>> on the processed gene lists that may be associated with more biological
>> meanings.
>
> perhaps it's worthwhile considering using an ontology 'raw data' class for
> raw data that contains a reference to the data file. one could then use
> appropriate analysis tools to produce normalized data which could then
> also be referenced by a 'normalized data' class.
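One way to picture the by-reference idea above is the following minimal Python sketch. Everything named here is an assumption made for illustration: the example.org namespace, the class names RawData and NormalizedData, the properties has_data_file and derived_from, and the file names. None of them are taken from the MGED ontology.

```python
# All URIs, class names and property names here are hypothetical.
EX = "http://example.org/mage-ext#"

def data_node(node_id, data_class, file_uri, derived_from=None):
    """Describe a data set by reference: the triples name its class and
    point at the file instead of copying probe-level values into RDF."""
    triples = [
        (EX + node_id, "rdf:type", EX + data_class),
        (EX + node_id, EX + "has_data_file", file_uri),
    ]
    if derived_from is not None:
        triples.append((EX + node_id, EX + "derived_from", EX + derived_from))
    return triples

raw = data_node("raw1", "RawData", "file:sample1.CEL")
norm = data_node("norm1", "NormalizedData", "file:sample1.CHP", derived_from="raw1")
for s, p, o in raw + norm:
    print(s, p, o)
```

The point of the sketch is only that each data set costs a handful of triples regardless of how many probes the referenced file holds.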
>
> cheers,
> michael
>
> ----- Original Message -----
> From: "Kei Cheung" <kei.cheung@yale.edu>
> To: "mdmiller" <mdmiller53@comcast.net>
> Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus"
> <helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
> Sent: Monday, December 07, 2009 7:32 AM
> Subject: Re: BioRDF Telcon
>
>
>> mdmiller wrote:
>>> hi jim and lena,
>>>
>>> great progress! this will be a nice tool.
>>>
>>> a couple of comments.
>>>
>>> 1) i think ProtocolApplication is best seen as an individual instance
>>> of the Protocol class. quite often there are arguments whether
>>> ontologies should have individuals or be simply classes. to me, that
>>> doesn't apply here where real world objects are being connected to
>>> ontologies. the BioSource is realized as the 'Source Name' column in
>>> MAGE-TAB and those entries represent real people in studies, mice or
>>> rats in non-clinical studies, etc., and the characteristics values like
>>> age represent real individual instances of age. in the same way, the
>>> values in the Protocol REF column of MAGE-TAB are real wet-lab or
>>> analysis individual instances of protocols, called protocol applications
>>> in MAGE-OM.
>> It sounds like we need to look at how to map column names and entries to
>> classes, instances, and relationships appropriately.
>>>
>>> failure to make this distinction, to me, has obscured how much value
>>> ontologies can have in the real world. too often i see ontologies seen
>>> in and of themselves, which has its own value certainly, but not for the
>>> use cases i have dealing with real biological data.
>>
>> To me, ontologies can be used to facilitate integrated semantic queries
>> across experiments/datasets.
>>>
>>> 2) for this use case, the information between the 'Source Name' and its
>>> characteristics and the 'Derived Array Data Matrix File' or 'Derived
>>> Array Data File' has limited usefulness. error correction and
>>> normalization can make some difference, but if the provider of the
>>> MAGE-TAB is trusted, all that is pretty routine these days. the above
>>> combined with experimental factors and experiment design info is
>>> probably 95% to 99.9% of the worthwhile information from the
>>> MAGE-TAB. if one notices a difference in the final gene set between two
>>> experiments that look the same, only then it might be worthwhile going
>>> into more detail.
>>>
>>> and, as has been noted, the MAGE-TAB information needs to be supplemented
>>> with the information on the final gene set, its expression values, and
>>> the higher-level analysis that was used, which is usually buried in the
>>> paper.
>> While some of the protocols are standardized, the data protocols for
>> obtaining things like gene lists vary a lot. One of my questions is
>> whether such data analysis protocols can somehow be entered into MAGE-TAB.
>>>
>>> 3) i'm not sure if there was a desire to capture the raw data in the
>>> RDF. that will be, for affymetrix, a million to six million probes in
>>> the CEL file, even the processed data in the CHP file would have 20,000
>>> to 60,000 probe sets. i'm not sure if that is the best way to represent
>>> that.
>> At least for now, I don't think we need to convert the huge primary data
>> files (e.g., CEL file) into RDF. For the time being, we are more focused
>> on the processed gene lists that may be associated with more biological
>> meanings.
>>
>> Cheers,
>>
>> -Kei
>>>
>>> cheers,
>>> michael
>>>
>>> Michael Miller
>>> mdmiller53@comcast.net
>>>
>>> ----- Original Message -----
>>> From: "Jim McCusker" <james.mccusker@yale.edu>
>>> To: "Helena Deus" <helenadeus@gmail.com>
>>> Cc: "Kei Cheung" <kei.cheung@yale.edu>; "mdmiller"
>>> <mdmiller53@comcast.net>; "HCLS" <public-semweb-lifesci@w3.org>
>>> Sent: Monday, November 30, 2009 8:19 AM
>>> Subject: Re: BioRDF Telcon
>>>
>>>
>>> I'm following a similar strategy, but have been following the MGED
>>> ontology where possible. I've finished aligning the IDF portion, and
>>> have started on SDRF. MGED ontology is missing a property and class
>>> for what is often termed as ProtocolApplication, which usually serves
>>> as an edge between derived from and derived nodes, while linking to
>>> the protocol used for the derivation. I am planning on creating this
>>> link in a MAGE extensions ontology, but would like to vet the
>>> structure here:
>>>
>>> ProtocolApplication is a class.
>>>
>>> New properties:
>>>
>>> has_derivation_source
>>> has_derivative
>>>
>>> And then ProtocolApplication would have the restrictions:
>>>
>>> has_protocol some Protocol
>>>
>>> I don't put domains, etc. on the derived properties to allow use in
>>> directly describing derivations if people so choose. There is no
>>> superclass for all nodes that can be derived or derived from, so I'm
>>> not bothering with restrictions for those, although I could add a
>>> union restriction to it.
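The structure proposed above could be written down roughly as follows. This is a Python sketch (standard library only) that just assembles Turtle text; it is not the published ontology. Assumptions: the ext: namespace URI is a placeholder, ext:Protocol stands in for the MGED Protocol class, has_protocol is declared as a new object property, and "has_protocol some Protocol" is read as an rdfs:subClassOf existential restriction.

```python
# Sketch only: namespace URIs and ext:Protocol are placeholders.
PREFIXES = """\
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ext:  <http://example.org/mage-ext#> .
"""

def object_property(name):
    # Deliberately no domain or range, matching the choice described above.
    return f"ext:{name} a owl:ObjectProperty .\n"

def protocol_application():
    # "has_protocol some Protocol" as an owl:someValuesFrom restriction.
    return (
        "ext:ProtocolApplication a owl:Class ;\n"
        "    rdfs:subClassOf [\n"
        "        a owl:Restriction ;\n"
        "        owl:onProperty ext:has_protocol ;\n"
        "        owl:someValuesFrom ext:Protocol\n"
        "    ] .\n"
    )

def build_ontology():
    parts = [PREFIXES, "\n"]
    for name in ("has_derivation_source", "has_derivative", "has_protocol"):
        parts.append(object_property(name))
    parts.append("\n")
    parts.append(protocol_application())
    return "".join(parts)

print(build_ontology())
```

Leaving the two derivation properties unconstrained means they can also link data nodes directly, without an intermediate ProtocolApplication, which is the flexibility the message asks for.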
>>>
>>> If this structure is acceptable to people, I can publish the ontology
>>> for general use pretty quickly, and let us work from the same data
>>> structure. I would appreciate any feedback.
>>>
>>> Jim
>>>
>>> On Monday, November 30, 2009, Helena Deus <helenadeus@gmail.com> wrote:
>>>> @Kei,
>>>>
>>>>
>>>>
>>>> When you said data structure, did you mean the RDF structure?
>>>> For now, all I have is the java object returned by parser. I've been
>>>> using Limpopo, which creates an object that I can then parse to RDF
>>>> using Jena. The challenge, though, has been coming up with the
>>>> predicates to formalize the relationships between the various elements.
>>>> I'm using the XML structures for IDF/SDRF etc. at
>>>> http://magetab-om.sourceforge.net to automatically generate the
>>>> structure that will contain the data. My plan is to then create the RDF
>>>> triples that use the attributes described in those documents and
>>>> populate them with the data from the MAGE-TAB java object created by
>>>> Limpopo.
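The plan above (derive predicates from the document structure, then populate them from the parsed MAGE-TAB) can be illustrated with a small sketch. It is written in Python with the standard library only, so it uses neither Limpopo nor Jena and is not the code referred to in the message. The base URIs, the experiment identifier and the sample values are hypothetical; the row labels are ordinary IDF field names.

```python
import csv
import io
import re

# Hypothetical base URIs and sample values; only the row labels are IDF fields.
PRED_BASE = "http://example.org/magetab/idf#"
DOC_URI = "http://example.org/experiment/E-EXAMPLE-1"

IDF_TEXT = (
    "Investigation Title\tExample expression study\n"
    "Person Last Name\tDoe\tRoe\n"
    "Protocol Name\tP-norm\tP-genelist\n"
)

def predicate_for(field_name):
    """Turn an IDF row label such as 'Person Last Name' into a predicate URI."""
    return PRED_BASE + re.sub(r"\W+", "_", field_name.strip()).lower()

def idf_to_triples(idf_text, subject=DOC_URI):
    """One triple per non-empty value cell; the row label picks the predicate."""
    triples = []
    for row in csv.reader(io.StringIO(idf_text), delimiter="\t"):
        if not row or not row[0].strip():
            continue
        pred = predicate_for(row[0])
        triples.extend((subject, pred, v) for v in row[1:] if v.strip())
    return triples

for t in idf_to_triples(IDF_TEXT):
    print(t)
```

This flat mapping loses the column alignment between related rows (for example, which first name goes with which last name), which is one reason hand-designed predicates and intermediate nodes are still needed.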
>>>>
>>>> Right now all I have is a very raw RDF/XML document describing the
>>>> relationships in the IDF structure:
>>>> http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
>>>> The triples for that had to be encoded manually using Jena by reading
>>>> the model.
>>>> @Satya and Jun
>>>> I would very much like to be involved in that effort, do you already
>>>> have a URL that I can look at?
>>>>
>>>> Thanks,
>>>> Lena
>>>> On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.cheung@yale.edu>
>>>> wrote:
>>>> Hi Lena et al,
>>>>
>>>> When you said data structure, did you mean the RDF structure? If so, is
>>>> there a pointer to the structure that we can look at?
>>>>
>>>> As discussed during yesterday's call, Jun and Satya will help create a
>>>> wiki page for listing some of the requirements for provenance/workflow
>>>> in the context of gene lists, perhaps we should also use it to help
>>>> coordinate some of the future activities (people also brought up
>>>> Taverna during the call yesterday). Please coordinate with Satya and
>>>> Jun.
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> Helena Deus wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I apologize for missing the call yesterday! It seems you had a pretty
>>>> interesting discussion! :-)
>>>> If I understand Michael's statement, parsing the MAGE-TAB/MAGE-ML into
>>>> RDF would result in obtaining only the raw and processed data files but
>>>> not the mechanism used to process it nor the resulting gene list.
>>>> That's also what I concluded after looking at the data structure
>>>> created by Tony Burdett's Limpopo parser. However, having the raw data
>>>> as linked data is already a great start! Kei, should I be looking into
>>>> Taverna in order to reprocess the raw files with a traceable analysis
>>>> workflow?
>>>>
>>>> Thanks!
>>>> Lena
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Nov 24, 2009 at 9:59 AM, mdmiller <mdmiller53@comcast.net>
>>>> wrote:
>>>>
>>>> hi all,
>>>>
>>>> (from the minutes)
>>>>
>>>> "Yolanda/Kei/Scott: semantic annotation/description of workflow
>>>> would enable the retrieval of data relevant to that workflow (i.e.
>>>> data that could be used to populate that workflow for a different
>>>> experimental scenario)"
>>>>
>>>> what is typically in a MAGE-TAB/MAGE-ML document are the protocols
>>>> for how the source was processed into the extract then how the
>>>> hybridization, feature extraction, error and normalization were
>>>> performed. these are interesting and different protocols can
>>>> cause differences at this level but it is pretty much a known art
>>>> and usually not of too much interest or variability.
>>>>
>>>> what is usually missing from those documents, along with the final
>>>> gene list, is how that gene list was obtained, what higher level
>>>> analysis was used, that is generally only in the paper unfortunately.
>>>>
>>>> cheers,
>>>> michael
>>>>
>>>> ----- Original Message -----
>>>> From: "Kei Cheung" <kei.cheung@yale.edu>
>>>> To: "HCLS" <public-semweb-lifesci@w3.org>
>>>> Sent: Monday, November 23, 2009 1:27 PM
>>>> Subject: Re: BioRDF Telcon
>>>>
>>>>
>>>>
>>>> Today's BioRDF minutes are available at the following:
>>>>
>>>>
>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call
>>>>
>>>> Thanks to Rob for scribing.
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> Kei Cheung wrote:
>>>>
>>>> This is a reminder that the next BioRDF telcon call will
>>>> be held at 11 am EST (5 pm CET) on Monday, November 23
>>>> (see details below).
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> == Conference Details ==
>>>> * Date of Call: Monday November 23, 2009
>>>> * Time of Call: 11:00 am Eastern Time
>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>> * Participant Access Code: 4257 ("HCLS")
>>>>
>>>> * IRC Channel: irc.w3.org <http://irc.w3.org> port 6665
>>>> channel #
>>>>
>>>
>>
>>
>>
>
>
>
>
Received on Saturday, 12 December 2009 15:48:22 UTC