- From: mdmiller <mdmiller53@comcast.net>
- Date: Sat, 12 Dec 2009 07:47:41 -0800
- To: "Kei Cheung" <kei.cheung@yale.edu>
- Cc: "Jim McCusker" <james.mccusker@yale.edu>, "Helena Deus" <helenadeus@gmail.com>, "HCLS" <public-semweb-lifesci@w3.org>
hi all,

here is the link to the Molecular Signatures Database (MSigDB):

[1] http://www.broadinstitute.org/gsea/msigdb/

cheers,
michael

----- Original Message -----
From: "mdmiller" <mdmiller53@comcast.net>
To: "Kei Cheung" <kei.cheung@yale.edu>
Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus" <helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
Sent: Thursday, December 10, 2009 6:49 AM
Subject: Re: BioRDF Telcon

> hi kei,
>
>> To me, ontologies can be used to facilitate integrated semantic queries
>> across experiments/datasets.
>
> yes, and this is starting to become a reality. this effort, along with
> other HCLS initiatives, is helping to pave the way.
>
>> While some of the protocols are standardized, the data protocols for
>> obtaining things like gene lists vary a lot. One of my questions is
>> whether such data analysis protocols can somehow be entered into MAGE-TAB.
>
> yes, it can be, along with the gene list, but in practice this is not done
> by the submitter. after the Derived Array Data representing the
> normalized data, like CHP files, there can be one or more Protocol REF
> columns describing the analysis used to obtain the gene list, followed by a
> Derived Array Data Matrix File that is the gene list with its signature.
>
> perhaps MIAME needs to be extended to state this. it's something i'll be
> bringing up with the MGED board. it's only now that this has become
> something of value to be machine readable. besides GeneSigDB, there is
> another effort, MSigDB [1], that is also curating gene lists. so the
> community is beginning to see the value of this.
>
>> At least for now, I don't think we need to convert the huge primary data
>> files (e.g., CEL files) into RDF. For the time being, we are more focused
>> on the processed gene lists that may be associated with more biological
>> meaning.
>
> perhaps it's worth considering using an ontology 'raw data' class for
> raw data that contains a reference to the data file.
> one could then use appropriate analysis tools to produce normalized data,
> which could then also be referenced by a 'normalized data' class.
>
> cheers,
> michael
>
> ----- Original Message -----
> From: "Kei Cheung" <kei.cheung@yale.edu>
> To: "mdmiller" <mdmiller53@comcast.net>
> Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus"
> <helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
> Sent: Monday, December 07, 2009 7:32 AM
> Subject: Re: BioRDF Telcon
>
>
>> mdmiller wrote:
>>> hi jim and lena,
>>>
>>> great progress! this will be a nice tool.
>>>
>>> a couple of comments.
>>>
>>> 1) i think ProtocolApplication is best seen as an individual instance
>>> of the Protocol class. quite often there are arguments over whether
>>> ontologies should have individuals or simply be classes. to me, that
>>> doesn't apply here, where real-world objects are being connected to
>>> ontologies. the BioSource is realized as the 'Source Name' column in
>>> MAGE-TAB, and those entries represent real people in studies, mice or
>>> rats in non-clinical studies, etc., and the characteristics values like
>>> age represent real individual instances of age. in the same way, the
>>> values in the Protocol REF column of MAGE-TAB are real wet-lab or
>>> analysis individual instances of protocols, called protocol applications
>>> in MAGE-OM.
>> It sounds like we need to look at how to map column names and entries to
>> classes, instances, and relationships appropriately.
>>>
>>> failure to make this distinction, to me, has obscured how much value
>>> ontologies can have in the real world. too often i see ontologies
>>> considered in and of themselves, which has its own value certainly, but
>>> not for the use cases i have dealing with real biological data.
>>
>> To me, ontologies can be used to facilitate integrated semantic queries
>> across experiments/datasets.
>>>
>>> 2) for this use case, the information between the 'Source Name' (with
>>> its characteristics) and the 'Derived Array Data Matrix File' or
>>> 'Derived Array Data File' is of limited usefulness. error correction
>>> and normalization can make some difference, but if the provider of the
>>> MAGE-TAB is trusted, all that is pretty routine these days. the above,
>>> combined with experimental factors and experiment design info, is
>>> probably 95% to 99.9% of the worthwhile information in the MAGE-TAB.
>>> only if one notices a difference in the final gene set between two
>>> experiments that look the same might it be worthwhile going into more
>>> detail.
>>>
>>> as has been noted, the MAGE-TAB information needs to be supplemented
>>> with information on the final gene set, its expression values, and
>>> the higher-level analysis that was used, which is usually buried in
>>> the paper.
>> While some of the protocols are standardized, the data protocols for
>> obtaining things like gene lists vary a lot. One of my questions is
>> whether such data analysis protocols can somehow be entered into MAGE-TAB.
>>>
>>> 3) i'm not sure if there was a desire to capture the raw data in the
>>> RDF. for affymetrix, that will be a million to six million probes in
>>> the CEL file; even the processed data in the CHP file would have 20,000
>>> to 60,000 probe sets. i'm not sure RDF is the best way to represent
>>> that.
>> At least for now, I don't think we need to convert the huge primary data
>> files (e.g., CEL files) into RDF. For the time being, we are more focused
>> on the processed gene lists that may be associated with more biological
>> meaning.
>>
>> Cheers,
>>
>> -Kei
>>>
>>> cheers,
>>> michael
>>>
>>> Michael Miller
>>> mdmiller53@comcast.net
>>>
>>> ----- Original Message ----- From: "Jim McCusker"
>>> <james.mccusker@yale.edu>
>>> To: "Helena Deus" <helenadeus@gmail.com>
>>> Cc: "Kei Cheung" <kei.cheung@yale.edu>; "mdmiller"
>>> <mdmiller53@comcast.net>; "HCLS" <public-semweb-lifesci@w3.org>
>>> Sent: Monday, November 30, 2009 8:19 AM
>>> Subject: Re: BioRDF Telcon
>>>
>>>
>>> I'm following a similar strategy, but have been following the MGED
>>> ontology where possible. I've finished aligning the IDF portion, and
>>> have started on SDRF. The MGED ontology is missing a property and class
>>> for what is often termed ProtocolApplication, which usually serves
>>> as an edge between derived-from and derived nodes, while linking to
>>> the protocol used for the derivation. I am planning on creating this
>>> link in a MAGE extensions ontology, but would like to vet the
>>> structure here:
>>>
>>> ProtocolApplication is a class.
>>>
>>> New properties:
>>>
>>> has_derivation_source
>>> has_derivative
>>>
>>> And then ProtocolApplication would have the restrictions:
>>>
>>> has_protocol some Protocol
>>>
>>> I don't put domains, etc. on the derivation properties, to allow their
>>> use in directly describing derivations if people so choose. There is no
>>> superclass for all nodes that can be derived or derived from, so I'm
>>> not bothering with restrictions for those, although I could add a
>>> union restriction to it.
>>>
>>> If this structure is acceptable to people, I can publish the ontology
>>> for general use pretty quickly, and let us work from the same data
>>> structure. I would appreciate any feedback.
>>>
>>> Jim
>>>
>>> On Monday, November 30, 2009, Helena Deus <helenadeus@gmail.com> wrote:
>>>> @Kei,
>>>>
>>>> When you said data structure, did you mean the RDF structure?
>>>> For now, all I have is the Java object returned by the parser.
>>>> I've been using Limpopo, which creates an object that I can then
>>>> convert to RDF using Jena. The challenge, though, has been coming up
>>>> with the predicates to formalize the relationships between the various
>>>> elements. I'm using the XML structures for IDF/SDRF etc. at
>>>> http://magetab-om.sourceforge.net to automatically generate the
>>>> structure that will contain the data. My plan is then to create the RDF
>>>> triples that use the attributes described in those documents and
>>>> populate them with the data from the MAGE-TAB Java object created by
>>>> Limpopo.
>>>>
>>>> Right now all I have is a very raw RDF/XML document describing the
>>>> relationships in the IDF structure:
>>>> http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
>>>> The triples for that had to be encoded manually with Jena by reading
>>>> the model.
>>>> @Satya and Jun
>>>> I would very much like to be involved in that effort; do you already
>>>> have a URL that I can look at?
>>>>
>>>> Thanks
>>>> Lena
>>>> On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.cheung@yale.edu>
>>>> wrote:
>>>> Hi Lena et al,
>>>>
>>>> When you said data structure, did you mean the RDF structure? If so, is
>>>> there a pointer to the structure that we can look at?
>>>>
>>>> As discussed during yesterday's call, Jun and Satya will help create a
>>>> wiki page listing some of the requirements for provenance/workflow
>>>> in the context of gene lists; perhaps we should also use it to help
>>>> coordinate some of the future activities (people also brought up
>>>> Taverna during the call yesterday). Please coordinate with Satya and
>>>> Jun.
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> Helena Deus wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I apologize for missing the call yesterday! It seems you had a pretty
>>>> interesting discussion!
>>>> :-)
>>>> If I understand Michael's statement, parsing the MAGE-TAB/MAGE-ML into
>>>> RDF would yield only the raw and processed data files, but neither
>>>> the mechanism used to process them nor the resulting gene list.
>>>> That's also what I concluded after looking at the data structure
>>>> created by Tony Burdett's Limpopo parser. However, having the raw data
>>>> as linked data is already a great start! Kei, should I be looking into
>>>> Taverna in order to reprocess the raw files with a traceable analysis
>>>> workflow?
>>>>
>>>> Thanks!
>>>> Lena
>>>>
>>>> On Tue, Nov 24, 2009 at 9:59 AM, mdmiller <mdmiller53@comcast.net>
>>>> wrote:
>>>>
>>>> hi all,
>>>>
>>>> (from the minutes)
>>>>
>>>> "Yolanda/Kei/Scott: semantic annotation/description of workflow
>>>> would enable the retrieval of data relevant to that workflow (i.e.
>>>> data that could be used to populate that workflow for a different
>>>> experimental scenario)"
>>>>
>>>> what is typically in a MAGE-TAB/MAGE-ML document are the protocols
>>>> for how the source was processed into the extract, then how the
>>>> hybridization, feature extraction, error correction and normalization
>>>> were performed. these are interesting, and different protocols can
>>>> cause differences at this level, but it is pretty much a known art
>>>> and usually not of much interest or variability.
>>>>
>>>> what is usually missing from those documents, along with the final
>>>> gene list, is how that gene list was obtained, i.e. what higher-level
>>>> analysis was used; unfortunately, that is generally only in the paper.
>>>>
>>>> cheers,
>>>> michael
>>>>
>>>> ----- Original Message ----- From: "Kei Cheung"
>>>> <kei.cheung@yale.edu>
>>>> To: "HCLS" <public-semweb-lifesci@w3.org>
>>>> Sent: Monday, November 23, 2009 1:27 PM
>>>> Subject: Re: BioRDF Telcon
>>>>
>>>> Today's BioRDF minutes are available at the following:
>>>>
>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call
>>>>
>>>> Thanks to Rob for scribing.
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> Kei Cheung wrote:
>>>>
>>>> This is a reminder that the next BioRDF telcon call will
>>>> be held at 11 am EST (5 pm CET) on Monday, November 23
>>>> (see details below).
>>>>
>>>> Cheers,
>>>>
>>>> -Kei
>>>>
>>>> == Conference Details ==
>>>> * Date of Call: Monday, November 23, 2009
>>>> * Time of Call: 11:00 am Eastern Time
>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>> * Participant Access Code: 4257 ("HCLS")
>>>>
>>>> * IRC Channel: irc.w3.org port 6665
>>>> channel #
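Jim's proposed ProtocolApplication structure, applied to the SDRF column chain Michael describes (Protocol REF columns sitting between node columns, ending with a Derived Array Data Matrix File holding the gene list), can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the MAGE extensions ontology itself: the `mage:` and `ex:` prefixes, the `applicationN` identifiers, and every protocol name, sample name, and file name in the example row are hypothetical; only a subset of SDRF node columns is recognized; and triples are plain string tuples rather than output from Jena or another RDF library. Each Protocol REF value becomes an individual of ProtocolApplication, in line with Michael's point that Protocol REF entries are real instances rather than classes.

```python
from typing import List, Optional, Tuple

# Hypothetical prefixes: the thread gives no namespace URIs for the MGED
# ontology or for the proposed MAGE extensions ontology.
MAGE = "mage:"
EX = "ex:"

# A subset of MAGE-TAB SDRF node columns. Other columns (Characteristics[...],
# Parameter Value[...], Term Source REF, ...) are ignored in this sketch.
NODE_COLUMNS = {
    "Source Name", "Sample Name", "Extract Name", "Labeled Extract Name",
    "Hybridization Name", "Scan Name", "Normalization Name",
    "Array Data File", "Derived Array Data File",
    "Array Data Matrix File", "Derived Array Data Matrix File",
}

Triple = Tuple[str, str, str]


def row_to_triples(row: List[Tuple[str, str]]) -> List[Triple]:
    """Turn one SDRF row, given as (column header, cell value) pairs, into triples.

    Every Protocol REF cell between two node columns becomes a ProtocolApplication
    individual linking the preceding node (has_derivation_source), the following
    node (has_derivative) and the protocol that was applied (has_protocol).
    """
    triples: List[Triple] = []
    previous_node: Optional[str] = None
    pending_protocols: List[str] = []
    app_count = 0

    for column, value in row:
        if column == "Protocol REF":
            pending_protocols.append(value)
        elif column in NODE_COLUMNS:
            node = EX + value
            if previous_node is not None:
                for protocol in pending_protocols:
                    app_count += 1
                    app = f"{EX}application{app_count}"
                    triples += [
                        (app, "rdf:type", MAGE + "ProtocolApplication"),
                        (app, MAGE + "has_protocol", EX + protocol),
                        (app, MAGE + "has_derivation_source", previous_node),
                        (app, MAGE + "has_derivative", node),
                    ]
            # Reset; protocols seen before the first node have no source and are dropped.
            pending_protocols = []
            previous_node = node
    return triples


# Hypothetical example row ending in the analysis step that produces a gene list.
EXAMPLE_ROW = [
    ("Source Name", "patient_1"),
    ("Characteristics[Age]", "54"),
    ("Protocol REF", "P-EXTRACT"),
    ("Extract Name", "extract_1"),
    ("Protocol REF", "P-HYBRIDIZE"),
    ("Hybridization Name", "hyb_1"),
    ("Array Data File", "hyb_1.CEL"),
    ("Protocol REF", "P-NORMALIZE"),
    ("Derived Array Data File", "hyb_1.CHP"),
    ("Protocol REF", "P-GENE-LIST"),
    ("Derived Array Data Matrix File", "gene_list.txt"),
]

if __name__ == "__main__":
    # Print one prefixed triple per line.
    for s, p, o in row_to_triples(EXAMPLE_ROW):
        print(s, p, o)
```

Michael's reading, in which each Protocol REF entry is itself an individual of the Protocol class, would roughly replace the `has_protocol` triple with an `rdf:type` triple. The sketch follows Jim's version because it keeps one shared protocol description separate from each of its applications across rows; the class-level restriction "has_protocol some Protocol" is not encoded here.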
Received on Saturday, 12 December 2009 15:48:22 UTC