- From: Kei Cheung <kei.cheung@yale.edu>
- Date: Sun, 13 Dec 2009 21:32:41 -0500
- To: mdmiller <mdmiller53@comcast.net>
- CC: Jim McCusker <james.mccusker@yale.edu>, Helena Deus <helenadeus@gmail.com>, HCLS <public-semweb-lifesci@w3.org>
Hi Michael,

Thanks for pointing to MSigDB. I included this in the related links section of the microarray use case description (http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2). Also please see my responses below.

Cheers,

-Kei

mdmiller wrote:

> hi all,
>
> here is the link to the Molecular Signatures Database (MSigDB):
>
> [1]: http://www.broadinstitute.org/gsea/msigdb/
>
> cheers,
> michael
>
> ----- Original Message ----- From: "mdmiller" <mdmiller53@comcast.net>
> To: "Kei Cheung" <kei.cheung@yale.edu>
> Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus"
> <helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
> Sent: Thursday, December 10, 2009 6:49 AM
> Subject: Re: BioRDF Telcon
>
>
>> hi kei,
>>
>>> To me, ontologies can be used to facilitate integrated semantic
>>> queries across experiments/datasets.
>>
>>
>> yes, and this is starting to become a reality. this effort, along
>> with other HCLS initiatives, is helping to pave the way.
>>
>>> While some of the protocols are standardized, the data protocols for
>>> obtaining things like gene lists vary a lot. One of my questions is
>>> whether such data analysis protocols can somehow be entered into
>>> MAGE-TAB.
>>
>>
>> yes it can be, along with the gene list, but in practice this is not
>> done by the submitter. after the Derived Array Data representing the
>> normalized data, like CHP files, there can be one or more Protocol
>> REF columns describing the analysis to obtain the gene list followed
>> by a Derived Array Data Matrix File that is the gene list with its
>> signature.
>>
>> perhaps MIAME needs to be extended to state this. it's something
>> i'll be bringing up with the MGED board. it's just now that this has
>> become something of value to be machine readable. besides GeneSigDB,
>> there is another effort, MSigDB [1], that is also curating gene
>> lists. so the community is beginning to see the value of this.
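[Editorial sketch, not part of the original messages.] The column ordering michael describes (normalized data, then one or more Protocol REF columns, then the gene-list matrix file) could be read from an SDRF roughly as follows. This is only an illustration under stated assumptions: the sample rows, the file names, the protocol name, and the helper `gene_list_links` are hypothetical, and only the column headers are taken from the thread.

```python
import csv
import io

# Hypothetical SDRF fragment: file names and the protocol name are invented
# for illustration; only the column headers come from the discussion.
SDRF_TEXT = (
    "Source Name\tDerived Array Data File\tProtocol REF\t"
    "Derived Array Data Matrix File\n"
    "sample1\tsample1.CHP\tP-genelist-analysis\tgenelist.txt\n"
    "sample2\tsample2.CHP\tP-genelist-analysis\tgenelist.txt\n"
)


def gene_list_links(sdrf_text):
    """Return (normalized file, analysis protocol, gene list file) per row."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    derived = header.index("Derived Array Data File")
    # Use the first Protocol REF column to the right of the normalized data.
    protocol = header.index("Protocol REF", derived + 1)
    matrix = header.index("Derived Array Data Matrix File", protocol + 1)
    return [(row[derived], row[protocol], row[matrix]) for row in reader]


for link in gene_list_links(SDRF_TEXT):
    print(link)
```

Columns are located by position rather than through a dictionary reader because an SDRF may repeat the Protocol REF header, so header names alone are not unique keys.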
Yes, for these gene lists to be of value to researchers, rich annotation is key. The challenge here is that it's quite tedious to enter the custom data analysis protocols in a structured way by hand.

>>> At least for now, I don't think we need to convert the huge primary
>>> data files (e.g., CEL file) into RDF. For the time being, we are
>>> more focused on the processed gene lists that may be associated with
>>> more biological meanings.
>>
>>
>> perhaps it's worthwhile considering using an ontology 'raw data' class
>> for raw data that contains a reference to the data file. one could
>> then use appropriate analysis tools to produce normalized data which
>> could then also be referenced by a 'normalized data' class.
>
It seems to make sense.

>>
>> cheers,
>> michael
>>
>> ----- Original Message ----- From: "Kei Cheung" <kei.cheung@yale.edu>
>> To: "mdmiller" <mdmiller53@comcast.net>
>> Cc: "Jim McCusker" <james.mccusker@yale.edu>; "Helena Deus"
>> <helenadeus@gmail.com>; "HCLS" <public-semweb-lifesci@w3.org>
>> Sent: Monday, December 07, 2009 7:32 AM
>> Subject: Re: BioRDF Telcon
>>
>>
>>> mdmiller wrote:
>>>
>>>> hi jim and lena,
>>>>
>>>> great progress! this will be a nice tool.
>>>>
>>>> a couple of comments.
>>>>
>>>> 1) i think ProtocolApplication is best seen as an individual
>>>> instance of the Protocol class. quite often there are arguments
>>>> whether ontologies should have individuals or be simply classes.
>>>> to me, that doesn't apply here where real world objects are being
>>>> connected to ontologies. the BioSource is realized as the 'Source
>>>> Name' column in MAGE-TAB and those entries represent real people in
>>>> studies, mice or rats in non-clinical studies, etc., and the
>>>> characteristics values like age represent real individual instances
>>>> of age. in the same way, the values in the Protocol REF column of
>>>> MAGE-TAB are real wet-lab or analysis individual instances of
>>>> protocols, called protocol applications in MAGE-OM.
>>>
>>> It sounds like we need to look at how to map column names and
>>> entries to classes, instances, and relationships appropriately.
>>>
>>>>
>>>> failure to make this distinction, to me, has obscured how much
>>>> value ontologies can have in the real world. too often i see
>>>> ontologies seen in and of themselves, which has its own value
>>>> certainly, but not for the use cases i have dealing with real
>>>> biological data.
>>>
>>>
>>> To me, ontologies can be used to facilitate integrated semantic
>>> queries across experiments/datasets.
>>>
>>>>
>>>> 2) for this use case, the information between the 'Source Name'
>>>> and its characteristics and the 'Derived Array Data Matrix File'
>>>> or 'Derived Array Data File' has limited usefulness. error
>>>> correction and normalization can make some difference but if the
>>>> provider of the MAGE-TAB is trusted, all that is pretty routine
>>>> these days. the above combined with experimental factors and
>>>> experiment design info is probably 95% to 99.9% of the worthwhile
>>>> information from the MAGE-TAB. if one notices a difference in the
>>>> final gene set between two experiments that look the same, only
>>>> then might it be worthwhile going into more detail.
>>>>
>>>> and, as has been noted, the MAGE-TAB information needs to be
>>>> supplemented with the information on the final gene set, its
>>>> expression values, and the higher-level analysis that was used,
>>>> which is usually buried in the paper.
>>>
>>> While some of the protocols are standardized, the data protocols for
>>> obtaining things like gene lists vary a lot. One of my questions is
>>> whether such data analysis protocols can somehow be entered into
>>> MAGE-TAB.
>>>
>>>>
>>>> 3) i'm not sure if there was a desire to capture the raw data in
>>>> the RDF. that will be, for affymetrix, a million to six million
>>>> probes in the CEL file, even the processed data in the CHP file
>>>> would have 20,000 to 60,000 probe sets.
>>>> i'm not sure if that is the best way to represent that.
>>>
>>> At least for now, I don't think we need to convert the huge primary
>>> data files (e.g., CEL file) into RDF. For the time being, we are
>>> more focused on the processed gene lists that may be associated with
>>> more biological meanings.
>>>
>>> Cheers,
>>>
>>> -Kei
>>>
>>>>
>>>> cheers,
>>>> michael
>>>>
>>>> Michael Miller
>>>> mdmiller53@comcast.net
>>>>
>>>> ----- Original Message ----- From: "Jim McCusker"
>>>> <james.mccusker@yale.edu>
>>>> To: "Helena Deus" <helenadeus@gmail.com>
>>>> Cc: "Kei Cheung" <kei.cheung@yale.edu>; "mdmiller"
>>>> <mdmiller53@comcast.net>; "HCLS" <public-semweb-lifesci@w3.org>
>>>> Sent: Monday, November 30, 2009 8:19 AM
>>>> Subject: Re: BioRDF Telcon
>>>>
>>>>
>>>> I'm following a similar strategy, but have been following the MGED
>>>> ontology where possible. I've finished aligning the IDF portion, and
>>>> have started on SDRF. MGED ontology is missing a property and class
>>>> for what is often termed ProtocolApplication, which usually serves
>>>> as an edge between derived-from and derived nodes, while linking to
>>>> the protocol used for the derivation. I am planning on creating this
>>>> link in a MAGE extensions ontology, but would like to vet the
>>>> structure here:
>>>>
>>>> ProtocolApplication is a class.
>>>>
>>>> New properties:
>>>>
>>>> has_derivation_source
>>>> has_derivative
>>>>
>>>> And then ProtocolApplication would have the restrictions:
>>>>
>>>> has_protocol some Protocol
>>>>
>>>> I don't put domains, etc. on the derived properties to allow use in
>>>> directly describing derivations if people so choose. There is no
>>>> superclass for all nodes that can be derived or derived from, so I'm
>>>> not bothering with restrictions for those, although I could add a
>>>> union restriction to it.
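[Editorial sketch, not part of the original messages.] At the instance level, the structure proposed above amounts to four triples per derivation step. The Python below is a rough illustration only: the namespace URI, the individual names, and both helper functions are hypothetical, since the thread gives no URI for the extension ontology. It asserts plain instance data and does not encode the OWL restriction `has_protocol some Protocol`.

```python
# Hypothetical namespace: the thread does not give a URI for the
# MAGE extensions ontology, so example.org stands in for it.
EXT = "http://example.org/mage-ext#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def protocol_application(app, source, derivative, protocol):
    """Build triples linking a derivation edge to its nodes and protocol."""
    subject = EXT + app
    return [
        (subject, RDF_TYPE, EXT + "ProtocolApplication"),
        (subject, EXT + "has_derivation_source", EXT + source),
        (subject, EXT + "has_derivative", EXT + derivative),
        (subject, EXT + "has_protocol", EXT + protocol),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % triple for triple in triples)


# Hypothetical individuals: a normalization step turning a CEL file into a CHP file.
triples = protocol_application(
    "normalization_run_1", "sample1_CEL", "sample1_CHP", "normalization_protocol"
)
print(to_ntriples(triples))
```

Leaving the source and derivative untyped mirrors the choice not to put domains or ranges on the two new properties.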
>>>>
>>>> If this structure is acceptable to people, I can publish the ontology
>>>> for general use pretty quickly, and let us work from the same data
>>>> structure. I would appreciate any feedback.
>>>>
>>>> Jim
>>>>
>>>> On Monday, November 30, 2009, Helena Deus <helenadeus@gmail.com>
>>>> wrote:
>>>>
>>>>> @Kei,
>>>>>
>>>>> When you said data structure, did you mean the RDF structure?
>>>>> For now, all I have is the Java object returned by the parser. I've
>>>>> been using Limpopo, which creates an object that I can then parse
>>>>> to RDF using Jena. The challenge, though, has been coming up with
>>>>> the predicates to formalize the relationships between the various
>>>>> elements. I'm using the XML structures for IDF/SDRF etc. at
>>>>> http://magetab-om.sourceforge.net to automatically generate the
>>>>> structure that will contain the data. My plan is to then create
>>>>> the RDF triples that use the attributes described in those
>>>>> documents and populate them with the data from the MAGE-TAB Java
>>>>> object created by Limpopo.
>>>>>
>>>>> Right now all I have is a very raw RDF/XML document describing the
>>>>> relationships in the IDF structure:
>>>>> http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
>>>>> The triples for that had to be encoded manually using Jena by
>>>>> reading the model.
>>>>>
>>>>> @Satya and Jun
>>>>> I would very much like to be involved in that effort. Do you
>>>>> already have a URL that I can look at?
>>>>>
>>>>> Thanks
>>>>> Lena
>>>>>
>>>>> On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.cheung@yale.edu>
>>>>> wrote:
>>>>>
>>>>> Hi Lena et al,
>>>>>
>>>>> When you said data structure, did you mean the RDF structure? If
>>>>> so, is there a pointer to the structure that we can look at?
>>>>>
>>>>> As discussed during yesterday's call, Jun and Satya will help
>>>>> create a wiki page for listing some of the requirements for
>>>>> provenance/workflow in the context of gene lists. Perhaps we
>>>>> should also use it to help coordinate some of the future
>>>>> activities (people also brought up Taverna during the call
>>>>> yesterday). Please coordinate with Satya and Jun.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Kei
>>>>>
>>>>> Helena Deus wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I apologize for missing the call yesterday! It seems you had a
>>>>> pretty interesting discussion! :-)
>>>>> If I understand Michael's statement, parsing the MAGE-TAB/MAGE-ML
>>>>> into RDF would result in obtaining only the raw and processed data
>>>>> files but not the mechanism used to process them nor the resulting
>>>>> gene list. That's also what I concluded after looking at the data
>>>>> structure created by Tony Burdett's Limpopo parser. However,
>>>>> having the raw data as linked data is already a great start! Kei,
>>>>> should I be looking into Taverna in order to reprocess the raw
>>>>> files with a traceable analysis workflow?
>>>>>
>>>>> Thanks!
>>>>> Lena
>>>>>
>>>>> On Tue, Nov 24, 2009 at 9:59 AM, mdmiller <mdmiller53@comcast.net>
>>>>> wrote:
>>>>>
>>>>> hi all,
>>>>>
>>>>> (from the minutes)
>>>>>
>>>>> "Yolanda/Kei/Scott: semantic annotation/description of workflow
>>>>> would enable the retrieval of data relevant to that workflow (i.e.
>>>>> data that could be used to populate that workflow for a different
>>>>> experimental scenario)"
>>>>>
>>>>> what is typically in a MAGE-TAB/MAGE-ML document are the protocols
>>>>> for how the source was processed into the extract then how the
>>>>> hybridization, feature extraction, error and normalization were
>>>>> performed.
>>>>> these are interesting and different protocols can
>>>>> cause differences at this level but it is pretty much a known art
>>>>> and usually not of too much interest or variability.
>>>>>
>>>>> what is usually missing from those documents, along with the final
>>>>> gene list, is how that gene list was obtained, what higher level
>>>>> analysis was used, that is generally only in the paper
>>>>> unfortunately.
>>>>>
>>>>> cheers,
>>>>> michael
>>>>>
>>>>> ----- Original Message ----- From: "Kei Cheung"
>>>>> <kei.cheung@yale.edu>
>>>>> To: "HCLS" <public-semweb-lifesci@w3.org>
>>>>> Sent: Monday, November 23, 2009 1:27 PM
>>>>> Subject: Re: BioRDF Telcon
>>>>>
>>>>> Today's BioRDF minutes are available at the following:
>>>>>
>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call
>>>>>
>>>>> Thanks to Rob for scribing.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Kei
>>>>>
>>>>> Kei Cheung wrote:
>>>>>
>>>>> This is a reminder that the next BioRDF telcon call will
>>>>> be held at 11 am EDT (5 pm CET) on Monday, November 23
>>>>> (see details below).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Kei
>>>>>
>>>>> == Conference Details ==
>>>>> * Date of Call: Monday November 23, 2009
>>>>> * Time of Call: 11:00 am Eastern Time
>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>>> * Participant Access Code: 4257 ("HCLS")
>>>>>
>>>>> * IRC Channel: irc.w3.org port 6665 channel #
Received on Monday, 14 December 2009 02:33:24 UTC