- From: Michael Miller <Michael.Miller@systemsbiology.org>
- Date: Wed, 30 Nov 2011 08:11:14 -0800
- To: expressionrdf@googlegroups.com
- Cc: Chris Mungall <cjmungall@lbl.gov>, "M. Scott Marshall" <mscottmarshall@gmail.com>, HCLS <public-semweb-lifesci@w3.org>
- Message-ID: <ebf8d79eaeabd7f1d378b31793294523@mail.gmail.com>
hi lena,

it looks like it is actually an indication of a possibly bad alignment mapping (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2813481/):

"MicroRNAs (miRNAs) are short (20–23 nt) RNAs that are sequence-specific mediators of transcriptional and post-transcriptional regulation of gene expression. Modern high-throughput technologies enable deep sequencing of such RNA species on an unprecedented scale. We find that the analysis of small RNA deep-sequencing libraries can be affected by cross-mapping, in which RNA sequences originating from one locus are inadvertently mapped to another. Similar to cross-hybridization on microarrays, cross-mapping is prevalent among miRNAs, as they tend to occur in families, are similar or derived from repeat or structural RNAs, or are post-transcriptionally modified. Here, we develop a strategy to correct for cross-mapping, and apply it to the analysis of RNA editing in mature miRNAs. In contrast to previous reports, our analysis suggests that RNA editing in mature miRNAs is rare in animals."

i'll also try and find other sequencing data to get a broader idea of the quantitation types for different rna-seq technologies.

cheers,
michael

*From:* expressionrdf@googlegroups.com [mailto:expressionrdf@googlegroups.com] *On Behalf Of* Helena Deus
*Sent:* Wednesday, November 30, 2011 3:53 AM
*To:* expressionrdf@googlegroups.com
*Cc:* Chris Mungall; M. Scott Marshall; HCLS
*Subject:* Re: [BioRDF] W3C Note on expression RDF

This is great, Michael, thanks!

I assume the reads per million have to do with normalization/QC of the data. Any idea what "cross-mapped to other miRNA forms" means? By the name, I assume these are miRNAs that can target more than one gene.

Now the question is: should we blindly create an RDF representation for reporting sequencing results such as "miRNA name", "raw read count", etc.?
Or use a more interesting approach, whereby we try to map the miRNAs to the genes that they are regulating and use the "cross map" to link each miRNA to other miRNA forms (in case the value is a Y)? This would enable easy linking to the expression values!!

Ideas? Suggestions?

Best,
Lena

On Tue, Nov 29, 2011 at 11:43 PM, Michael Miller <Michael.Miller@systemsbiology.org> wrote:

hi all,

here's what i found as quantitation types from the TCGA MAGE files for next-gen seq, from the DESCRIPTION.txt file (from C:\data\nci\2011_11_28_tcga\blca\cgcc\bcgsc.ca \illuminahiseq_mirnase\mirnaseq\bcgsc.ca_BLCA.IlluminaHiSeq_miRNASeq.Level_3.1.0.0):

The .mirna.quantification.txt data file describing summed expression for each miRNA is as follows:
- miRNA name
- raw read count
- reads per million miRNA reads
- cross-mapped to other miRNA forms (Y or N)

The .isoform.quantification.txt data file describing every individual sequence isoform observed is as follows:
- miRNA name
- alignment coordinates as <version>:<Chromosome>:<Start position>-<End position>:<Strand>
- raw read count
- reads per million miRNA reads
- cross-mapped to other miRNA forms (Y or N)
- region within miRNA

as the URL suggests, this is for miRNA but i imagine for IlluminaHiSeq with any sample this would be typical. it's likely to be true that each of the sequencing technologies has specific quantitation types.

*From:* Chris Mungall [mailto:cjmungall@lbl.gov]
*Sent:* Monday, November 07, 2011 7:35 AM
*To:* M. Scott Marshall
*Cc:* expressionrdf@googlegroups.com; HCLS
*Subject:* Re: [BioRDF] W3C Note on expression RDF

On Nov 7, 2011, at 2:03 PM, M. Scott Marshall wrote:

Dear BioRDF,

I've pasted the minutes from our last meeting below. You can find them here:
http://www.w3.org/2011/10/24-HCLS-minutes.html

Part of the discussion that isn't available in the minutes below was agreement that NGS expression could be minimally supported by, for example, providing a placeholder for information such as quantified expression.
Phil gave us some slides (see link to PDF below) and pointed us to slide 81. Analogous to representing the *results* of differential expression analysis in microarrays (rather than all details of image analysis, etc.), we would like to be able to represent the results of analyzing RNA-seq data. A few of you expressed interest in looking into the minimal features needed to represent NGS RNA-seq analysis results (Michael, Phil, others?). Please also feel free to continue this discussion on the mailing list.

What sort of thing do you have in mind for the RNA-seq data? Would this be subsumed by a generic RDF representation for interval-based formats like GFF3 and formats like wiggle?

What about downstream analyses, e.g. GOseq? I'd be interested in working on a standard format for the results of enrichment analyses. See:
http://biostar.stackexchange.com/questions/11269/is-there-a-standard-format-for-go-term-enrichment-results

Our current thinking is to define an abstract model independent of serialization, and concrete forms such as json, tab-delimited and rdf.

I haven't had time to trim down and restructure the google doc yet. I cannot make a teleconference in today's BioRDF timeslot but encourage you to call in if you want to continue the discussion with others that show up.

Cheers,
Scott

https://docs.google.com/document/d/1A5-3tOsifPWPpETBKU-ZA9d7O7wK_nBzTFUBEe-0Bzo/edit?authkey=CK-y8Y8C
http://purl.org/net/biordfmicroarray/demo
http://ui.genexpressfusion.googlecode.com/hg/index.html

<*Phil*> it seems currently restriction to microarray based gene expression
Scott (retroactively scribing): Repeated goal of W3C note - i.e. to give people confidence in *an* RDF representation and approach. Decide when we go to HTML and version control. What's missing?
*Scott:* See if we can minimize differences between current representations?
*Sudeshna:* Make a new one as the standard?
*Michael:* But we already have enough to work with in the current set of representations.
*James:* I thought we were simply going to talk about some of the current work and how it can be used and 'cut it loose'.
*Tomasz:* Ours was meant to be a 'reference point'.
*Michael:* Yes, 'reference point' sounds better than 'canonical RDF'.
*James:* Some news: A student project at EBI is just coming toward the end. Bulk of ArrayExpress in RDF. Not public data yet.
*Jim:* After using MGED in my IPAW paper, I converted to OBI. It's the MAGETAB2RDF work. Some issues with Limpopo.
*James:* I could send you (Jim) some stuff for you to take a look at.
*Michael:* Could you explain what you mean by "there are some limits to the translations"?
*James:* Some matching of terms isn't perfect.

This is the IPAW paper:
http://www.springerlink.com/index/W10740804446172U.pdf
http://krauthammerlab.med.yale.edu/~jpm78/ArrayExpress/E-AFMX-1.rdf.ttl
http://swbig.googlecode.com

<*james*> JM to post info on magetab 2 rdf at arrayexpress once beta is out - credit to Drashtti Vasant and Tony Burdett
<*james*> http://code.google.com/p/open-biomed/wiki/GeneExpressionAtlas
<*james*> example queries for gxa rdf
<*ericP*> i can hear everything, but can't speak up to volunteer
<*james*> congrats eric
<*JimMcCusker*> BTW, congrats, eric!
<*tomasz*> congrats, Eric! maggots have indeed been revalidated in recent years to keep wounds healing faster (they clean it up)
*Scott:* Somebody brought up the need to deal with NGS and I agree. But that means more work..
<*JimMcCusker*> I have to drop off for a prov WG call. If you need anything from me towards the paper, let me know.
ok, thanks Jim
<*sudeshna*> http://cufflinks.cbcb.umd.edu/
<*tomasz*> James: get it out for feedback for community as soon as possible
<*tomasz*> I second that!
<*Phil*> http://www.bioinformatics.auckland.ac.nz/workshops/NGS-workshop-update.pdf
<*Phil*> slide 81
<*tomasz*> thanks, bye
<*sudeshna*> bye

--
Helena F. Deus
Post-Doctoral Researcher at DERI/NUIG
http://lenadeus.info/
Received on Wednesday, 30 November 2011 16:11:41 UTC
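
As a rough illustration of the "blind" option Lena raises in the thread, the four quantitation types Michael lists for the .mirna.quantification.txt file (miRNA name, raw read count, reads per million miRNA reads, cross-mapped Y or N) could be turned into RDF along the following lines. This is only a sketch under stated assumptions: the `http://example.org/mirnaseq/` namespace, the property names, the short field keys, the absence of a header line, and the sample rows are all hypothetical and are not taken from the thread or from any TCGA file.

```python
import csv
import io

# Hypothetical namespace and property names; the thread defines no vocabulary.
EX = "http://example.org/mirnaseq/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Column order as listed in the quoted DESCRIPTION.txt; these short keys are
# assumptions for illustration, not the headers of the real files.
FIELDS = ["mirna_name", "read_count", "reads_per_million", "cross_mapped"]


def parse_quantification(text):
    """Parse tab-delimited rows (assumed to have no header line) into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        row = dict(zip(FIELDS, record))
        row["read_count"] = int(row["read_count"])
        row["reads_per_million"] = float(row["reads_per_million"])
        row["cross_mapped"] = row["cross_mapped"].strip().upper() == "Y"
        rows.append(row)
    return rows


def to_ntriples(rows, sample_id):
    """Emit one hypothetical 'quantification' resource per row as N-Triples."""
    out = []
    for index, row in enumerate(rows):
        subject = f"<{EX}{sample_id}/quantification/{index}>"
        # miRNA names are assumed to contain no characters needing escaping.
        out.append(f'{subject} <{EX}mirnaName> "{row["mirna_name"]}" .')
        out.append(
            f'{subject} <{EX}rawReadCount> "{row["read_count"]}"^^<{XSD}integer> .'
        )
        out.append(
            f'{subject} <{EX}readsPerMillion> '
            f'"{row["reads_per_million"]}"^^<{XSD}double> .'
        )
        flag = "true" if row["cross_mapped"] else "false"
        out.append(f'{subject} <{EX}crossMapped> "{flag}"^^<{XSD}boolean> .')
    return out


# Invented rows, for illustration only (not taken from any TCGA file).
SAMPLE = "hsa-let-7a-1\t1200\t8000.5\tN\nhsa-mir-21\t5000\t33335.4\tY\n"

if __name__ == "__main__":
    for triple in to_ntriples(parse_quantification(SAMPLE), "sample1"):
        print(triple)
```

Keeping the cross-mapped flag as a plain boolean mirrors the flat file; the richer approach Lena suggests, linking each miRNA to its other forms or to the genes it regulates, would need additional data that the file description quoted in the thread does not list.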