Re: [BioRDF] W3C Note on expression RDF from Sudeshna Das on 2011-11-30 (public-semweb-lifesci@w3.org from December 2011)

From: Sudeshna Das <sdas@seas.harvard.edu>
Date: Wed, 30 Nov 2011 16:13:14 -0500
To: <expressionrdf@googlegroups.com>
CC: Chris Mungall <cjmungall@lbl.gov>, "M. Scott Marshall" <mscottmarshall@gmail.com>, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <6F299A7A-F9E4-473A-8D9A-FDAC4711FF5E@seas.harvard.edu>
Hi Lena,

Here is what we captured for our site to represent the results of RNA-Seq:

1. Accession - this could be the miRNA accession or Transcript accession or any other identifier for novel transcripts.
2. The genomic interval (start, end and genome version).
3. Fragments per KB per million mapped fragments (FPKM) values - these will be identical to RPKM (reads per KB per million mapped reads) for single-end sequencing.
The raw read count is not normalized to the depth of sequencing and hence cannot be compared across studies.
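
To make the normalization concrete, here is a minimal Python sketch of the usual FPKM arithmetic. The function name and the example numbers are made up for illustration and are not taken from our pipeline; tools such as Cufflinks estimate how fragments are assigned to transcripts with a statistical model, which this sketch does not attempt.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Hypothetical helper for illustration only: it applies the plain
    normalization formula and nothing else.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # count / (length in kb) / (mapped fragments in millions)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Invented numbers: the same 2000 bp transcript in two runs of different depth.
shallow_run = fpkm(500, 2000, 10_000_000)
deep_run = fpkm(1000, 2000, 20_000_000)
print(shallow_run, deep_run)
```

The raw counts differ by a factor of two (500 vs. 1000), but both runs come out at the same FPKM, which is why the normalized value rather than the raw count is the one to compare across studies.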

Then, using the genomic interval, one can find the nearest gene and look at the expression of genes that the miRNA may be influencing.
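
A rough sketch of that lookup in Python follows. The `Interval` class, the gene names and the coordinates are all hypothetical; it assumes every interval is on the same genome version, uses inclusive start/end positions, and ignores strand. A real annotation set would need an indexed search rather than this linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    name: str
    chrom: str
    start: int
    end: int


def gap(a, b):
    """Distance in bases between two intervals; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearest_gene(query, genes):
    """Return the gene closest to `query` on the same chromosome, or None."""
    candidates = [g for g in genes if g.chrom == query.chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: gap(query, g))


# Invented example data, not real annotations.
mirna = Interval("mir-example", "chr1", 1000, 1080)
genes = [
    Interval("GENE_A", "chr1", 100, 500),
    Interval("GENE_B", "chr1", 1500, 3000),
    Interval("GENE_C", "chr2", 900, 1200),
]
print(nearest_gene(mirna, genes).name)
```

Here GENE_B is picked because it lies 420 bases from the miRNA while GENE_A lies 500 bases away; GENE_C is skipped because it is on another chromosome.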

Best,
Sudeshna


On Nov 30, 2011, at 6:52 AM, Helena Deus wrote:

> This is great, Michael, thanks!
> 
> I assume the reads per million have to do with normalization/QC of the data. Any idea what "cross-mapped to other miRNA forms" means? By the name, I assume these are miRNAs that can target more than one gene.
> 
> Now the question is: should we blindly create an RDF representation for reporting sequencing results such as "miRNA name", "raw read count", etc?
> Or, use a more interesting approach, whereby we try to map the miRNAs to the genes that they are regulating and use the "cross map" to link each miRNA to other miRNA forms (in case the value is a Y)?
> 
> This would enable easy linking to the expression values!! 
> 
> Ideas? Suggestions?
> Best, Lena
> 
> 
> On Tue, Nov 29, 2011 at 11:43 PM, Michael Miller <Michael.Miller@systemsbiology.org> wrote:
> hi all,
> 
>  
> here's what i found as quantitation types from the TCGA MAGE files for next gen seq from the DESCRIPTION.txt file (from C:\data\nci\2011_11_28_tcga\blca\cgcc\bcgsc.ca\illuminahiseq_mirnase\mirnaseq\bcgsc.ca_BLCA.IlluminaHiSeq_miRNASeq.Level_3.1.0.0):
> 
> 
> The .mirna.quantification.txt  data file describing summed expression for each miRNA is as follows:
> 
>  
> miRNA name
> 
> raw read count
> 
> reads per million miRNA reads
> 
> cross-mapped to other miRNA forms (Y or N)
> 
>  
> The .isoform.quantification.txt data file describing every individual sequence isoform observed is as follows:
> 
>  
> miRNA name
> 
> alignment coordinates as <version>:<Chromosome>:<Start position>-<End position>:<Strand>
> 
> raw read count
> 
> reads per million miRNA reads
> 
> cross-mapped to other miRNA forms (Y or N)
> 
> region within miRNA
> 
>  
> as the URL suggests, this is for miRNA but i imagine for IlluminaHiSeq with any sample this would be typical.  it's likely to be true that each of the sequencing technologies has specific quantitation types.
> 
>  
>  
>  
> From: Chris Mungall [mailto:cjmungall@lbl.gov] 
> Sent: Monday, November 07, 2011 7:35 AM
> To: M. Scott Marshall
> Cc: expressionrdf@googlegroups.com; HCLS
> 
> 
> Subject: Re: [BioRDF] W3C Note on expression RDF
> 
>  
>  
> On Nov 7, 2011, at 2:03 PM, M. Scott Marshall wrote:
> 
> 
> 
> 
> Dear BioRDF,
> 
> I've pasted the minutes from our last meeting below. You can find them here: http://www.w3.org/2011/10/24-HCLS-minutes.html
> 
> Part of the discussion that isn't available in the minutes below was agreement that NGS expression could be minimally supported by, for example, providing a placeholder for information such as quantified expression. Phil gave us some slides (see link to PDF below) and pointed us to slide 81. Analogous to representing the *results* of differential expression analysis in microarrays (rather than all details of image analysis, etc.), we would like to be able to represent the results of analyzing RNA-seq data. A few of you expressed interest in looking into the minimal features needed to represent NGS RNA-seq analysis results (Michael, Phil, others?). Please also feel free to continue this discussion on the mailing list.
> 
>  
> What sort of thing do you have in mind for the RNA-seq data? Would this be subsumed by a generic RDF representation for interval based formats like GFF3 and formats like wiggle?
> 
>  
> What about downstream analyses, e.g. GOseq? I'd be interested in working on a standard format for the results of enrichment analyses. See:
> 
>  
>           http://biostar.stackexchange.com/questions/11269/is-there-a-standard-format-for-go-term-enrichment-results
> 
>  
> Our current thinking is to define an abstract model independent of serialization, and concrete forms such as json, tab-delimited and rdf.
> 
> 
> 
> 
> I haven't had time to trim down and restructure the google doc yet. I cannot make a teleconference in today's BioRDF timeslot but encourage you to call in if you want to continue the discussion with others that show up. 
> 
> Cheers,
> 
> Scott
> 
> https://docs.google.com/document/d/1A5-3tOsifPWPpETBKU-ZA9d7O7wK_nBzTFUBEe-0Bzo/edit?authkey=CK-y8Y8C
> 
> http://purl.org/net/biordfmicroarray/demo
> 
> http://ui.genexpressfusion.googlecode.com/hg/index.html
> 
> <Phil> it seems currently restricted to microarray based gene expression
> 
> Scott (retroactively scribing): Repeated goal of W3C note - i.e. to give people confidence in *an* RDF representation and approach. Decide when we go to HTML and version control. What's missing?
> 
> Scott: See if we can minimize differences between current representations?
> 
> Sudeshna: Make a new one as the standard?
> 
> Michael: But we already have enough to work with in the current set of representations.
> 
> James: I thought we were simply going to talk about some of the current work and how it can be used and 'cut it loose'.
> 
> Tomasz: Ours was meant to be a 'reference point'.
> 
> Michael: Yes, 'reference point' sounds better than 'canonical RDF'.
> 
> James: Some news: A student project at EBI is just coming toward the end. Bulk of ArrayExpress in RDF. Not public data yet.
> 
> Jim: After using MGED in my IPAW paper, I converted to OBI. It's the MAGETAB2RDF work. Some issues with Limpopo.
> 
> James: I could send you (Jim) some stuff for you to take a look at.
> 
> Michael: Could you explain what you mean by "there are some limits to the translations"?
> 
> James: Some matching of terms isn't perfect.
> 
> This is the IPAW paper: http://www.springerlink.com/index/W10740804446172U.pdf
> 
> http://krauthammerlab.med.yale.edu/~jpm78/ArrayExpress/E-AFMX-1.rdf.ttl
> 
> http://swbig.googlecode.com
> 
> <james> JM to post info on magetab 2 rdf at arrayexpress once beta is out - credit to Drashtti Vasant and Tony Burdett
> 
> <james> http://code.google.com/p/open-biomed/wiki/GeneExpressionAtlas
> 
> <james> example queries for gxa rdf
> 
> <ericP> i can hear everything, but can't speak up to volunteer
> 
> <james> congrats eric
> 
> <JimMcCusker> BTW, congrats, eric!
> 
> <tomasz> congrats, Eric!
> 
> maggots have indeed been revalidated in recent years for helping wounds heal faster (they clean them up)
> 
> Scott: Somebody brought up the need to deal with NGS and I agree. But that means more work..
> 
> <JimMcCusker> I have to drop off for a prov WG call. If you need anything from me towards the paper, let me know.
> 
> ok, thanks Jim
> 
> <sudeshna> http://cufflinks.cbcb.umd.edu/
> 
> <tomasz> James: get it out for feedback for community as soon as possible
> 
> <tomasz> I second that!
> 
> <Phil> http://www.bioinformatics.auckland.ac.nz/workshops/NGS-workshop-update.pdf
> 
> <Phil> slide 81
> 
> <tomasz> thanks, bye
> 
> <sudeshna> bye
> 
>  
>  
> 
> 
> 
> -- 
> Helena F. Deus
> Post-Doctoral Researcher at DERI/NUIG
> http://lenadeus.info/
>
Received on Thursday, 1 December 2011 09:34:11 UTC