RE: [BioRDF] W3C Note on expression RDF from Michael Miller on 2011-11-29 (public-semweb-lifesci@w3.org from November 2011)

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Tue, 29 Nov 2011 15:43:44 -0800
To: Chris Mungall <cjmungall@lbl.gov>, "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: expressionrdf@googlegroups.com, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <aa6b9014a4b4e8452f15d7268c7ba6e4@mail.gmail.com>

hi all,

here's what i found for quantitation types in the TCGA MAGE files for
next-gen sequencing, taken from the DESCRIPTION.txt file (from
C:\data\nci\2011_11_28_tcga\blca\cgcc\bcgsc.ca
\illuminahiseq_mirnase\mirnaseq\bcgsc.ca_BLCA.IlluminaHiSeq_miRNASeq.Level_3.1.0.0):

The .mirna.quantification.txt data file describing summed expression for
each miRNA is as follows:

miRNA name

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)
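
A minimal sketch of reading such a file in Python follows. It assumes the file is tab-delimited, that the four columns appear in the order listed above, and that the first non-blank line is a header row. None of this is stated in DESCRIPTION.txt as quoted here. The names `MirnaQuant` and `parse_mirna_quant` are hypothetical, and any values passed in by a caller are the caller's own.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MirnaQuant:
    """One row of a .mirna.quantification.txt file (field names are hypothetical)."""
    name: str                 # miRNA name
    read_count: int           # raw read count
    reads_per_million: float  # reads per million miRNA reads
    cross_mapped: bool        # cross-mapped to other miRNA forms (Y or N)


def parse_mirna_quant(lines: Iterable[str], has_header: bool = True) -> List[MirnaQuant]:
    """Parse tab-delimited rows, assuming the column order given in the description."""
    # Drop blank lines and trailing newlines before looking at the content.
    rows = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    if has_header:
        # Assumption: the first non-blank line holds column names, not data.
        rows = rows[1:]
    records = []
    for row in rows:
        name, count, rpm, cross = row.split("\t")
        records.append(
            MirnaQuant(
                name=name,
                read_count=int(count),
                reads_per_million=float(rpm),
                cross_mapped=cross.strip().upper() == "Y",
            )
        )
    return records
```

Keeping the four quantitation types as typed fields makes it straightforward to map each one onto a property later, whatever vocabulary is chosen for that.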

The .isoform.quantification.txt data file describing every individual
sequence isoform observed is as follows:

miRNA name

alignment coordinates as <version>:<Chromosome>:<Start position>-<End
position>:<Strand>

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)

region within miRNA
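
The alignment coordinate field packs five values into one string, so a small parser is useful before mapping it to anything else. The sketch below assumes the strand is written as `+` or `-` and that start and end are plain integers, which the description does not say explicitly. `Alignment` and `parse_alignment` are hypothetical names.

```python
import re
from typing import NamedTuple


class Alignment(NamedTuple):
    """Parts of <version>:<Chromosome>:<Start position>-<End position>:<Strand>."""
    version: str
    chromosome: str
    start: int
    end: int
    strand: str


# Assumption: strand is '+' or '-', and positions are unsigned integers.
_COORD = re.compile(r"^([^:]+):([^:]+):(\d+)-(\d+):([+-])$")


def parse_alignment(text: str) -> Alignment:
    """Split one alignment-coordinate string into its five parts."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised alignment coordinates: {text!r}")
    version, chromosome, start, end, strand = match.groups()
    return Alignment(version, chromosome, int(start), int(end), strand)
```

For example, `parse_alignment("hg19:9:96938243-96938264:+")` (an invented coordinate string) yields the version, chromosome, start, end and strand as separate fields.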

as the path suggests, this is for miRNA, but i imagine this would be
typical for IlluminaHiSeq with any sample. it's likely that each of the
sequencing technologies has its own specific quantitation types.

*From:* Chris Mungall [mailto:cjmungall@lbl.gov]
*Sent:* Monday, November 07, 2011 7:35 AM
*To:* M. Scott Marshall
*Cc:* expressionrdf@googlegroups.com; HCLS
*Subject:* Re: [BioRDF] W3C Note on expression RDF

On Nov 7, 2011, at 2:03 PM, M. Scott Marshall wrote:

Dear BioRDF,

I've pasted the minutes from our last meeting below. You can find them
here: http://www.w3.org/2011/10/24-HCLS-minutes.html

Part of the discussion that isn't available in the minutes below was
agreement that NGS expression could be minimally supported by, for example,
providing a placeholder for information such as quantified expression. Phil
gave us some slides (see link to PDF below) and pointed us to slide 81.
Analogous to representing the *results* of differential expression analysis
in microarrays (rather than all details of image analysis, etc.), we would
like to be able to represent the results of analyzing RNA-seq data. A few
of you expressed interest in looking into the minimal features needed to
represent NGS RNA-seq analysis results (Michael, Phil, others?). Please
also feel free to continue this discussion on the mailing list.

What sort of thing do you have in mind for the RNA-seq data? Would this be
subsumed by a generic RDF representation for interval based formats like
GFF3 and formats like wiggle?

What about downstream analyses, e.g. GOseq? I'd be interested in working on
a standard format for the results of enrichment analyses. See:

http://biostar.stackexchange.com/questions/11269/is-there-a-standard-format-for-go-term-enrichment-results

Our current thinking is to define an abstract model independent of
serialization, and concrete forms such as json, tab-delimited and rdf.
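
As a rough illustration of that separation, the Python sketch below keeps one in-memory record type and derives the JSON and tab-delimited forms from it. The record fields (`term_id`, `term_label`, `p_value`, `study_count`) are hypothetical placeholders, not a proposed model, and the RDF form is left out here because it would depend on a vocabulary that has not been chosen.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class EnrichmentResult:
    """Abstract record for one enriched term (all field names are hypothetical)."""
    term_id: str
    term_label: str
    p_value: float
    study_count: int


def to_json(results: List[EnrichmentResult]) -> str:
    """Concrete form 1: a JSON array with one object per result."""
    return json.dumps([asdict(r) for r in results], indent=2)


def to_tsv(results: List[EnrichmentResult]) -> str:
    """Concrete form 2: tab-delimited text with a header row."""
    columns = [f.name for f in fields(EnrichmentResult)]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for r in results:
        writer.writerow([getattr(r, c) for c in columns])
    return buf.getvalue()
```

Because both serializers read the same dataclass, adding a field to the abstract record changes every concrete form at once.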

I haven't had time to trim down and restructure the google doc yet. I
cannot make a teleconference in today's BioRDF timeslot but encourage you
to call in if you want to continue the discussion with others that show up.

Cheers,

Scott

https://docs.google.com/document/d/1A5-3tOsifPWPpETBKU-ZA9d7O7wK_nBzTFUBEe-0Bzo/edit?authkey=CK-y8Y8C

http://purl.org/net/biordfmicroarray/demo

http://ui.genexpressfusion.googlecode.com/hg/index.html

<*Phil*> it seems currently restricted to microarray-based gene expression

Scott (retroactively scribing): Repeated goal of W3C note - i.e. to give
people confidence in *an* RDF representation and approach. Decide when we
go to HTML and version control. What's missing?

*Scott:* See if we can minimize differences between current representations?

*Sudeshna:* Make a new one as the standard?

*Michael:* But we already have enough to work with in the current set of
representations.

*James:* I thought we were simply going to talk about some of the current
work and how it can be used and 'cut it loose'.

*Tomasz:* Ours was meant to be a 'reference point'.

*Michael:* Yes, 'reference point' sounds better than 'canonical RDF'.

*James:* Some news: A student project at EBI is just coming toward the end.
Bulk of ArrayExpress in RDF. Not public data yet.

*Jim:* After using MGED in my IPAW paper, I converted to OBI. It's the
MAGETAB2RDF work. Some issues with Limpopo.

*James:* I could send you (Jim) some stuff for you to take a look at.

*Michael:* Could you explain what you mean by "there are some limits to the
translations"?

*James:* Some matching of terms isn't perfect.

This is the IPAW paper:
http://www.springerlink.com/index/W10740804446172U.pdf

http://krauthammerlab.med.yale.edu/~jpm78/ArrayExpress/E-AFMX-1.rdf.ttl

http://swbig.googlecode.com

<*james*> JM to post info on magetab 2 rdf at arrayexpress once beta is out
- credit to Drashtti Vasant and Tony Burdett

<*james*> http://code.google.com/p/open-biomed/wiki/GeneExpressionAtlas

<*james*> example queries for gxa rdf

<*ericP*> i can hear everything, but can't speak up to volunteer

<*james*> congrats eric

<*JimMcCusker*> BTW, congrats, eric!

<*tomasz*> congrats, Eric!

maggots have indeed been revalidated in recent years for helping wounds
heal faster (they clean them up)

*Scott:* Somebody brought up the need to deal with NGS and I agree. But
that means more work..

<*JimMcCusker*> I have to drop off for a prov WG call. If you need anything
from me towards the paper, let me know.

ok, thanks Jim

<*sudeshna*> http://cufflinks.cbcb.umd.edu/

<*tomasz*> James: get it out for feedback for community as soon as possible

<*tomasz*> I second that!

<*Phil*>
http://www.bioinformatics.auckland.ac.nz/workshops/NGS-workshop-update.pdf

<*Phil*> slide 81

<*tomasz*> thanks, bye

<*sudeshna*> bye

Received on Tuesday, 29 November 2011 23:44:26 UTC