RE: [BioRDF] W3C Note on expression RDF

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Tue, 29 Nov 2011 15:43:44 -0800
Message-ID: <aa6b9014a4b4e8452f15d7268c7ba6e4@mail.gmail.com>
To: Chris Mungall <cjmungall@lbl.gov>, "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: expressionrdf@googlegroups.com, HCLS <public-semweb-lifesci@w3.org>
hi all,

here's what i found as quantitation types in the TCGA MAGE files for next-gen
sequencing, in the DESCRIPTION.txt file (from

The .mirna.quantification.txt data file describing summed expression for
each miRNA is as follows:

miRNA name

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)
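As a rough illustration of the column layout above, here is a minimal Python sketch that reads a .mirna.quantification.txt file. It assumes the file is tab-delimited with a single header row and the four columns in the order listed; the header names and counts in the sample are made up, and `MirnaQuant`, `parse_mirna_quantification` and `reads_per_million` are hypothetical helpers, not part of any TCGA tooling. The RPM recomputation assumes the denominator is the sum of raw counts in the file, which may not match how TCGA actually normalized.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class MirnaQuant:
    """One row of a .mirna.quantification.txt file (columns in the order listed above)."""
    mirna_name: str
    raw_read_count: int
    reads_per_million: float
    cross_mapped: bool  # parsed from the "Y"/"N" flag


def parse_mirna_quantification(text: str) -> list[MirnaQuant]:
    """Parse tab-delimited text, assuming one header row followed by data rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the assumed header row; its column names are not relied on
    rows = []
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        name, count, rpm, cross = fields[:4]
        rows.append(MirnaQuant(name, int(count), float(rpm), cross.strip().upper() == "Y"))
    return rows


def reads_per_million(counts: list[int]) -> list[float]:
    """RPM as count / total * 1e6, assuming total = sum of the raw counts given."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


# Made-up sample input; header names are illustrative only.
sample = (
    "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
    "hsa-let-7a-1\t300\t750000.0\tN\n"
    "hsa-mir-21\t100\t250000.0\tY\n"
)
records = parse_mirna_quantification(sample)
print(records[1])
print(reads_per_million([r.raw_read_count for r in records]))
```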

The .isoform.quantification.txt data file describing every individual
sequence isoform observed is as follows:

miRNA name

alignment coordinates as <version>:<Chromosome>:<Start position>-<End

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)

region within miRNA
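Since the .mirna.quantification.txt file holds summed expression while the isoform file lists each observed isoform separately, a small sketch of rolling isoform rows up per miRNA name may help show how the two relate. Again this assumes tab-delimited rows with one header line and the column order above; the alignment-coordinate and region columns are treated as opaque strings, and the sample rows (placeholder coordinates and regions, made-up counts) are invented. Whether these sums reproduce the per-miRNA file exactly is not assumed.

```python
import csv
import io
from collections import defaultdict


def sum_isoform_counts(text: str) -> dict[str, int]:
    """Sum raw read counts over all isoform rows that share a miRNA name.

    Assumes one header line and columns ordered as: name, alignment
    coordinates, raw count, RPM, cross-mapped flag, region. Only the name
    and raw count columns are used; coordinates are never parsed.
    """
    totals = defaultdict(int)
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the assumed header row
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        name, count = fields[0], fields[2]
        totals[name] += int(count)
    return dict(totals)


# Invented rows with placeholder coordinate/region values.
sample_iso = (
    "name\tcoords\tcount\trpm\tcross\tregion\n"
    "hsa-mir-21\tCOORD_A\t60\t0\tN\tREGION_A\n"
    "hsa-mir-21\tCOORD_B\t40\t0\tN\tREGION_B\n"
    "hsa-let-7a-1\tCOORD_C\t5\t0\tN\tREGION_A\n"
)
print(sum_isoform_counts(sample_iso))
```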

as the URL suggests, this is for miRNA, but i imagine this would be typical
for IlluminaHiSeq with any sample. it's likely that each sequencing
technology has its own specific quantitation types.

*From:* Chris Mungall [mailto:cjmungall@lbl.gov]
*Sent:* Monday, November 07, 2011 7:35 AM
*To:* M. Scott Marshall
*Cc:* expressionrdf@googlegroups.com; HCLS
*Subject:* Re: [BioRDF] W3C Note on expression RDF

On Nov 7, 2011, at 2:03 PM, M. Scott Marshall wrote:

Dear BioRDF,

I've pasted the minutes from our last meeting below. You can find them
here: http://www.w3.org/2011/10/24-HCLS-minutes.html

Part of the discussion that isn't available in the minutes below was
agreement that NGS expression could be minimally supported by, for example,
providing a placeholder for information such as quantified expression. Phil
gave us some slides (see link to PDF below) and pointed us to slide 81.
Analogous to representing the *results* of differential expression analysis
in microarrays (rather than all details of image analysis, etc.), we would
like to be able to represent the results of analyzing RNA-seq data. A few
of you expressed interest in looking into the minimal features needed to
represent NGS RNA-seq analysis results (Michael, Phil, others?). Please
also feel free to continue this discussion on the mailing list.

What sort of thing do you have in mind for the RNA-seq data? Would this be
subsumed by a generic RDF representation for interval based formats like
GFF3 and formats like wiggle?
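To make the question concrete, one possible shape for a generic interval representation is sketched below in Python: it maps the positional columns of a single GFF3 feature line (1-based, inclusive start and end) to N-Triples. The `http://example.org/interval#` vocabulary and the feature URI are hypothetical placeholders, not an existing ontology; wiggle, the attributes column, and the source and phase columns are left out.

```python
def _literal(value: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def gff3_line_to_ntriples(line: str, feature_uri: str) -> list[str]:
    """Map the positional columns of one GFF3 feature line to N-Triples statements."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has 9 tab-separated columns")
    seqid, _source, ftype, start, end, score, strand, _phase, _attributes = cols
    ex = "http://example.org/interval#"  # hypothetical vocabulary, not a published one
    xsd = "http://www.w3.org/2001/XMLSchema#"
    s = f"<{feature_uri}>"
    triples = [
        f"{s} <{ex}seqid> {_literal(seqid)} .",
        f"{s} <{ex}featureType> {_literal(ftype)} .",
        # GFF3 start/end are 1-based and inclusive
        f'{s} <{ex}start> "{int(start)}"^^<{xsd}integer> .',
        f'{s} <{ex}end> "{int(end)}"^^<{xsd}integer> .',
    ]
    if score != ".":  # "." marks an undefined score
        triples.append(f'{s} <{ex}score> "{float(score)!r}"^^<{xsd}double> .')
    if strand in ("+", "-"):  # skip "." (unstranded) and "?" (unknown)
        triples.append(f"{s} <{ex}strand> {_literal(strand)} .")
    return triples


line = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1"
for t in gff3_line_to_ntriples(line, "http://example.org/feature/gene1"):
    print(t)
```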

What about downstream analyses, e.g. GOseq? I'd be interested in working on
a standard format for the results of enrichment analyses. See:


Our current thinking is to define an abstract model independent of
serialization, plus concrete forms such as JSON, tab-delimited text and RDF.
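One way to read "abstract model plus concrete forms" is a single record type with one function per serialization. The Python sketch below assumes hypothetical fields (`term_id`, `p_value`, `gene_count`) and a placeholder `http://example.org/vocab#` namespace; it is not the model under discussion, only an illustration of keeping the model separate from its JSON, tab-delimited and RDF (N-Triples) outputs. The values in the usage line are made up.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnrichmentResult:
    """Hypothetical serialization-independent record for one enriched term."""
    term_id: str     # e.g. a GO term CURIE
    p_value: float
    gene_count: int  # number of input genes annotated to the term


def to_json(results: list[EnrichmentResult]) -> str:
    return json.dumps([asdict(r) for r in results])


def to_tsv(results: list[EnrichmentResult]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["term_id", "p_value", "gene_count"])
    for r in results:
        writer.writerow([r.term_id, r.p_value, r.gene_count])
    return buf.getvalue()


def to_ntriples(results: list[EnrichmentResult], base: str = "http://example.org/enrichment/") -> str:
    ex = "http://example.org/vocab#"  # hypothetical vocabulary
    xsd = "http://www.w3.org/2001/XMLSchema#"
    lines = []
    for i, r in enumerate(results):
        s = f"<{base}result{i}>"
        lines.append(f'{s} <{ex}term> "{r.term_id}" .')
        lines.append(f'{s} <{ex}pValue> "{r.p_value!r}"^^<{xsd}double> .')
        lines.append(f'{s} <{ex}geneCount> "{r.gene_count}"^^<{xsd}integer> .')
    return "\n".join(lines) + "\n"


results = [EnrichmentResult("GO:0008150", 0.001, 12)]  # made-up values
print(to_json(results))
print(to_tsv(results), end="")
print(to_ntriples(results), end="")
```

Keeping the record type free of format-specific details means adding another serialization is just another function over the same objects.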

I haven't had time to trim down and restructure the Google doc yet. I
cannot make a teleconference in today's BioRDF timeslot but encourage you
to call in if you want to continue the discussion with others that show up.






<*Phil*> it seems currently restricted to microarray-based gene expression

Scott (retroactively scribing): Repeated the goal of the W3C Note, i.e. to give
people confidence in *an* RDF representation and approach. Decide when we
move to HTML and version control. What's missing?

*Scott:* See if we can minimize differences between current representations?

*Sudeshna:* Make a new one as the standard?

*Michael:* But we already have enough to work with in the current set of

*James:* I thought we were simply going to talk about some of the current
work and how it can be used and 'cut it loose'.

*Tomasz:* Ours was meant to be a 'reference point'.

*Michael:* Yes, 'reference point' sounds better than 'canonical RDF'.

*James:* Some news: A student project at EBI is just coming toward the end.
Bulk of ArrayExpress in RDF. Not public data yet.

*Jim:* After using MGED in my IPAW paper, I converted to OBI. It's the
MAGETAB2RDF work. Some issues with Limpopo.

*James:* I could send you (Jim) some stuff for you to take a look at.

*Michael:* Could you explain what you mean by "there are some limits to the

*James:* Some matching of terms isn't perfect.

This is the IPAW paper:



<*james*> JM to post info on magetab 2 rdf at arrayexpress once beta is out
- credit to Drashtti Vasant and Tony Burdett

<*james*> http://code.google.com/p/open-biomed/wiki/GeneExpressionAtlas

<*james*> example queries for gxa rdf

<*ericP*> i can hear everything, but can't speak up to volunteer

<*james*> congrats eric

<*JimMcCusker*> BTW, congrats, eric!

<*tomasz*> congrats, Eric!

maggots have indeed been revalidated in recent years for helping wounds
heal faster (they clean them up)

*Scott:* Somebody brought up the need to deal with NGS and I agree. But
that means more work.

<*JimMcCusker*> I have to drop off for a prov WG call. If you need anything
from me towards the paper, let me know.

ok, thanks Jim

<*sudeshna*> http://cufflinks.cbcb.umd.edu/

<*tomasz*> James: get it out to the community for feedback as soon as possible

<*tomasz*> I second that!


<*Phil*> slide 81

<*tomasz*> thanks, bye

<*sudeshna*> bye
