
RE: [BioRDF] W3C Note on expression RDF

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Wed, 30 Nov 2011 08:11:14 -0800
Message-ID: <ebf8d79eaeabd7f1d378b31793294523@mail.gmail.com>
To: expressionrdf@googlegroups.com
Cc: Chris Mungall <cjmungall@lbl.gov>, "M. Scott Marshall" <mscottmarshall@gmail.com>, HCLS <public-semweb-lifesci@w3.org>
hi lena,

it looks like it is actually an indication of possibly a bad alignment
mapping (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2813481/)

"MicroRNAs (miRNAs) are short (20-23 nt) RNAs that are sequence-specific
mediators of transcriptional and post-transcriptional regulation of gene
expression. Modern high-throughput technologies enable deep sequencing of
such RNA species on an unprecedented scale. We find that the analysis of
small RNA deep-sequencing libraries can be affected by cross-mapping, in
which RNA sequences originating from one locus are inadvertently mapped to
another. Similar to cross-hybridization on microarrays, cross-mapping is
prevalent among miRNAs, as they tend to occur in families, are similar or
derived from repeat or structural RNAs, or are post-transcriptionally
modified. Here, we develop a strategy to correct for cross-mapping, and
apply it to the analysis of RNA editing in mature miRNAs. In contrast to
previous reports, our analysis suggests that RNA editing in mature miRNAs
is rare in animals."

i'll also try and find other sequencing data to get a broader idea of the
quantitation types for different rna-seq technologies.



*From:* expressionrdf@googlegroups.com [mailto:expressionrdf@googlegroups.com] *On Behalf Of* Helena Deus
*Sent:* Wednesday, November 30, 2011 3:53 AM
*To:* expressionrdf@googlegroups.com
*Cc:* Chris Mungall; M. Scott Marshall; HCLS
*Subject:* Re: [BioRDF] W3C Note on expression RDF

This is great, Michael, thanks!

I assume the reads per million have to do with normalization/QC of the data.
Any idea what "cross-mapped to other miRNA forms" means? By the name, I
assume these are miRNAs that can target more than one gene.

Now the question is: should we blindly create an RDF representation for
reporting sequencing results such as "miRNA name", "raw read count", etc?

Or use a more interesting approach, whereby we try to map the miRNAs to the
genes that they are regulating and use the "cross map" to link each miRNA
to other miRNA forms (in case the value is a Y)?

This would enable easy linking to the expression values!!
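
To make the two options concrete, here is a rough sketch of what the triples could look like. It is only an illustration: the `http://example.org/expression#` namespace, the property names (`rawReadCount`, `crossMapsTo`, `regulates`) and the `related` / `targets` lists are all assumptions, since the quantification file only carries a Y/N flag and does not say which other forms or genes are involved.

```python
from urllib.parse import quote

# Hypothetical namespace and property names, not a published vocabulary.
EX = "http://example.org/expression#"
XSD = "http://www.w3.org/2001/XMLSchema#"

def iri(kind, local_name):
    """Build an IRI such as <http://example.org/expression#mirna/mir-b>."""
    return f"<{EX}{kind}/{quote(local_name, safe='')}>"

def literal(value, datatype=None):
    """Format an N-Triples literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{text}"^^<{XSD}{datatype}>'
    return f'"{text}"'

def mirna_triples(name, raw_count, cross_mapped, related=(), targets=()):
    """Yield N-Triples lines for one miRNA record.

    The first three triples are the "plain" reporting approach; the
    crossMapsTo and regulates links are the "more interesting" one and
    need data (related, targets) from outside the quantification file.
    """
    subject = iri("mirna", name)
    yield f"{subject} <{EX}name> {literal(name)} ."
    yield f"{subject} <{EX}rawReadCount> {literal(raw_count, 'integer')} ."
    yield f"{subject} <{EX}crossMapped> {literal(str(cross_mapped).lower(), 'boolean')} ."
    if cross_mapped:
        for other in related:
            yield f"{subject} <{EX}crossMapsTo> {iri('mirna', other)} ."
    for gene in targets:
        yield f"{subject} <{EX}regulates> {iri('gene', gene)} ."

# Made-up record for illustration only.
triples = list(mirna_triples("mir-b", 300, True, related=["mir-a"], targets=["GENE1"]))
```

The link triples are only emitted when the extra data is supplied, so the same function covers both the plain and the linked representation.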

Ideas? Suggestions?
Best, Lena

On Tue, Nov 29, 2011 at 11:43 PM, Michael Miller <
Michael.Miller@systemsbiology.org> wrote:

hi all,

here's what i found as quantitation types from the TCGA MAGE files for
next-gen seq from the DESCRIPTION.txt file (from

The .mirna.quantification.txt  data file describing summed expression for
each miRNA is as follows:

miRNA name

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)
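
A minimal sketch of reading those four columns, assuming the file is tab-delimited with a header row. The column names and sample values below are assumptions for illustration, not copied from a real TCGA file, and the reads-per-million calculation assumes the usual definition (raw count divided by the total miRNA reads, times one million).

```python
import csv
import io

# Made-up sample; header names are assumed, not taken from DESCRIPTION.txt.
SAMPLE = (
    "miRNA_ID\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\n"
    "mir-a\t600\t600000.0\tN\n"
    "mir-b\t300\t300000.0\tY\n"
    "mir-c\t100\t100000.0\tN\n"
)

def read_quantification(text):
    """Parse the four quantitation types into a list of dicts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "name": rec["miRNA_ID"],
            "raw_count": int(rec["read_count"]),
            "rpm": float(rec["reads_per_million_miRNA_mapped"]),
            "cross_mapped": rec["cross-mapped"] == "Y",
        })
    return rows

def reads_per_million(rows):
    """Recompute RPM as raw_count / total miRNA reads * 1e6."""
    total = sum(r["raw_count"] for r in rows)
    return {r["name"]: r["raw_count"] / total * 1_000_000 for r in rows}

rows = read_quantification(SAMPLE)
rpm = reads_per_million(rows)
flagged = [r["name"] for r in rows if r["cross_mapped"]]
```

Comparing the recomputed `rpm` values against the reported column is a quick way to check whether the normalization assumption holds for a given file.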

The .isoform.quantification.txt data file describing every individual
sequence isoform observed is as follows:

miRNA name

alignment coordinates as <version>:<Chromosome>:<Start position>-<End

raw read count

reads per million miRNA reads

cross-mapped to other miRNA forms (Y or N)

region within miRNA

as the URL suggests, this is for miRNA but i imagine for IlluminaHiSeq with
any sample this would be typical.  it's likely that each of the
sequencing technologies has specific quantitation types.

*From:* Chris Mungall [mailto:cjmungall@lbl.gov]
*Sent:* Monday, November 07, 2011 7:35 AM
*To:* M. Scott Marshall
*Cc:* expressionrdf@googlegroups.com; HCLS

*Subject:* Re: [BioRDF] W3C Note on expression RDF

On Nov 7, 2011, at 2:03 PM, M. Scott Marshall wrote:

Dear BioRDF,

I've pasted the minutes from our last meeting below. You can find them
here: http://www.w3.org/2011/10/24-HCLS-minutes.html

Part of the discussion that isn't available in the minutes below was
agreement that NGS expression could be minimally supported by, for example,
providing a placeholder for information such as quantified expression. Phil
gave us some slides (see link to PDF below) and pointed us to slide 81.
Analogous to representing the *results* of differential expression analysis
in microarrays (rather than all details of images analysis, etc.), we would
like to be able to represent the results of analyzing RNA-seq data. A few
of you expressed interest in looking into the minimal features needed to
represent NGS RNA-seq analysis results (Michael, Phil, others?). Please
also feel free to continue this discussion on the mailing list.

What sort of thing do you have in mind for the RNA-seq data? Would this be
subsumed by a generic RDF representation for interval based formats like
GFF3 and formats like wiggle?

What about downstream analyses, e.g. GOseq? I'd be interested in working on
a standard format for the results of enrichment analyses. See:


Our current thinking is to define an abstract model independent of
serialization, and concrete forms such as json, tab-delimited and rdf.
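
As a rough illustration of that idea (one abstract record, several concrete forms), something like the following could work. The `EnrichmentResult` class and its field names are hypothetical placeholders, not the model under discussion, and the numbers are made up.

```python
import json
from dataclasses import dataclass, asdict, fields

@dataclass
class EnrichmentResult:
    # Field names are illustrative assumptions, not a proposed standard.
    term_id: str
    term_label: str
    p_value: float
    gene_count: int

def to_json(results):
    """Serialize the abstract records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in results], indent=2)

def to_tsv(results):
    """Serialize the same records as tab-delimited text with a header row."""
    names = [f.name for f in fields(EnrichmentResult)]
    lines = ["\t".join(names)]
    for r in results:
        lines.append("\t".join(str(getattr(r, n)) for n in names))
    return "\n".join(lines)

# Made-up values for illustration only.
results = [EnrichmentResult("GO:0006915", "apoptotic process", 0.001, 12)]
as_json = to_json(results)
as_tsv = to_tsv(results)
```

An RDF form would be a third function over the same dataclass, which is the point of keeping the model separate from any one serialization.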

I haven't had time to trim down and restructure the google doc yet. I
cannot make a teleconference in today's BioRDF timeslot but encourage you
to call in if you want to continue the discussion with others that show up.






<*Phil*> it seems currently restriction to microarray based gene expression

Scott (retroactively scribing): Repeated goal of W3C note - i.e. to give
people confidence in *an* RDF representation and approach. Decide when we
go to HTML and version control. What's missing?

*Scott:* See if we can minimize differences between current representations?

*Sudeshna:* Make a new one as the standard?

*Michael:* But we already have enough to work with in the current set of

*James:* I thought we were simply going to talk about some of the current
work and how it can be used and 'cut it loose'.

*Tomasz:* Ours was meant to be a 'reference point'.

*Michael:* Yes, 'reference point' sounds better than 'canonical RDF'.

*James:* Some news: A student project at EBI is just coming toward the end.
Bulk of ArrayExpress in RDF. Not public data yet.

*Jim:* After using MGED in my IPAW paper, I converted to OBI. It's the
MAGETAB2RDF work. Some issues with Limpopo.

*James:* I could send you (Jim) some stuff for you to take a look at.

*Michael:* Could you explain what you mean by "there are some limits to the

*James:* Some matching of terms isn't perfect.

This is the IPAW paper:



<*james*> JM to post info on magetab 2 rdf at arrayexpress once beta is out
- credit to Drashtti Vasant and Tony Burdett

<*james*> http://code.google.com/p/open-biomed/wiki/GeneExpressionAtlas

<*james*> example queries for gxa rdf

<*ericP*> i can hear everything, but can't speak up to volunteer

<*james*> congrats eric

<*JimMcCusker*> BTW, congrats, eric!

<*tomasz*> congrats, Eric!

maggots have indeed been revalidated in recent years for helping wounds
heal faster (they clean them up)

*Scott:* Somebody brought up the need to deal with NGS and I agree. But
that means more work..

<*JimMcCusker*> I have to drop off for a prov WG call. If you need anything
from me towards the paper, let me know.

ok, thanks Jim

<*sudeshna*> http://cufflinks.cbcb.umd.edu/

<*tomasz*> James: get it out for feedback for community as soon as possible

<*tomasz*> I second that!


<*Phil*> slide 81

<*tomasz*> thanks, bye

<*sudeshna*> bye

Helena F. Deus

Post-Doctoral Researcher at DERI/NUIG

Received on Wednesday, 30 November 2011 16:11:41 UTC
