Use case: RDF, pathways, and expression analysis

Hi everyone,

Thanks to all for a very stimulating meeting last week!  Following up,
here's the use case I described at the meeting.

First, here's some background. I work for Affymetrix, the market leader in
gene expression platforms, and one of the few companies in the genomics
industry to turn a profit consistently ($5.8M last quarter).  Expression
measurement is now a standard practice, but the subsequent data analysis is
notoriously overwhelming.  The management at Affy is well aware that when we
take steps to facilitate this analysis, we sell more chips.  As part of
that, Affy provides a free (after complimentary registration) web resource
called NetAffx (http://www.affymetrix.com/analysis/index.affx) that gives
users information on the genomic features interrogated by their
expression chips.  NetAffx has become substantial in the two years it's been
up: it has 30,000 registered users, and receives on the order of 200,000
hits per day.   

There are lots of ways in which NetAffx could benefit from a semantic web
framework.  For brevity, I'll just describe one that's easy.

One of the more popular resources under NetAffx is a Gene Ontology (GO)
browser
(https://www.affymetrix.com/support/technical/manual/go_manual.affx).  GO
uses a standardized, graph-structured vocabulary to describe genes in terms
of their functions, their subcellular locations, and the biological
processes they participate in.  Users of the GO browser upload a list of
probe sets (identifiers mapping expression results to genomic entities), and
are taken to an interactive map of the GO graph, with the nodes of the graph
color-coded according to representation in the probe set list.  Users find
this effective for identifying the major themes in their probe set lists,
and hence the major messages from their expression analysis.  But once they
know which processes are most salient, it would be great to give them a
complementary view of the pathways for those processes.  Such a resource
would get used heavily: our customers go wild over any pathway-related data
we give them!  And with the combination of the BioPax data and Isaviz, it's
very close to something we could provide very easily!

If you look at the illustration in the GO manual, you might find it
reminiscent of Isaviz.  That's no coincidence: the interactive graphs are
SVGs generated by Graphviz.  So with the right use of Isaviz's style sheets,
and RDF pathway data (such as from BioPax), it would be straightforward to
extend the existing GO browser framework to depict pathways.  We wouldn't
even need much of a parser for the pathway data; the pathway layout would
come for free, and all we'd need to work out would be the associated
hyperlinks.
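
To make that concrete, here is a rough sketch of the idea, not the actual GO
browser code.  It assumes the pathway has already been read into plain
(subject, predicate, object) tuples, and it emits Graphviz DOT text in which
each node carries a URL attribute, which Graphviz turns into a hyperlink in
SVG output.  The function names, the "ex:" identifiers, and the link base are
all made up for illustration.

```python
def quote(text):
    """Quote a string for use as a DOT identifier or attribute value."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def pathway_to_dot(triples, labels, link_base):
    """Render (subject, predicate, object) triples as Graphviz DOT text.

    triples   -- set of 3-tuples of strings (hypothetical pathway data)
    labels    -- dict mapping node identifiers to display names
    link_base -- hypothetical URL prefix for the per-node hyperlinks
    """
    # Every subject and object becomes a node; Graphviz handles the layout.
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    lines = ["digraph pathway {"]
    for node in nodes:
        label = labels.get(node, node)
        lines.append(
            f"  {quote(node)} [label={quote(label)}, URL={quote(link_base + node)}];"
        )
    # Every triple becomes a directed edge labelled with its predicate.
    for s, p, o in sorted(triples):
        lines.append(f"  {quote(s)} -> {quote(o)} [label={quote(p)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data; the identifiers are placeholders, not real BioPax terms.
    toy = {("ex:A", "ex:next", "ex:B"), ("ex:B", "ex:next", "ex:C")}
    print(pathway_to_dot(toy, {"ex:A": "Alpha"}, "https://example.org/lookup?id="))
```

Saving the output as pathway.dot and running "dot -Tsvg pathway.dot -o
pathway.svg" should give a clickable SVG.  Color-coding by probe set
representation would be one more node attribute, which I've left out here.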

Having the pathway data under RDF provides some nice additional
opportunities.  For one thing, biological pathways overlap.  With RDF, it
would be easy to combine two or more pathways into a larger network for
visualization - scientifically, that would be really cool!  Down the line,
we'll need to think about some related UI and navigation issues; for
instance, to maintain context, it would be useful to indicate which regions
come from which pathway.  But even a per-pathway visualization would be a
great start, and would be received enthusiastically!
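
As a rough illustration of the merging idea (a sketch under simplifying
assumptions, not a real RDF implementation): if each pathway is a set of
(subject, predicate, object) tuples that use shared identifiers, combining
pathways is just a set union, and remembering which pathway each node came
from gives the UI what it needs to mark regions.  The pathway contents and
"ex:" names below are toy placeholders, and a real RDF merge would also have
to deal with blank nodes, which this ignores.

```python
from collections import defaultdict


def merge_pathways(pathways):
    """Merge named pathways and record which pathway(s) each node came from.

    pathways -- dict mapping a pathway name to a set of (s, p, o) tuples
    Returns (merged_triples, node_sources), where node_sources maps each
    node identifier to the set of pathway names in which it appears.
    """
    merged = set()
    node_sources = defaultdict(set)
    for name, triples in pathways.items():
        merged |= triples  # identical triples collapse automatically
        for s, _, o in triples:
            node_sources[s].add(name)
            node_sources[o].add(name)
    return merged, dict(node_sources)


if __name__ == "__main__":
    # Toy, heavily simplified pathway fragments for illustration only.
    glycolysis = {("ex:Glucose", "ex:convertedTo", "ex:Pyruvate")}
    tca = {("ex:Pyruvate", "ex:convertedTo", "ex:AcetylCoA")}
    merged, sources = merge_pathways({"glycolysis": glycolysis, "tca": tca})
    # Nodes found in more than one pathway are where the pathways overlap.
    shared = sorted(n for n, names in sources.items() if len(names) > 1)
    print(len(merged), shared)
```

The node_sources map is what would drive the "which region comes from which
pathway" display, for example by giving each pathway its own fill color.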
      
Melissa

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Melissa Cline, Ph.D.
Staff Scientist, Affymetrix 
melissa_cline@affymetrix.com
cell: (831) 428-9667

Received on Monday, 3 November 2003 16:30:21 UTC