[ANN] News from the FlyWeb Project

Dear all,

This is a summary of work so far by the FlyWeb Project team. We're
exploring integration of life science data in support of Drosophila
(fruit fly) functional genomics. We'd like to develop credible, robust
and genuinely useful tools for the Drosophila research community; and
to provide data and services of value to bioinformaticians and
Semantic Web / Life Science developers.

This is the first time we've announced our work more widely, and we'd
very much appreciate thoughts, suggestions, feedback, re-use and
testing of the applications, services, software and data described
below. Please note however that this is work in progress, and things
may break, change, move or disappear without notice.


= Search Applications =

http://openflydata.org/search/insitus

This application allows you to search for images of in situ RNA
hybridisation experiments, depicting expression of specific genes in
different organs (testes and embryos). It is a mashup of data from the
Berkeley Drosophila Genome Project (BDGP) and the Drosophila Testis
Gene Expression Database (Fly-TED). It also uses data from FlyBase to
disambiguate gene name synonyms.

It's a pure AJAX application using SPARQL to access data from each of
the three sources on the fly (pardon the pun :).


= RDF Data =

The following RDF data used in the search application above are
available for bulk download:

* http://openflydata.org/dump/flybase (latest)
  http://openflydata.org/dump/flybase_genenames_20081017 (snapshot)

  data on D. melanogaster gene identifiers, symbols and synonyms,
  derived from flybase.org; approx 8 million triples; gzipped
  n-triples

* http://openflydata.org/dump/bdgp (latest)
  http://openflydata.org/dump/bdgp_images_20081030 (snapshot)

  metadata on images of embryo in situ gene expression experiments,
  derived from fruitfly.org; approx 1 million triples; gzipped
  n-triples

* http://openflydata.org/dump/flyted (latest)
  http://openflydata.org/dump/flyted_20080626 (snapshot)

  metadata on images of testis in situ gene expression experiments,
  derived from www.fly-ted.org; approx 30,000 triples; gzipped turtle
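
As a quick sanity check on a downloaded N-Triples dump, the statements
can be counted with a few lines of Python. This is only a sketch: the
local file name in the usage note is hypothetical, and the count
assumes one triple per non-blank, non-comment line, which holds for
N-Triples but not for the Turtle dump.

```python
import gzip


def count_ntriples(path):
    """Count statements in a gzipped N-Triples file.

    Assumes one triple per line; blank lines and '#' comment
    lines are skipped.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, count_ntriples("flybase.nt.gz") on a saved copy of the
FlyBase dump should return a figure in the region of the triple count
quoted above.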


= Data Services =

The following SPARQL endpoints are available for queries over the
above data. See also the limitations below.

* http://openflydata.org/query/flybase (latest)
  http://openflydata.org/query/flybase_genenames_20081017 (snapshot)

* http://openflydata.org/query/bdgp (latest)
  http://openflydata.org/query/bdgp_images_20081030 (snapshot)

* http://openflydata.org/query/flyted (latest)
  http://openflydata.org/query/flyted_20080626 (snapshot)

Limitations:

* only GET requests are supported
* only SELECT and ASK queries are supported
* only the JSON results format is supported (requests must specify
  output=json)
* SELECT queries are limited to a maximum of 500 results
* no more than 5 requests per second are allowed from any one origin
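
A minimal sketch of a client that stays within the limitations above,
in Python using only the standard library. The "query" parameter name
follows the standard SPARQL protocol and output=json is as stated
above; the function names, the Throttle class and the example query
are illustrative assumptions, not part of any FlyWeb API.

```python
import json
import time
import urllib.parse
import urllib.request

MIN_INTERVAL = 1.0 / 5  # stay at or below 5 requests per second


def build_url(endpoint, sparql):
    """Build a GET URL for a SELECT or ASK query, requesting JSON."""
    params = urllib.parse.urlencode({"query": sparql, "output": "json"})
    return endpoint + "?" + params


def parse_select(body):
    """Turn SPARQL JSON SELECT results into a list of {variable: value}."""
    doc = json.loads(body)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


class Throttle:
    """Sleep as needed so successive calls are min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def select(endpoint, sparql, throttle):
    """Run a SELECT query over GET and return the parsed rows."""
    throttle.wait()
    with urllib.request.urlopen(build_url(endpoint, sparql)) as response:
        return parse_select(response.read().decode("utf-8"))
```

For example:

  select("http://openflydata.org/query/flybase",
         "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10", Throttle())

Since SELECT results are capped at 500, it seems sensible to put an
explicit LIMIT of 500 or less in each query.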

The endpoints are implemented using our own Java SPARQL protocol
implementation (SPARQLite, see below) backed by Jena TDB 0.6
stores. The endpoints run inside Tomcat 5.5 behind Apache 2.2 via
mod_jk, on a small EC2 instance, with TDB storing data on an attached
EBS volume.


= Software Downloads & Source Code =

* FlyUI
  http://flyui.googlecode.com

This is a library of composable JavaScript widgets, providing a user
interface to the above data. These widgets are used to build the
search application above. FlyUI is built on Yahoo's JavaScript user
interface library (YUI).

* SPARQLite
  http://sparqlite.googlecode.com

This is an experimental and incomplete implementation of the SPARQL
protocol, designed to work with Jena TDB or SDB stores. We're using
this as a platform to explore a number of quality of service issues
that SPARQL raises.


= Ontologies/Schemas =

The following OWL schemas are used in the above data:

* CHADO OWL Schema 
  http://purl.org/net/chado/schema/

This is an OWL representation of a subset of the CHADO relational
schema used by FlyBase (see http://gmod.org/wiki/Schema).

* FlyBase OWL Synonym Types
  http://purl.org/net/flybase/synonym-types/

This is a micro-ontology, representing the FlyBase synonym type
vocabulary.

* BDGP OWL Schema
  http://purl.org/net/bdgp/schema/

This is an OWL representation of a subset of the BDGP relational
schema.

* FlyTED OWL Schemas

These are under revision, to be published shortly.


= RDF Data Conversion Utilities =

The following utilities were developed to obtain the RDF data
described above:

* CHADO/FlyBase D2RQ Map
  http://code.google.com/p/openflydata/source/browse/trunk/flybase/genenames/d2r-flybase-genenames.ttl

This provides a mapping from the CHADO/FlyBase relational schema to
the CHADO/FlyBase OWL ontologies, for basic D. melanogaster gene
(feature) data (identifiers, symbols, synonyms, species).

* BDGP D2RQ Map
  http://code.google.com/p/openflydata/source/browse/trunk/bdgp/imagemapping/d2r-bdgp-insituimages.ttl

This maps the BDGP relational schema to OWL/RDF.

See also: http://openflydata.googlecode.com


= Future Developments =

We're currently working on improving the user interface to the BDGP
data (grouping and ordering images by developmental stage) and on
integrating expression level data from FlyAtlas.

Other suggestions for future developments are warmly welcomed.


= Acknowledgments =

Thanks especially to Helen White-Cooper and Andy Seaborne for all
their help.

The FlyWeb Project is funded by the UK Joint Information Systems
Committee (JISC).


= Further Information =

The FlyWeb project website is at:

http://imageweb.zoo.ox.ac.uk/wiki/index.php/FlyWeb_project

Graham will be presenting this work at the UK SWIG meeting next week.

Or send us an email :)

Kind regards,

Alistair Miles
Jun Zhao
Graham Klyne
David Shotton


-- 
Alistair Miles
Senior Computing Officer
Image Bioinformatics Research Group
Department of Zoology
The Tinbergen Building
University of Oxford
South Parks Road
Oxford
OX1 3PS
United Kingdom
Web: http://purl.org/net/aliman
Email: alistair.miles@zoo.ox.ac.uk
Tel: +44 (0)1865 281993

Received on Wednesday, 5 November 2008 16:41:58 UTC