Re: [ANN] News from the FlyWeb Project

Hi Alistair

The UI is very nice!

I'm curious why you don't include any ontologies. The source datasets
are quite ontology-centric (the Chado database in particular). The  
BDGP data includes annotation of each individual image with terms from  
fly_anatomy. This allows you to query for genes expressed in the brain  
(including its parts), or expressed in tissue derived from the  
neurectoderm for example.
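
To make that concrete, here is a minimal Python sketch of the kind of query I mean. The term names, the part_of edges and the gene annotations are all invented for illustration; none of them are taken from fly_anatomy or BDGP, and a real anatomy ontology would allow a term to have several parents.

```python
# Hypothetical part_of edges (child -> parent); not real fly_anatomy content.
PART_OF = {
    "mushroom body": "brain",
    "optic lobe": "brain",
    "brain": "central nervous system",
    "testis": "male reproductive system",
}

# Hypothetical image annotations: gene symbol -> anatomy terms.
ANNOTATIONS = {
    "geneA": {"mushroom body"},
    "geneB": {"testis"},
    "geneC": {"brain", "testis"},
}


def ancestors(term, part_of):
    """Return the term itself plus everything it is transitively part of."""
    seen = []
    while term is not None and term not in seen:
        seen.append(term)
        term = part_of.get(term)
    return set(seen)


def genes_expressed_in(target, annotations, part_of):
    """Genes annotated with the target term or with any of its parts."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(target in ancestors(t, part_of) for t in terms)
    )


print(genes_expressed_in("brain", ANNOTATIONS, PART_OF))
```

The same closure over a develops_from relation would cover the "derived from the neurectoderm" case. Without the ontology in the store, the client has to know every sub-term in advance.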

A while back I created a D2RQ mapping of both the BDGP InSitu  
databases and Chado. See:

My approach was slightly different in that I was aiming for an  
ontologically sound representation rather than simply recapitulating  
the schema in RDFS. In retrospect, this was probably a little
overambitious given the limitations of RDF technology and D2RQ in
particular. Things may change when we have more OWL-centric databases  
and SQL mapping technology.

In your Chado mapping, you're really just extracting synonym  
information. Is there really a need to define a new ontology here,  
rather than using, say, SKOS? Do you have plans to map more of the  
schema? I'm particularly interested in the representation of genomic  
intervals, and scalable querying.
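
As a rough sketch of what I mean by using SKOS for this, the following Python emits N-Triples with skos:prefLabel for the current symbol and skos:altLabel for each synonym. The gene URI and the example labels are placeholders I made up, not identifiers from your data; only the SKOS namespace is real.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def literal(text):
    """Quote a string as an N-Triples literal, escaping the special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def synonym_triples(gene_uri, symbol, synonyms):
    """One prefLabel triple for the symbol, one altLabel triple per other synonym."""
    lines = ["<%s> <%sprefLabel> %s ." % (gene_uri, SKOS, literal(symbol))]
    for syn in sorted(set(synonyms) - {symbol}):
        lines.append("<%s> <%saltLabel> %s ." % (gene_uri, SKOS, literal(syn)))
    return lines


# Hypothetical gene URI and labels, for illustration only.
for line in synonym_triples("http://example.org/gene/1", "w", ["white", "w"]):
    print(line)
```

Note that this flattens every synonym type into altLabel. Declaring sub-properties of skos:altLabel would be one way to keep the FlyBase synonym types while staying SKOS-compatible.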

You provide 3 SPARQL endpoints. It looks like you're doing the mashup  
in the UI. In many ways this is a traditional AJAX architecture,  
albeit with SPARQL endpoints rather than, say, a REST interface to a  
relational db. Did you find the triplestore/SPARQL route had  
particular advantages (or disadvantages)? What can you do that you  
can't do by simply going straight to the relational dbs?
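
For reference, this is how I read the architecture, sketched in Python rather than javascript. The endpoint URL and the variable names (gene, symbol, image) are my assumptions. The output=json parameter comes from your announcement and the JSON layout is the standard SPARQL results format. The sketch builds the GET URL, flattens each result set, and does the join on the client.

```python
import json
from urllib.parse import urlencode


def query_url(endpoint, sparql):
    """Build a GET URL for a SPARQL query, asking for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql, "output": "json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into dicts of variable -> value."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def join_on(left, right, key):
    """Client-side inner join of two row lists on a shared variable."""
    index = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)
    joined = []
    for l in left:
        for r in index.get(l.get(key), []):
            joined.append({**l, **r})
    return joined


# Hypothetical endpoint; the real endpoint URLs are not reproduced here.
print(query_url("http://example.org/sparql", "ASK {}"))
```

With separate endpoints, every cross-source question turns into a join like join_on in the client, and the 500-row cap on SELECT applies to each side before the join. That is part of why I'm asking what the SPARQL route buys you.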

I'm not sure why you needed to write your own SPARQL protocol on top  
of Jena. Isn't this what Joseki does?

Interested to see future developments.



On Nov 5, 2008, at 8:44 AM, Alistair Miles wrote:

> Dear all,
> This is a summary of work so far by the FlyWeb Project team. We're
> exploring integration of life science data in support of Drosophila
> (fruit fly) functional genomics. We'd like to develop credible, robust
> and genuinely useful tools for the Drosophila research community; and
> to provide data and services of value to bioinformaticians and
> Semantic Web / Life Science developers.
> This is the first time we've announced our work more widely, and we'd
> very much appreciate thoughts, suggestions, feedback, re-use and
> testing of the applications, services, software and data described
> below. Please note however that this is work in progress, and things
> may break, change, move or disappear without notice.
> = Search Applications =
> This application allows you to search for images of in situ RNA
> hybridisation experiments, depicting expression of specific genes in
> different organs (testes and embryos). It is a mashup of data from the
> Berkeley Drosophila Genome Project (BDGP) and the Drosophila Testis
> Gene Expression Database (Fly-TED). It also uses data from FlyBase to
> disambiguate gene name synonyms.
> It's a pure AJAX application using SPARQL to access data from each of
> the three sources on the fly (pardon the pun :).
> = RDF Data =
> The following RDF data used in the search application above are
> available for bulk download:
> * (latest)
> (snapshot)
>  data on D. melanogaster gene identifiers, symbols and synonyms,
>  derived from; approx 8 million triples; gzipped
>  n-triples
> * (latest)
> (snapshot)
>  metadata on images of embryo in situ gene expression experiments,
>  derived from; approx 1 million triples; gzipped
>  n-triples
> * (latest)
> (snapshot)
>  metadata on images of testis in situ gene expression experiments,
>  derived from; approx 30,000 triples; gzipped turtle
> = Data Services =
> The following SPARQL endpoints are available for queries over the
> above data. See also limitations below.
> * (latest)
> (snapshot)
> * (latest)
> (snapshot)
> * (latest)
> (snapshot)
> Limitations: only GET requests are supported; only SELECT and ASK
> queries are supported; only JSON results format is supported (request
> must specify output=json); SELECT queries are limited to max 500
> results; no more than 5 requests per second from any one origin
> The endpoints are implemented using our own Java SPARQL protocol
> implementation (SPARQLite, see below) backed by Jena TDB 0.6
> stores. The endpoints run inside Tomcat 5.5 behind Apache 2.2 via
> mod_jk, on a small EC2 instance, with TDB storing data on an attached
> EBS volume.
> = Software Downloads & Source Code =
> * FlyUI
> This is a library of composable javascript widgets, providing a
> user-interface to above data. These widgets are used to build the
> search application above. FlyUI is built on YAHOO's javascript user
> interface library (YUI).
> * SPARQLite
> This is an experimental and incomplete implementation of the SPARQL
> protocol, designed to work with Jena TDB or SDB stores. We're using
> this as a platform to explore a number of quality of service issues
> that SPARQL raises.
> = Ontologies/Schemas =
> The following OWL schemas are used in the above data:
> * CHADO OWL Schema
> This is an OWL representation of a subset of the CHADO relational
> schema used by FlyBase (see
> * FlyBase OWL Synonym Types
> This is a micro-ontology, representing the FlyBase synonym type
> vocabulary.
> * BDGP OWL Schema
> This is an OWL representation of a subset of the BDGP relational
> schema.
> * FlyTED OWL Schemas
> These are under revision, to be published shortly.
> = RDF Data Conversion Utilities =
> The following utilities were developed to obtain the RDF data
> described above:
> * CHADO/FlyBase D2RQ Map
> This provides a mapping from the CHADO/FlyBase relational schema to
> the CHADO/FlyBase OWL ontologies, for basic D. melanogaster gene
> (feature) data (identifiers, symbols, synonyms, species).
> * BDGP D2RQ Map
> This maps the BDGP relational schema to OWL/RDF.
> See also:
> = Future Developments =
> We're currently working on improving the user interface to the BDGP
> data (grouping and ordering images by developmental stage) and on
> integrating expression level data from FlyAtlas.
> Other suggestions for future developments are warmly welcomed.
> = Acknowledgments =
> Thanks especially to Helen White-Cooper and Andy Seaborne for all
> their help.
> The FlyWeb Project is funded by the UK Joint Information Systems
> Committee (JISC).
> = Further Information =
> The FlyWeb project website is at:
> Graham will be presenting this work at the UK SWIG meeting next week.
> Or send us an email :)
> Kind regards,
> Alistair Miles
> Jun Zhao
> Graham Klyne
> David Shotton
> -- 
> Alistair Miles
> Senior Computing Officer
> Image Bioinformatics Research Group
> Department of Zoology
> The Tinbergen Building
> University of Oxford
> South Parks Road
> Oxford
> OX1 3PS
> United Kingdom
> Web:
> Email:
> Tel: +44 (0)1865 281993

Received on Thursday, 6 November 2008 08:07:32 UTC