- From: Alistair Miles <alistair.miles@zoo.ox.ac.uk>
- Date: Wed, 5 Nov 2008 16:41:18 +0000
- To: semantic-web@w3.org
Dear all,

This is a summary of work so far by the FlyWeb Project team. We're exploring integration of life science data in support of Drosophila (fruit fly) functional genomics. We'd like to develop credible, robust and genuinely useful tools for the Drosophila research community, and to provide data and services of value to bioinformaticians and Semantic Web / Life Science developers.

This is the first time we've announced our work more widely, and we'd very much appreciate thoughts, suggestions, feedback, re-use and testing of the applications, services, software and data described below. Please note, however, that this is work in progress, and things may break, change, move or disappear without notice.

= Search Applications =

http://openflydata.org/search/insitus

This application allows you to search for images of in situ RNA hybridisation experiments, depicting expression of specific genes in different organs (testes and embryos). It is a mashup of data from the Berkeley Drosophila Genome Project (BDGP) and the Drosophila Testis Gene Expression Database (Fly-TED). It also uses data from FlyBase to disambiguate gene name synonyms. It's a pure AJAX application using SPARQL to access data from each of the three sources on the fly (pardon the pun :).

= RDF Data =

The following RDF data used in the search application above are available for bulk download:

* http://openflydata.org/dump/flybase (latest)
  http://openflydata.org/dump/flybase_genenames_20081017 (snapshot)
  data on D. melanogaster gene identifiers, symbols and synonyms, derived from flybase.org; approx 8 million triples; gzipped n-triples
* http://openflydata.org/dump/bdgp (latest)
  http://openflydata.org/dump/bdgp_images_20081030 (snapshot)
  metadata on images of embryo in situ gene expression experiments, derived from fruitfly.org; approx 1 million triples; gzipped n-triples
* http://openflydata.org/dump/flyted (latest)
  http://openflydata.org/dump/flyted_20080626 (snapshot)
  metadata on images of testis in situ gene expression experiments, derived from www.fly-ted.org; approx 30,000 triples; gzipped turtle

= Data Services =

The following SPARQL endpoints are available for queries over the above data. See also limitations below.

* http://openflydata.org/query/flybase (latest)
  http://openflydata.org/query/flybase_genenames_20081017 (snapshot)
* http://openflydata.org/query/bdgp (latest)
  http://openflydata.org/query/bdgp_images_20081030 (snapshot)
* http://openflydata.org/query/flyted (latest)
  http://openflydata.org/query/flyted_20080626 (snapshot)

Limitations:

* only GET requests are supported
* only SELECT and ASK queries are supported
* only JSON results format is supported (request must specify output=json)
* SELECT queries are limited to max 500 results
* no more than 5 requests per second from any one origin

The endpoints are implemented using our own Java SPARQL protocol implementation (SPARQLite, see below) backed by Jena TDB 0.6 stores. The endpoints run inside Tomcat 5.5 behind Apache 2.2 via mod_jk, on a small EC2 instance, with TDB storing data on an attached EBS volume.

= Software Downloads & Source Code =

* FlyUI
  http://flyui.googlecode.com
  This is a library of composable JavaScript widgets, providing a user interface to the above data. These widgets are used to build the search application above. FlyUI is built on Yahoo's JavaScript user interface library (YUI).
* SPARQLite
  http://sparqlite.googlecode.com
  This is an experimental and incomplete implementation of the SPARQL protocol, designed to work with Jena TDB or SDB stores. We're using this as a platform to explore a number of quality of service issues that SPARQL raises.

= Ontologies/Schemas =

The following OWL schemas are used in the above data:

* CHADO OWL Schema
  http://purl.org/net/chado/schema/
  This is an OWL representation of a subset of the CHADO relational schema used by FlyBase (see http://gmod.org/wiki/Schema).
* FlyBase OWL Synonym Types
  http://purl.org/net/flybase/synonym-types/
  This is a micro-ontology, representing the FlyBase synonym type vocabulary.
* BDGP OWL Schema
  http://purl.org/net/bdgp/schema/
  This is an OWL representation of a subset of the BDGP relational schema.
* FlyTED OWL Schemas
  These are under revision, to be published shortly.

= RDF Data Conversion Utilities =

The following utilities were developed to obtain the RDF data described above:

* CHADO/FlyBase D2RQ Map
  http://code.google.com/p/openflydata/source/browse/trunk/flybase/genenames/d2r-flybase-genenames.ttl
  This provides a mapping from the CHADO/FlyBase relational schema to the CHADO/FlyBase OWL ontologies, for basic D. melanogaster gene (feature) data (identifiers, symbols, synonyms, species).
* BDGP D2RQ Map
  http://code.google.com/p/openflydata/source/browse/trunk/bdgp/imagemapping/d2r-bdgp-insituimages.ttl
  This maps the BDGP relational schema to OWL/RDF.

See also: http://openflydata.googlecode.com

= Future Developments =

We're currently working on improving the user interface to the BDGP data (grouping and ordering images by developmental stage) and on integrating expression level data from FlyAtlas. Other suggestions for future developments are warmly welcomed.

= Acknowledgments =

Thanks especially to Helen White-Cooper and Andy Seaborne for all their help. The FlyWeb Project is funded by the UK Joint Information Systems Committee (JISC).
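
= Example: Querying an Endpoint (Sketch) =

The following is a minimal sketch, not part of the FlyWeb software, of how a client might respect the Data Services limitations listed earlier: GET only, SELECT or ASK only, output=json, and at most 5 requests per second. The function names build_url and run_query are hypothetical, the `query` parameter name is assumed from the standard SPARQL protocol, and the keyword check is deliberately crude. The endpoint URL is copied from the list above and may have moved or disappeared.

```python
import json
import re
import time
import urllib.parse
import urllib.request

# Endpoint URL copied from the announcement; it may no longer be live.
ENDPOINT = "http://openflydata.org/query/flybase"

# Stated limit: no more than 5 requests per second from any one origin.
MIN_INTERVAL = 1.0 / 5

# Crude detection of the query form: the first of these keywords found wins.
# (Assumption: no PREFIX IRI in the query contains one of these words.)
_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)

_last_request = float("-inf")


def build_url(endpoint, sparql):
    """Build a GET URL for a SELECT or ASK query, requesting JSON results."""
    match = _FORM.search(sparql)
    if match is None or match.group(1).upper() not in ("SELECT", "ASK"):
        raise ValueError("only SELECT and ASK queries are supported")
    # 'query' is the standard SPARQL protocol parameter (assumed here);
    # 'output=json' is required according to the stated limitations.
    params = urllib.parse.urlencode({"query": sparql, "output": "json"})
    return endpoint + "?" + params


def run_query(endpoint, sparql):
    """Send the query with GET, pausing to stay under 5 requests per second."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    with urllib.request.urlopen(build_url(endpoint, sparql)) as response:
        return json.load(response)
```

A call such as run_query(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10") would then return the parsed JSON results, provided the endpoint is reachable. Since SELECT queries are limited to 500 results, a client should not expect more rows than that from a single request.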

= Further Information =

The FlyWeb project website is at:

http://imageweb.zoo.ox.ac.uk/wiki/index.php/FlyWeb_project

Graham will be presenting this work at the UK SWIG meeting next week. Or send us an email :)

Kind regards,

Alistair Miles
Jun Zhao
Graham Klyne
David Shotton

--
Alistair Miles
Senior Computing Officer
Image Bioinformatics Research Group
Department of Zoology
The Tinbergen Building
University of Oxford
South Parks Road
Oxford OX1 3PS
United Kingdom
Web: http://purl.org/net/aliman
Email: alistair.miles@zoo.ox.ac.uk
Tel: +44 (0)1865 281993
Received on Wednesday, 5 November 2008 16:41:58 UTC