Re: [ANN] News from the FlyWeb Project from Jun Zhao on 2008-11-06 (public-semweb-lifesci@w3.org from November 2008)

From: Jun Zhao <jun.zhao@zoo.ox.ac.uk>
Date: Thu, 06 Nov 2008 10:27:26 +0000
To: Chris Mungall <cjm@berkeleybop.org>
CC: Alistair Miles <alistair.miles@zoo.ox.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, David Sutherland <djs93@gen.cam.ac.uk>, BioImage <bioimage@mail.ontonet.org>
Message-ID: <4912C68E.3020001@zoo.ox.ac.uk>
Hi Chris,

Glad to know that you are interested.

Chris Mungall wrote:
> 
> Hi Alistair
> 
> The UI is very nice!
> 
> I'm curious that you don't include any ontologies. The source datasets 
> are quite ontology-centric (the Chado database in particular). The BDGP 
> data includes annotation of each individual image with terms from 
> fly_anatomy. This allows you to query for genes expressed in the brain 
> (including its parts), or expressed in tissue derived from the 
> neurectoderm for example.
> 
> A while back I created a D2RQ mapping of both the BDGP InSitu databases 
> and Chado. See:
> 
>     http://www.bioontology.org/wiki/index.php/OBD:SPARQL-InSitu

Certainly, we looked at your previous work. In fact, the very first 
version of our in-house BDGP SPARQL endpoint was based on it. We then 
decided to take an incremental approach, mapping only the minimal 
subset of the BDGP database needed by our application.

We would have liked to use the fly_anatomy ontology to describe the gene 
expression annotations. However, when I had a brief look at the BDGP 
database, the accession numbers of the gene expression terms no longer 
seemed to be consistent with the latest fly_anatomy ontology [1]. For 
example, the term "dorsal compartment" is associated with "FBbt:00005876" 
in the BDGP database, while "FBbt:00005876" is named "dorsal central 
protocerebral neuroblast" in the fly_anatomy ontology. We would love to 
have more information from the BDGP team about this.
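A consistency check of this kind could be scripted. The sketch below is a minimal illustration, assuming the ontology is available as an OBO file and the BDGP terms as (name, accession) pairs; OBO_SNIPPET and BDGP_TERMS are illustrative stand-ins containing only the example pair quoted above, not an export of either resource.

```python
# Sketch: flag BDGP annotations whose FBbt accession no longer matches the
# term name in the current fly_anatomy ontology (OBO format).
# ASSUMPTION: OBO_SNIPPET and BDGP_TERMS are illustrative stand-ins for the
# full fly_anatomy .obo file and an export of the BDGP term table.

OBO_SNIPPET = """
[Term]
id: FBbt:00005876
name: dorsal central protocerebral neuroblast
"""

BDGP_TERMS = [("dorsal compartment", "FBbt:00005876")]


def parse_obo_names(obo_text):
    """Return {term id: term name} for every [Term] stanza."""
    names = {}
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas define anatomy terms.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("name:"):
            names[current_id] = line[len("name:"):].strip()
    return names


def find_mismatches(annotations, ontology_names):
    """Return (id, bdgp_name, ontology_name) triples whose names disagree.

    Accessions missing from the ontology are reported with None.
    """
    return [
        (term_id, bdgp_name, ontology_names.get(term_id))
        for bdgp_name, term_id in annotations
        if ontology_names.get(term_id) != bdgp_name
    ]


if __name__ == "__main__":
    names = parse_obo_names(OBO_SNIPPET)
    for term_id, old, new in find_mismatches(BDGP_TERMS, names):
        print(f"{term_id}: BDGP has {old!r}, fly_anatomy has {new!r}")
```

A real check would probably also need to consult obsolete terms and alternative IDs in the ontology before reporting a mismatch.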

[1] http://www.obofoundry.org/cgi-bin/detail.cgi?id=fly_anatomy	

> In your Chado mapping, you're really just extracting synonym 
> information. Is there really a need to define a new ontology here, 
> rather than using, say, SKOS? Do you have plans to map more of the 
> schema? I'm particularly interested in the representation of genomic 
> intervals, and scalable querying.

We used a very lightweight ontological approach because we wanted to 
impose as little interpretation as possible on the source data. As with 
the BDGP database, we extracted only the minimal set of information from 
the Chado schema. We are planning to extract some other fields from 
FlyBase in the near future, driven by the needs of our scientists. But 
if you have any use cases involving "genomic intervals" information, we 
would be very interested to hear about them :).

> 
> You provide 3 SPARQL endpoints. It looks like you're doing the mashup in 
> the UI. In many ways this is a traditional AJAX architecture, albeit 
> with SPARQL endpoints rather than, say, a REST interface to a relational 
> db. Did you find the triplestore/SPARQL route had particular advantages 
> (or disadvantages)? What can you do that you can't do by simply going 
> straight to the relational dbs?

One research goal of our project is to investigate to what extent 
existing Semantic Web technologies and tools can support real use 
cases; hence the SPARQL endpoints over all the datasets. Along the way, 
we have encountered many interesting questions, such as scalability and 
identity mapping.

We appreciated the flexibility of the RDF data model, which allows us to 
impose an RDF view on the source relational data based on the needs of 
our application, and to work with a very lightweight, unified data layer. 
And of course, mapping these interesting data resources into RDF gives 
us the opportunity to link our data resources to others, enabling 
re-use and data integration.

In future releases, we are interested in investigating some lightweight 
reasoning, such as searching for gene expression images using the fly 
anatomy ontology, once we have sorted out some detailed ontology mapping 
problems.
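The kind of lightweight reasoning meant here (e.g. "images of genes expressed in the brain, including its parts") amounts to expanding an anatomy term to everything below it before matching image annotations. A minimal sketch, assuming a hypothetical anatomy graph and image annotations (the ex: terms are made up, not fly_anatomy IDs) and treating is_a and part_of edges alike, which is a simplification:

```python
# Sketch: expand an anatomy term over is_a / part_of edges, then select the
# images annotated with the term or any of its parts/subtypes.
# ASSUMPTION: EDGES and IMAGES are hypothetical, not fly_anatomy/BDGP data.
from collections import defaultdict, deque

# (child, relation, parent)
EDGES = [
    ("ex:protocerebrum", "part_of", "ex:brain"),
    ("ex:mushroom_body", "part_of", "ex:protocerebrum"),
    ("ex:optic_lobe", "part_of", "ex:brain"),
]

IMAGES = {
    "img1": {"ex:mushroom_body"},
    "img2": {"ex:ventral_nerve_cord"},
    "img3": {"ex:brain"},
}


def descendants(edges, root):
    """Return root plus every term reachable by walking edges child-ward."""
    children = defaultdict(set)
    for child, _relation, parent in edges:
        children[parent].add(child)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first traversal of the reversed graph
        term = queue.popleft()
        for child in children[term]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def images_for(term, edges, image_annotations):
    """Return sorted image ids annotated with term or any descendant."""
    wanted = descendants(edges, term)
    return sorted(img for img, terms in image_annotations.items() if terms & wanted)
```

Here `images_for("ex:brain", EDGES, IMAGES)` picks up img1 via the mushroom body and img3 directly, but not img2.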

> 
> I'm not sure why you needed to write your own SPARQL protocol on top of 
> Jena. Isn't this what Joseki does?

The main reason was that (at the time of our experiment) Joseki did 
not handle multiple concurrent connections to the underlying RDF store. 
You can find more details at: 
http://code.google.com/p/sparqlite/wiki/HomePage :)
> 
> Interested to see future developments

Jun
> 
> Cheers
> Chris
> 
> On Nov 5, 2008, at 8:44 AM, Alistair Miles wrote:
> 
>>
>> Dear all,
>>
>> This is a summary of work so far by the FlyWeb Project team. We're
>> exploring integration of life science data in support of Drosophila
>> (fruit fly) functional genomics. We'd like to develop credible, robust
>> and genuinely useful tools for the Drosophila research community; and
>> to provide data and services of value to bioinformaticians and
>> Semantic Web / Life Science developers.
>>
>> This is the first time we've announced our work more widely, and we'd
>> very much appreciate thoughts, suggestions, feedback, re-use and
>> testing of the applications, services, software and data described
>> below. Please note however that this is work in progress, and things
>> may break, change, move or disappear without notice.
>>
>>
>> = Search Applications =
>>
>> http://openflydata.org/search/insitus
>>
>> This application allows you to search for images of in situ RNA
>> hybridisation experiments, depicting expression of specific genes in
>> different organs (testes and embryos). It is a mashup of data from the
>> Berkeley Drosophila Genome Project (BDGP) and the Drosophila Testis
>> Gene Expression Database (Fly-TED). It also uses data from FlyBase to
>> disambiguate gene name synonyms.
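Synonym disambiguation of this sort can be pictured as an index from every current symbol and synonym to the gene identifiers that use it, so that an ambiguous name returns all candidates. A minimal sketch, assuming hypothetical identifiers and names (not FlyBase records) and exact, case-sensitive matching:

```python
# Sketch: resolve a user-supplied gene name to gene identifiers via a
# symbol/synonym index; an ambiguous synonym yields every candidate.
# ASSUMPTION: SYNONYM_ROWS uses hypothetical identifiers, not FlyBase data.
from collections import defaultdict

SYNONYM_ROWS = [
    # (hypothetical gene id, current symbol, synonym)
    ("gene-1", "geneA", "alpha"),
    ("gene-2", "geneB", "alpha"),
    ("gene-2", "geneB", "beta"),
]


def build_synonym_index(rows):
    """Map every current symbol and synonym to the set of gene ids using it."""
    index = defaultdict(set)
    for gene_id, symbol, synonym in rows:
        index[symbol].add(gene_id)
        index[synonym].add(gene_id)
    return index


def resolve(name, index):
    """Return the sorted gene ids matching name exactly (case-sensitive)."""
    return sorted(index.get(name, ()))
```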
>>
>> It's a pure AJAX application using SPARQL to access data from each of
>> the three sources on the fly (pardon the pun :).
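The mashup step can be illustrated as a join of SPARQL JSON result sets on a shared variable. The application does this in the browser; the sketch below only shows the idea in Python, assuming hypothetical variable names (?gene, ?symbol, ?image) and values, and flattening each binding to its value (the uri/literal type is dropped).

```python
# Sketch: client-side join of SPARQL JSON results from two endpoints.
# ASSUMPTION: variable names and values are hypothetical.
import json

FLYBASE_JSON = """{"head": {"vars": ["gene", "symbol"]},
 "results": {"bindings": [
   {"gene": {"type": "uri", "value": "http://example.org/gene/1"},
    "symbol": {"type": "literal", "value": "geneA"}}]}}"""

BDGP_JSON = """{"head": {"vars": ["gene", "image"]},
 "results": {"bindings": [
   {"gene": {"type": "uri", "value": "http://example.org/gene/1"},
    "image": {"type": "uri", "value": "http://example.org/image/7"}},
   {"gene": {"type": "uri", "value": "http://example.org/gene/2"},
    "image": {"type": "uri", "value": "http://example.org/image/8"}}]}}"""


def bindings(result_json):
    """Flatten a SPARQL JSON results document into a list of {var: value}."""
    doc = json.loads(result_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def join_on(key, left, right):
    """Inner-join two binding lists on one variable; unbound rows are skipped."""
    index = {}
    for row in right:
        if key in row:
            index.setdefault(row[key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        if key in l_row
        for r_row in index.get(l_row[key], [])
    ]
```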
>>
>>
>> = RDF Data =
>>
>> The following RDF data used in the search application above are
>> available for bulk download:
>>
>> * http://openflydata.org/dump/flybase (latest)
>>  http://openflydata.org/dump/flybase_genenames_20081017 (snapshot)
>>
>>  data on D. melanogaster gene identifiers, symbols and synonyms,
>>  derived from flybase.org; approx 8 million triples; gzipped
>>  n-triples
>>
>> * http://openflydata.org/dump/bdgp (latest)
>>  http://openflydata.org/dump/bdgp_images_20081030 (snapshot)
>>
>>  metadata on images of embryo in situ gene expression experiments,
>>  derived from fruitfly.org; approx 1 million triples; gzipped
>>  n-triples
>>
>> * http://openflydata.org/dump/flyted (latest)
>>  http://openflydata.org/dump/flyted_20080626 (snapshot)
>>
>>  metadata on images of testis in situ gene expression experiments,
>>  derived from www.fly-ted.org; approx 30,000 triples; gzipped turtle
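The gzipped N-Triples dumps (flybase, bdgp) can be streamed line by line without decompressing to disk. A minimal sketch that tallies predicate usage, assuming SAMPLE as a stand-in for a downloaded dump (its IRIs are hypothetical); it does not handle the Turtle-format flyted dump, which needs a real Turtle parser.

```python
import gzip
import io
from collections import Counter

# ASSUMPTION: SAMPLE stands in for a downloaded dump; IRIs are hypothetical.
SAMPLE = gzip.compress(
    b"# comment line\n"
    b'<http://example.org/s1> <http://example.org/p> "a literal with spaces" .\n'
    b"\n"
    b"<http://example.org/s1> <http://example.org/p> <http://example.org/o> .\n"
    b'_:b1 <http://example.org/q> "x" .\n'
)


def predicate_counts(fileobj):
    """Count predicate usage in a gzipped N-Triples stream (binary file object).

    N-Triples puts one triple per line, and subjects and predicates contain
    no whitespace, so splitting off the first two tokens is enough here.
    """
    counts = Counter()
    with gzip.open(fileobj, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            _subject, predicate, _rest = line.split(None, 2)
            counts[predicate] += 1
    return counts


def predicate_counts_from_path(path):
    """Convenience wrapper for a dump saved on disk."""
    with open(path, "rb") as raw:
        return predicate_counts(raw)


if __name__ == "__main__":
    print(predicate_counts(io.BytesIO(SAMPLE)).most_common())
```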
>>
>>
>> = Data Services =
>>
>> The following SPARQL endpoints are available for queries over the
>> above data. See also limitations below.
>>
>> * http://openflydata.org/query/flybase (latest)
>>  http://openflydata.org/query/flybase_genenames_20081017 (snapshot)
>>
>> * http://openflydata.org/query/bdgp (latest)
>>  http://openflydata.org/query/bdgp_images_20081030 (snapshot)
>>
>> * http://openflydata.org/query/flyted (latest)
>>  http://openflydata.org/query/flyted_20080626 (snapshot)
>>
>> Limitations:
>>
>> * only GET requests are supported
>> * only SELECT and ASK queries are supported
>> * only JSON results format is supported (request must specify
>>   output=json)
>> * SELECT queries are limited to max 500 results
>> * no more than 5 requests per second from any one origin
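A client can stay within these limits by sending queries as GET requests with output=json and refusing unsupported query forms before sending. A minimal sketch, assuming the standard SPARQL protocol `query` parameter; the query-form check is a rough heuristic, not a SPARQL parser, and nothing is actually sent.

```python
# Sketch: build a GET URL that respects the endpoint limits.
# ASSUMPTION: query_form() is a heuristic keyword scan, not a real parser.
import re
from urllib.parse import urlencode

ENDPOINT = "http://openflydata.org/query/flybase"


def query_form(query):
    """Return SELECT, ASK, CONSTRUCT or DESCRIBE for query, or None."""
    # Remove IRIs first (they may contain '#' or keywords), then comments.
    text = re.sub(r"<[^>]*>", " ", query)
    text = re.sub(r"#[^\n]*", " ", text)
    match = re.search(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", text, re.IGNORECASE)
    return match.group(1).upper() if match else None


def build_query_url(endpoint, query):
    """Return a GET URL for query, refusing forms the endpoints don't support."""
    form = query_form(query)
    if form not in ("SELECT", "ASK"):
        raise ValueError(f"only SELECT and ASK are supported, got {form}")
    return endpoint + "?" + urlencode({"query": query, "output": "json"})
```

Because SELECT results are capped at 500, a client needing more rows would presumably have to page with ORDER BY plus LIMIT/OFFSET; whether that is practical on these endpoints is not stated here.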
>>
>> The endpoints are implemented using our own Java SPARQL protocol
>> implementation (SPARQLite, see below) backed by Jena TDB 0.6
>> stores. The endpoints run inside Tomcat 5.5 behind Apache 2.2 via
>> mod_jk, on a small EC2 instance, with TDB storing data on an attached
>> EBS volume.
>>
>>
>> = Software Downloads & Source Code =
>>
>> * FlyUI
>>  http://flyui.googlecode.com
>>
>> This is a library of composable JavaScript widgets, providing a
>> user interface to the above data. These widgets are used to build the
>> search application above. FlyUI is built on the Yahoo! User
>> Interface Library (YUI).
>>
>> * SPARQLite
>>  http://sparqlite.googlecode.com
>>
>> This is an experimental and incomplete implementation of the SPARQL
>> protocol, designed to work with Jena TDB or SDB stores. We're using
>> this as a platform to explore a number of quality of service issues
>> that SPARQL raises.
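One such quality-of-service issue is the "no more than 5 requests per second from any one origin" limit. SPARQLite itself is Java; the sketch below is only a Python illustration of a per-origin sliding-window limiter, assuming "origin" means something like a client IP address. It never forgets idle origins, which a long-running server would need to address.

```python
import threading
import time
from collections import defaultdict, deque


class PerOriginRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window` seconds
    per origin (e.g. a client IP address)."""

    def __init__(self, limit=5, window=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the logic can be tested
        self._hits = defaultdict(deque)
        self._lock = threading.Lock()  # request handlers may run concurrently

    def allow(self, origin):
        """Record and permit the request if the origin is under its limit."""
        with self._lock:
            now = self.clock()
            hits = self._hits[origin]
            # Forget requests that have slid out of the window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) < self.limit:
                hits.append(now)
                return True
            return False
```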
>>
>>
>> = Ontologies/Schemas =
>>
>> The following OWL schemas are used in the above data:
>>
>> * CHADO OWL Schema
>>  http://purl.org/net/chado/schema/
>>
>> This is an OWL representation of a subset of the CHADO relational
>> schema used by FlyBase (see http://gmod.org/wiki/Schema).
>>
>> * FlyBase OWL Synonym Types
>>  http://purl.org/net/flybase/synonym-types/
>>
>> This is a micro-ontology, representing the FlyBase synonym type
>> vocabulary.
>>
>> * BDGP OWL Schema
>>  http://purl.org/net/bdgp/schema/
>>
>> This is an OWL representation of a subset of the BDGP relational
>> schema.
>>
>> * FlyTED OWL Schemas
>>
>> These are under revision, to be published shortly.
>>
>>
>> = RDF Data Conversion Utilities =
>>
>> The following utilities were developed to obtain the RDF data
>> described above:
>>
>> * CHADO/FlyBase D2RQ Map
>>  http://code.google.com/p/openflydata/source/browse/trunk/flybase/genenames/d2r-flybase-genenames.ttl 
>>
>> This provides a mapping from the CHADO/FlyBase relational schema to
>> the CHADO/FlyBase OWL ontologies, for basic D. melanogaster gene
>> (feature) data (identifiers, symbols, synonyms, species).
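A D2RQ map states declaratively how relational rows become RDF. The sketch below shows the same idea imperatively for feature rows, emitting N-Triples. The EX namespace and property names are hypothetical placeholders, not the terms of the CHADO OWL schema, and the rows are assumed to be pre-joined with their synonyms.

```python
# Sketch: turn relational feature rows into N-Triples, as a D2RQ map would.
# ASSUMPTION: EX and its property names are hypothetical placeholders.
EX = "http://example.org/hypothetical/"


def iri(value):
    return f"<{value}>"


def literal(text):
    """Quote text as an N-Triples string literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def feature_rows_to_ntriples(rows):
    """rows: dicts with feature_id, uniquename, name and a list of synonyms."""
    lines = []
    for row in rows:
        subject = iri(f"{EX}feature/{row['feature_id']}")
        lines.append(f"{subject} {iri(EX + 'uniquename')} {literal(row['uniquename'])} .")
        lines.append(f"{subject} {iri(EX + 'name')} {literal(row['name'])} .")
        for synonym in row["synonyms"]:
            lines.append(f"{subject} {iri(EX + 'synonym')} {literal(synonym)} .")
    return lines
```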
>>
>> * BDGP D2RQ Map
>>  http://code.google.com/p/openflydata/source/browse/trunk/bdgp/imagemapping/d2r-bdgp-insituimages.ttl 
>>
>> This maps the BDGP relational schema to OWL/RDF.
>>
>> See also: http://openflydata.googlecode.com
>>
>>
>> = Future Developments =
>>
>> We're currently working on improving the user interface to the BDGP
>> data (grouping and ordering images by developmental stage) and on
>> integrating expression level data from FlyAtlas.
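Grouping and ordering images by stage needs a numeric sort key, since a plain string sort would put "13-16" before "4-6". A minimal sketch, assuming stage labels shaped like BDGP-style ranges ("1-3", "4-6", "13-16") and hypothetical image ids:

```python
# Sketch: group image records by developmental stage, ordered numerically.
# ASSUMPTION: label format and image ids are hypothetical.
import re
from collections import defaultdict


def stage_key(label):
    """Sort by the first number in the label; unparseable labels go last."""
    match = re.search(r"\d+", label)
    return (int(match.group()) if match else float("inf"), label)


def group_by_stage(images):
    """images: iterable of (image_id, stage_label) -> [(stage, [ids])]."""
    groups = defaultdict(list)
    for image_id, stage in images:
        groups[stage].append(image_id)
    return [(stage, sorted(groups[stage])) for stage in sorted(groups, key=stage_key)]
```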
>>
>> Other suggestions for future developments are warmly welcomed.
>>
>>
>> = Acknowledgments =
>>
>> Thanks especially to Helen White-Cooper and Andy Seaborne for all
>> their help.
>>
>> The FlyWeb Project is funded by the UK Joint Information Systems
>> Committee (JISC).
>>
>>
>> = Further Information =
>>
>> The FlyWeb project website is at:
>>
>> http://imageweb.zoo.ox.ac.uk/wiki/index.php/FlyWeb_project
>>
>> Graham will be presenting this work at the UK SWIG meeting next week.
>>
>> Or send us an email :)
>>
>> Kind regards,
>>
>> Alistair Miles
>> Jun Zhao
>> Graham Klyne
>> David Shotton
>>
>>
>> -- 
>> Alistair Miles
>> Senior Computing Officer
>> Image Bioinformatics Research Group
>> Department of Zoology
>> The Tinbergen Building
>> University of Oxford
>> South Parks Road
>> Oxford
>> OX1 3PS
>> United Kingdom
>> Web: http://purl.org/net/aliman
>> Email: alistair.miles@zoo.ox.ac.uk
>> Tel: +44 (0)1865 281993
>>
>
Received on Thursday, 6 November 2008 10:28:06 UTC