Announcement: Bio2RDF 0.3 released from Peter Ansell on 2009-03-20 (public-semweb-lifesci@w3.org from March 2009)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 20 Mar 2009 16:56:01 +1000
To: Bio2Rdf Mailing List <bio2rdf@googlegroups.com>, w3c semweb hcls <public-semweb-lifesci@w3.org>, "public-lod@w3.org" <public-lod@w3.org>
Cc: Paul Roe <p.roe@qut.edu.au>, James Hogan <j.hogan@qut.edu.au>, Lawrence Buckingham <l.buckingham@qut.edu.au>
Message-ID: <a1be7e0e0903192356n39fe46ffu825862800fc1fef8@mail.gmail.com>

Hi all,

Today we released version 0.3 of the Bio2RDF server code, which
differs significantly from version 0.2. You can download it from
SourceForge at [1].

Some of the new features are as follows:

Implement support for more non-Bio2RDF SPARQL endpoints such as
LinkedCT, DrugBank, Dailymed, Diseasome, Neurocommons, DBpedia, and
Flyted/Flybase. The relevant namespaces for these inside of Bio2RDF
are:

* DBpedia - dbpedia, dbpedia_property, dbpedia_class
* LinkedCT - linkedct_ontology, linkedct_intervention,
linkedct_trials, linkedct_collabagency, linkedct_condition,
linkedct_link, linkedct_location, linkedct_overall_official,
linkedct_oversight, linkedct_primary_outcomes, linkedct_reference,
linkedct_results_reference, linkedct_secondary_outcomes,
linkedct_arm_group
* Dailymed - dailymed_ontology, dailymed_drugs,
dailymed_inactiveingredient, dailymed_routeofadministration,
dailymed_organization
* DrugBank - drugbank_ontology, drugbank_druginteractions,
drugbank_drugs, drugbank_enzymes, drugbank_drugtype,
drugbank_drugcategory, drugbank_dosageforms, drugbank_targets
* Diseasome - diseasome_ontology, diseasome_diseases, diseasome_genes,
diseasome_chromosomallocation, diseasome_diseaseclass
* Neurocommons - Uses the equivalent Bio2RDF namespaces, with live
owl:sameAs links back to the relevant Neurocommons namespaces. Used
for pubmed, geneid, taxonomy, mesh, prosite and go so far
* Flyted/Flybase etc not converted yet, only direct access provided

Provide live owl:sameAs references which match the URIs used in SPARQL
queries, keeping linkages to the original databases without leaving
the database:identifier paradigm. If people already know the DBpedia
(or other) URIs, the link to their current knowledge is given.

* Some http://database.bio2rdf.org/database:identifier URIs are given
by this, but these aren't standard, and are only shown where there is
still at least one SPARQL endpoint available which uses them. People
should use the http://bio2rdf.org/database:identifier versions
when linking to Bio2RDF.
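
As a minimal sketch of the recommendation above (in Python, not part
of the Bio2RDF distribution), the function below rewrites a
subdomain-style URI to the standard form. The function name and the
example hostname geneid.bio2rdf.org are illustrative assumptions, and
any query string or fragment is dropped.

```python
from urllib.parse import urlsplit


def to_standard_uri(uri):
    """Rewrite http://<db>.bio2rdf.org/<db>:<id> to http://bio2rdf.org/<db>:<id>.

    Hypothetical helper: URIs on other hosts are returned unchanged.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if host == "bio2rdf.org" or host.endswith(".bio2rdf.org"):
        # Keep only the /database:identifier path on the standard host.
        return f"http://bio2rdf.org{parts.path}"
    return uri


print(to_standard_uri("http://geneid.bio2rdf.org/geneid:15275"))
```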

Integrated Semantic Web Pipes version 0.7 (pipes.deri.org) so the
pipes runtime engine can be used on the same server as Bio2RDF.
The main servers have a limited number of pipes available so far, but
more can be included by people wishing to contribute their pipes. The
URL syntax is /pipes/PIPEID/parameter1=value1/parameter2=value2 . This
gives people a way to build complex mashup scenarios and provide them
back to the community, as by default the Bio2RDF engine only knows how
to do simple integration of RDF sources into a single output document.

The two currently available pipes are:
* /pipes/bio2rdf_basic/database=DATABASE/identifier=IDENTIFIER Mirrors
/database:identifier functionality
* /pipes/bio2rdf_subject_object_slicing/database=DATABASE/identifier=IDENTIFIER
Combines /database:identifier and /links/database:identifier
functionality into one operation
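
To make the pipes URL syntax concrete, here is a small Python sketch
that assembles such a URL. The helper name, the bio2rdf.org base and
the percent-encoding of values are assumptions for illustration, and
geneid/15275 is only a placeholder record.

```python
from urllib.parse import quote


def pipe_url(pipe_id, params, base="http://bio2rdf.org"):
    """Build /pipes/PIPEID/parameter1=value1/parameter2=value2.

    Hypothetical helper: parameters are emitted in dict insertion order.
    """
    segments = [
        f"{quote(str(name), safe='')}={quote(str(value), safe='')}"
        for name, value in params.items()
    ]
    return "/".join([base.rstrip("/"), "pipes", quote(pipe_id, safe="")] + segments)


print(pipe_url("bio2rdf_basic", {"database": "geneid", "identifier": "15275"}))
```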

Namespace synonyms can be implemented. The first example is taxon and
taxonomy for NCBI taxonomy, as so far there hasn't been a clear
preference for one or the other. Together with interlinked
owl:sameAs statements, the synonyms provide resolution to a
standard URI no matter which namespace is used in the request URI.

* http://bio2rdf.org/taxon:identifier currently returns information in
the form http://bio2rdf.org/taxonomy:identifier, with an owl:sameAs
link back to the taxon version. This can be switched if people
generally prefer the taxon version as the default. This remains an
issue in general, though, as it is difficult to write SPARQL queries
outside of the Bio2RDF server for these heterogeneous sources.
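
The synonym behaviour can be sketched roughly as follows in Python.
This is not the server's implementation: the table, the function name
and the triple representation are assumptions, and only the
taxon/taxonomy pair comes from this announcement.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical synonym table: maps a synonym to the preferred namespace.
NAMESPACE_SYNONYMS = {"taxon": "taxonomy"}


def resolve(curie, base="http://bio2rdf.org/"):
    """Return (standard_uri, extra_triples) for a database:identifier string."""
    namespace, sep, identifier = curie.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError("expected database:identifier")
    preferred = NAMESPACE_SYNONYMS.get(namespace, namespace)
    standard = f"{base}{preferred}:{identifier}"
    requested = f"{base}{namespace}:{identifier}"
    triples = []
    if requested != standard:
        # Link the standard URI back to the synonym that was requested.
        triples.append((standard, OWL_SAME_AS, requested))
    return standard, triples


print(resolve("taxon:9606"))
```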

Provide live statistics to diagnose some network issues without having
to look at log files. The URL is /admin/stats

* Shows the last time the internal blacklist was reset, which
indicates the period of activity covered, as the statistics are reset
every time the blacklist is reset.
* By default shows the IP addresses accessing the server, with an
indication of the number and duration of their queries. Can be
configured in low use and private situations to also show the queries
being performed.
* Shows the servers which have been unresponsive since the last
blacklist reset including a basic reason, such as an HTTP 503 or 400
error

Implement true RDF handling in the background to provide consistency
of output and the potential to support multiple output formats such as
NTriples and Turtle, although the only output currently supported is
RDF/XML. The Sesame library is being used to provide this
functionality.

Provide more RDFiser scripts as part of the source distribution,
including Chebi, GO, Homologene, NCBI Geneid, HGNC, OBO and Ecocyc

Provide more links to HTML provider URLs for given databases, to
connect the Bio2RDF RDF interface with currently available HTML
interfaces. The URL syntax for this is
/html/database:identifier

Provide links to license providers, so the applicable license for a
database can be found by following a URL. The URL syntax for this
is /license/database:identifier . It was easier to require the
identifier to be present than to not have it. So far, the identifier
portion is not being used, so it merely has to be present for the URL
resolution to occur. In future this allows different licenses to be
returned based on the identifier, which is useful for databases which
are not completely released under a single license.
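
For illustration, the /html and /license URL patterns could be built
with a small Python helper like the one below. The helper and its base
URL default are assumptions. The check mirrors the rule that the
identifier must be present even though /license does not yet use it.

```python
def service_url(service, database, identifier, base="http://bio2rdf.org"):
    """Build /html/database:identifier or /license/database:identifier."""
    if service not in {"html", "license"}:
        raise ValueError("service must be 'html' or 'license'")
    if not database or not identifier:
        # The identifier must be present for URL resolution to occur.
        raise ValueError("both database and identifier are required")
    return f"{base.rstrip('/')}/{service}/{database}:{identifier}"


print(service_url("license", "geneid", "15275"))
```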

Provide countlinks and countlinksns, which count the number of reverse
links to a particular item, either globally or from within a given
database. Currently these only function on Virtuoso endpoints due to
their use of aggregation extensions to SPARQL. The URL syntax is
/countlinks/database:identifier and
/countlinksns/targetdatabase/database:identifier
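
A reverse-link count of this kind might look like the query generated
below. This is a sketch only: it uses standard SPARQL 1.1 aggregate
syntax, which postdates this release, rather than the Virtuoso
extension actually used by the server, and the function name and the
namespace filter are assumptions.

```python
def count_links_query(target_uri, source_namespace=None, base="http://bio2rdf.org/"):
    """Return a SPARQL query counting triples whose object is target_uri."""
    if any(ch in target_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("target_uri contains characters not allowed in an IRI")
    patterns = [f"?s ?p <{target_uri}> ."]
    if source_namespace is not None:
        if not source_namespace.replace("_", "").isalnum():
            raise ValueError("unexpected characters in namespace")
        # Restrict subjects to one database, as countlinksns does (assumed form).
        patterns.append(f'FILTER(STRSTARTS(STR(?s), "{base}{source_namespace}:"))')
    body = "\n  ".join(patterns)
    return f"SELECT (COUNT(*) AS ?links)\nWHERE {{\n  {body}\n}}"


print(count_links_query("http://bio2rdf.org/geneid:15275", source_namespace="go"))
```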

Provide search and searchns, which perform text searches either
globally using SPARQL or within a particular database. The global
search isn't currently linked to the rdfiser search pages, which may
be accessed using searchns. The searches are all performed using the
Virtuoso full-text search paradigm, i.e. bif:contains. Other SPARQL
endpoints haven't been implemented yet, even with regex, because it is
reasonably slow, but it would be simple to construct a query if people
thought it was necessary. The URL syntax is /search/searchTerm and
/searchns/targetdatabase:searchTerm
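
As a rough illustration of the bif:contains approach, the Python
sketch below generates a global full-text query for a Virtuoso
endpoint. The query shape, the function name and the default limit are
assumptions, not the queries shipped with Bio2RDF. Quote characters
are stripped from the term rather than escaped.

```python
def search_query(term, limit=50):
    """Return a Virtuoso full-text SPARQL query for a search term."""
    # Drop quote and backslash characters instead of trying to escape them.
    cleaned = term.replace('"', " ").replace("'", " ").replace("\\", " ").strip()
    if not cleaned:
        raise ValueError("search term is empty")
    return (
        "SELECT DISTINCT ?s ?o\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  ?o bif:contains \"'{cleaned}'\" .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


print(search_query("hexokinase", limit=10))
```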

If anyone has any SPARQL queries on biology-related databases that
they regularly execute, and that can either be parameterised or turned
into Pipes, then it would be great to include them in future
distributions for others to use.

Cheers,

Peter Ansell

[1] https://sourceforge.net/project/platformdownload.php?group_id=142631
Received on Friday, 20 March 2009 06:56:43 UTC