- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Fri, 20 Mar 2009 16:56:01 +1000
- To: Bio2Rdf Mailing List <bio2rdf@googlegroups.com>, w3c semweb hcls <public-semweb-lifesci@w3.org>, "public-lod@w3.org" <public-lod@w3.org>
- Cc: Paul Roe <p.roe@qut.edu.au>, James Hogan <j.hogan@qut.edu.au>, Lawrence Buckingham <l.buckingham@qut.edu.au>
Hi all,

Today we released version 0.3 of the Bio2RDF server code, which differs significantly from version 0.2. You can download it from SourceForge at [1].

Some of the new features are as follows:

Implement support for more non-Bio2RDF SPARQL endpoints, such as LinkedCT, DrugBank, Dailymed, Diseasome, Neurocommons, DBpedia, and Flyted/Flybase. The relevant namespaces for these inside Bio2RDF are:

* DBpedia - dbpedia, dbpedia_property, dbpedia_class
* LinkedCT - linkedct_ontology, linkedct_intervention, linkedct_trials, linkedct_collabagency, linkedct_condition, linkedct_link, linkedct_location, linkedct_overall_official, linkedct_oversight, linkedct_primary_outcomes, linkedct_reference, linkedct_results_reference, linkedct_secondary_outcomes, linkedct_arm_group
* Dailymed - dailymed_ontology, dailymed_drugs, dailymed_inactiveingredient, dailymed_routeofadministration, dailymed_organization
* DrugBank - drugbank_ontology, drugbank_druginteractions, drugbank_drugs, drugbank_enzymes, drugbank_drugtype, drugbank_drugcategory, drugbank_dosageforms, drugbank_targets
* Diseasome - diseasome_ontology, diseasome_diseases, diseasome_genes, diseasome_chromosomallocation, diseasome_diseaseclass
* Neurocommons - Uses the equivalent Bio2RDF namespaces, with live owl:sameAs links back to the relevant Neurocommons namespaces. Used for pubmed, geneid, taxonomy, mesh, prosite and go so far.
* Flyted/Flybase etc. - Not converted yet; only direct access is provided.

Provide live owl:sameAs references which match those used in SPARQL queries, to keep linkages to the original databases without leaving the database:identifier paradigm. If people already know the DBpedia, etc., URIs, the link to their current knowledge is given.

* Some http://database.bio2rdf.org/database:identifier URIs are given by this, but these aren't standard, and are only shown where there is still at least one SPARQL endpoint available which uses them.
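
As a rough illustration of the live owl:sameAs links described above, the following Python sketch emits one such statement in N-Triples form. The external base URI used for the dbpedia namespace (http://dbpedia.org/resource/), the helper name same_as_triple and the example identifier are assumptions made for illustration only; the server's actual mapping is not shown in this message.

```python
# Minimal sketch of an owl:sameAs link from a Bio2RDF URI to an original URI.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BIO2RDF_BASE = "http://bio2rdf.org/"

# Assumption: mapping from a Bio2RDF namespace to the original database's base URI.
EXTERNAL_BASES = {
    "dbpedia": "http://dbpedia.org/resource/",
}


def same_as_triple(namespace, identifier):
    """Return one N-Triples line linking the Bio2RDF URI to the external URI."""
    bio2rdf_uri = f"{BIO2RDF_BASE}{namespace}:{identifier}"
    external_uri = f"{EXTERNAL_BASES[namespace]}{identifier}"
    return f"<{bio2rdf_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


# "Aspirin" is a hypothetical example identifier.
print(same_as_triple("dbpedia", "Aspirin"))
```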

People should utilise the http://bio2rdf.org/database:identifier versions when linking to Bio2RDF.

Integrated Semantic Web Pipes (pipes.deri.org) (version 0.7) so the pipes runtime engine can be utilised on the same server as Bio2RDF. The main servers have a limited number of pipes available so far, but more can be included by people wishing to contribute their pipes. The URL syntax is /pipes/PIPEID/parameter1=value1/parameter2=value2. This provides a method for people wanting to utilise complex mashup scenarios and provide them back to the community, as by default the Bio2RDF engine only knows how to do simple integration of RDF sources into a single output document. The two currently available pipes are:

* /pipes/bio2rdf_basic/database=DATABASE/identifier=IDENTIFIER
  Mirrors /database:identifier functionality.
* /pipes/bio2rdf_subject_object_slicing/database=DATABASE/identifier=IDENTIFIER
  Combines /database:identifier and /links/database:identifier functionality into one operation.

Namespace synonyms can be implemented. The first example is taxon and taxonomy for NCBI taxonomy, as so far there hasn't been a clear bias towards one or the other. Together with interlinked owl:sameAs statements, the synonyms will provide resolution to a standard URI no matter which one is provided in the URI.

* http://bio2rdf.org/taxon:identifier currently returns information in the form http://bio2rdf.org/taxonomy:identifier, with an owl:sameAs link back to the taxon version. This can be switched if people in general prefer the taxon version as the default, although this is still an issue, as it is difficult to make up SPARQL queries outside of the Bio2RDF server for these heterogeneous sources.

Provide live statistics to diagnose some network issues without having to look at log files.
The URL is /admin/stats

* Shows the last time the internal blacklist was reset, indicating how much activity is being displayed, as the statistics are reset every time the blacklist is reset.
* By default shows the IPs accessing the server, with an indication of the number and duration of their queries. Can be configured in low-use and private situations to also show the queries being performed.
* Shows the servers which have been unresponsive since the last blacklist reset, including a basic reason, such as an HTTP 503 or 400 error.

Implement true RDF handling in the background to provide consistency of output and the potential to support multiple output formats such as N-Triples and Turtle, although the only output currently supported is RDF/XML. The Sesame library is being used to provide this functionality.

Provide more RDFiser scripts as part of the source distribution, including Chebi, GO, Homologene, NCBI Geneid, HGNC, OBO and Ecocyc.

Provide more links to HTML provider URLs for given databases, to provide the link between the Bio2RDF RDF interface and currently available HTML interfaces. The URL syntax for this is /html/database:identifier

Provide links to license providers, so the applicable license for a database can be found by following a URL. The URL syntax for this is /license/database:identifier. It was easier to require the identifier to be present than to not have it. So far, the identifier portion is not being used, so it merely has to be present for the URL resolution to occur, but in future there is the allowance for different licenses to be given based on the identifier, which is useful for databases which are not completely released under a single license.

Provide countlinks and countlinksns, which count the number of reverse links to a particular item, either globally or from within a given database. Currently these only function on Virtuoso endpoints due to their use of aggregation extensions to SPARQL.
The URL syntax is /countlinks/database:identifier and /countlinksns/targetdatabase/database:identifier

Provide search and searchns, which attempt to search globally using SPARQL (they aren't currently linked to the rdfiser search pages which may be accessed using searchns), or search within a particular database for text searches. The searches are all performed using the Virtuoso full-text search paradigm, i.e. bif:contains. Other SPARQL endpoints haven't yet been implemented, even with regex, because it is reasonably slow, but it would be simple to construct a query if people thought it was necessary. The URL syntax is /search/searchTerm and /searchns/targetdatabase:searchTerm

If anyone has any SPARQL queries on biology-related databases that they regularly execute, and that can either be parameterised or turned into Pipes, then it would be great to include them in future distributions for others to use.

Cheers,

Peter Ansell

[1] https://sourceforge.net/project/platformdownload.php?group_id=142631
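
For anyone scripting against a server, the URL patterns listed in this message could be assembled along the following lines. This is only a sketch in Python: the base URL, the function names, the example identifiers and the percent-encoding of search terms are assumptions for illustration, not part of the 0.3 distribution.

```python
from urllib.parse import quote

# Assumption: the relative paths in the message hang off this base URL.
BASE = "http://bio2rdf.org"


def resource_url(database, identifier, prefix=None):
    """Build /database:identifier, optionally under a prefix such as
    'html', 'license', 'links' or 'countlinks'."""
    path = f"{database}:{identifier}"
    return f"{BASE}/{prefix}/{path}" if prefix else f"{BASE}/{path}"


def countlinksns_url(target_database, database, identifier):
    """Build /countlinksns/targetdatabase/database:identifier."""
    return f"{BASE}/countlinksns/{target_database}/{database}:{identifier}"


def pipe_url(pipe_id, **params):
    """Build /pipes/PIPEID/parameter1=value1/parameter2=value2,
    keeping the parameters in the order they were passed."""
    segments = "/".join(f"{name}={value}" for name, value in params.items())
    return f"{BASE}/pipes/{pipe_id}/{segments}"


def search_url(term, target_database=None):
    """Build /search/searchTerm, or /searchns/targetdatabase:searchTerm
    when a target database is given. Percent-encoding is an assumption."""
    encoded = quote(term, safe="")
    if target_database:
        return f"{BASE}/searchns/{target_database}:{encoded}"
    return f"{BASE}/search/{encoded}"


# The identifiers and search terms below are hypothetical examples.
print(resource_url("geneid", "15275"))
print(resource_url("geneid", "15275", prefix="license"))
print(countlinksns_url("go", "geneid", "15275"))
print(pipe_url("bio2rdf_basic", database="geneid", identifier="15275"))
print(search_url("heat shock", target_database="geneid"))
```

Using a single prefix argument for /html/, /license/, /links/ and /countlinks/ reflects that these all share the database:identifier tail in the syntax given above.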
Received on Friday, 20 March 2009 06:56:43 UTC