Re: Announcement: Bio2RDF 0.3 released from Michel Dumontier on 2009-03-20 (public-semweb-lifesci@w3.org from March 2009)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Fri, 20 Mar 2009 09:55:20 -0400
To: bio2rdf@googlegroups.com
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>, "public-lod@w3.org" <public-lod@w3.org>, Paul Roe <p.roe@qut.edu.au>, James Hogan <j.hogan@qut.edu.au>, Lawrence Buckingham <l.buckingham@qut.edu.au>
Message-ID: <c8edab680903200655v7acc369co4763d44cf062009a@mail.gmail.com>
Hi Peter - Great work!  I have a question - why are there so many namespaces
for these resources:

>
> * DBpedia - dbpedia, dbpedia_property, dbpedia_class
> * LinkedCT - linkedct_ontology, linkedct_intervention,
> linkedct_trials, linkedct_collabagency, linkedct_condition,
> linkedct_link, linkedct_location, linkedct_overall_official,
> linkedct_oversight, linkedct_primary_outcomes, linkedct_reference,
> linkedct_results_reference, linkedct_secondary_outcomes,
> linkedct_arm_group
> * Dailymed - dailymed_ontology, dailymed_drugs,
> dailymed_inactiveingredient, dailymed_routeofadministration,
> dailymed_organization
> * DrugBank - drugbank_ontology, drugbank_druginteractions,
> drugbank_drugs, drugbank_enzymes, drugbank_drugtype,
> drugbank_drugcategory, drugbank_dosageforms, drugbank_targets
> * Diseasome - diseasome_ontology, diseasome_diseases, diseasome_genes,
> diseasome_chromosomallocation, diseasome_diseaseclass
> * Neurocommons - Uses the equivalent Bio2RDF namespaces, with live
> owl:sameAs links back to the relevant Neurocommons namespaces. So far
> used for pubmed, geneid, taxonomy, mesh, prosite and go
> * Flyted/FlyBase, etc. - Not converted yet; only direct access is provided
>




>
> Provide live owl:sameAs references that match the URIs used in SPARQL
> queries, keeping links to the original databases without leaving the
> database:identifier paradigm, so people who already know the DBpedia
> (etc.) URIs are linked to their current knowledge.
>
> * Some http://database.bio2rdf.org/database:identifier URIs are
> produced this way, but these aren't standard, and are only shown where
> at least one SPARQL endpoint still uses them. People should utilise
> the http://bio2rdf.org/database:identifier versions when linking to
> Bio2RDF.
>
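A minimal Python sketch of the URI preference described above, assuming that a non-standard URI always has the form http://<subdomain>.bio2rdf.org/<namespace>:<identifier>. The function name to_canonical_uri and the example subdomain are hypothetical and not part of the Bio2RDF distribution.

```python
from urllib.parse import urlsplit

# Assumed preferred base, following the advice to link to
# http://bio2rdf.org/database:identifier.
CANONICAL_BASE = "http://bio2rdf.org/"


def to_canonical_uri(uri: str) -> str:
    """Rewrite http://<sub>.bio2rdf.org/<ns>:<id> to http://bio2rdf.org/<ns>:<id>.

    Hypothetical helper: only URIs on a bio2rdf.org subdomain whose path is a
    single "namespace:identifier" segment are rewritten (any query string or
    fragment is dropped in that case); every other URI is returned unchanged.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")
    if host.endswith(".bio2rdf.org") and ":" in path and "/" not in path:
        return CANONICAL_BASE + path
    return uri


print(to_canonical_uri("http://pubmed.bio2rdf.org/pubmed:12345"))
```

URIs that are already in the preferred form, or that belong to other datasets such as DBpedia, pass through untouched.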
> Integrated Semantic Web Pipes (pipes.deri.org, version 0.7), so the
> pipes runtime engine can be utilised on the same server as Bio2RDF.
> The main servers have only a limited number of pipes available so far,
> but more can be included by people wishing to contribute their pipes.
> The URL syntax is /pipes/PIPEID/parameter1=value1/parameter2=value2 .
> This gives people a way to build complex mashup scenarios and provide
> them back to the community, since by default the Bio2RDF engine only
> knows how to do simple integration of RDF sources into a single output
> document.
>
> The two currently available pipes are:
> * /pipes/bio2rdf_basic/database=DATABASE/identifier=IDENTIFIER
>   Mirrors /database:identifier functionality
> * /pipes/bio2rdf_subject_object_slicing/database=DATABASE/identifier=IDENTIFIER
>   Combines /database:identifier and /links/database:identifier
>   functionality into one operation
>

I didn't know about DERI pipes - looks fantastic! Thanks!
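As a sketch of the /pipes/PIPEID/parameter1=value1/parameter2=value2 syntax quoted above, this Python helper assembles such a URL. The helper name pipe_url, the base URL and the choice to percent-encode parameter values are assumptions; nothing here implies that a given server hosts a particular pipe, and the identifier is an arbitrary example.

```python
from urllib.parse import quote


def pipe_url(base: str, pipe_id: str, **params: str) -> str:
    """Build <base>/pipes/<pipe_id>/name1=value1/name2=value2.

    Hypothetical helper. Parameters keep the order they were passed in
    (keyword-argument order is preserved in Python 3.7+). Values are
    percent-encoded so a '/' inside a value is not read as a path
    separator; ':' is left as-is so database:identifier stays readable.
    """
    segments = [f"{name}={quote(str(value), safe=':')}" for name, value in params.items()]
    return "/".join([base.rstrip("/"), "pipes", pipe_id, *segments])


print(pipe_url("http://bio2rdf.org", "bio2rdf_basic",
               database="geneid", identifier="15275"))
```

The same helper covers bio2rdf_subject_object_slicing or any contributed pipe, since only the pipe ID and parameter names change.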


>
> Namespace synonyms can be implemented. The first example is taxon and
> taxonomy for NCBI Taxonomy, as so far there hasn't been a clear bias
> towards one or the other. Together with interlinked owl:sameAs
> statements, the synonyms will resolve to a standard URI no matter
> which one is used in the URI.
>
> * http://bio2rdf.org/taxon:identifier will return information in the
> form http://bio2rdf.org/taxonomy:identifier currently, with an
> owl:sameAs link back to the taxon version. This can be switched if
> people generally prefer the taxon version as the default, although
> this remains an issue, as it is difficult to construct SPARQL queries
> for these heterogeneous sources outside the Bio2RDF server.
>

ok, which other sources are providing NCBI taxonomy info? and what namespace
prefix do they use?
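A minimal Python sketch of the taxon/taxonomy synonym behaviour quoted above: the requested prefix is mapped to the preferred one, and an owl:sameAs triple points back at the URI that was asked for. The NAMESPACE_SYNONYMS table and the resolve function are hypothetical and not taken from the Bio2RDF code base.

```python
from typing import List, Tuple

BASE = "http://bio2rdf.org/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical synonym table: alias prefix -> preferred prefix.
NAMESPACE_SYNONYMS = {"taxon": "taxonomy"}

Triple = Tuple[str, str, str]


def resolve(qname: str) -> Tuple[str, List[Triple]]:
    """Map 'prefix:identifier' to the preferred Bio2RDF URI.

    Returns the preferred URI plus (subject, predicate, object) owl:sameAs
    triples linking it back to the URI actually requested; the list is
    empty when no synonym was involved.
    """
    prefix, sep, identifier = qname.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"expected 'prefix:identifier', got {qname!r}")
    preferred = NAMESPACE_SYNONYMS.get(prefix, prefix)
    canonical = f"{BASE}{preferred}:{identifier}"
    requested = f"{BASE}{prefix}:{identifier}"
    links = [] if canonical == requested else [(canonical, OWL_SAME_AS, requested)]
    return canonical, links


uri, links = resolve("taxon:9606")
print(uri)  # http://bio2rdf.org/taxonomy:9606
```

Switching the default to taxon, as suggested above, would amount to inverting the table entry.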


>
> Provide live statistics to diagnose some network issues without having
> to look at log files. The URL is /admin/stats
>
> * Shows when the internal blacklist was last reset, which indicates
> how much activity is being displayed, as the statistics are reset
> every time the blacklist is reset.
> * By default shows the IPs accessing the server, with an indication
> of the number and duration of their queries. In low-use and private
> deployments it can be configured to also show the queries being
> performed.
> * Shows the servers that have been unresponsive since the last
> blacklist reset, including a basic reason such as an HTTP 503 or 400
> error.
>
> Implement true RDF handling in the background to provide consistent
> output and the potential to support multiple output formats such as
> N-Triples and Turtle, although RDF/XML is the only output currently
> supported. The Sesame library is being used to provide this
> functionality.
>
> Provide more RDFiser scripts as part of the source distribution,
> including ChEBI, GO, HomoloGene, NCBI Geneid, HGNC, OBO and EcoCyc.
>
> Provide more links to HTML provider URLs for given databases, linking
> the Bio2RDF RDF interface to the currently available HTML interfaces.
> The URL syntax for this is /html/database:identifier
>
> Provide links to license providers, so the applicable license for a
> database can be found by following a URL. The URL syntax for this is
> /license/database:identifier . It was easier to require the identifier
> to be present than to make it optional. So far the identifier portion
> is not used, so it merely has to be present for the URL to resolve,
> but in future different licenses could be given based on the
> identifier, which is useful for databases that are not released
> entirely under a single license.
>
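The /html/ and /license/ forms share one shape, so a single helper can build both. This Python sketch assumes http://bio2rdf.org as the server base and mirrors the rule above that the identifier must be present even though /license/ does not use it yet; the name resolver_url is hypothetical.

```python
# Assumed server base; substitute a mirror if needed.
BASE = "http://bio2rdf.org"

# Services that take the /service/database:identifier form.
SERVICES = {"html", "license"}


def resolver_url(service: str, database: str, identifier: str, base: str = BASE) -> str:
    """Build /html/database:identifier or /license/database:identifier.

    The identifier is required for both services; for /license/ it is
    currently ignored but must still be present for the URL to resolve.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown service {service!r}")
    if not database or not identifier:
        raise ValueError("both database and identifier must be non-empty")
    return f"{base.rstrip('/')}/{service}/{database}:{identifier}"


print(resolver_url("license", "geneid", "15275"))
```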
> Provide countlinks and countlinksns, which count the number of reverse
> links to a particular item, either globally or from within a given
> database. Currently these only work on Virtuoso endpoints because they
> rely on Virtuoso's aggregation extensions to SPARQL. The URL syntax is
> /countlinks/database:identifier and
> /countlinksns/targetdatabase/database:identifier
>
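To make the reverse-link counting concrete, here is a Python sketch that builds the kind of SPARQL a countlinks request might send. It uses the standard SPARQL 1.1 COUNT aggregate rather than the Virtuoso-specific extension mentioned above (SPARQL 1.1 postdates this message), and the STRSTARTS filter for the countlinksns case is an assumption about how "within a given database" could be expressed, not the query Bio2RDF actually issues.

```python
from typing import Optional

BASE = "http://bio2rdf.org/"


def count_links_query(database: str, identifier: str,
                      source_database: Optional[str] = None) -> str:
    """SPARQL 1.1 query counting triples whose object is the given Bio2RDF URI.

    source_database plays the role of 'targetdatabase' in
    /countlinksns/targetdatabase/database:identifier: when set, only subjects
    in that namespace are counted (hypothetical reading). Inputs are assumed
    to be plain prefix/identifier strings; no escaping is done.
    """
    target = f"<{BASE}{database}:{identifier}>"
    lines = [
        "SELECT (COUNT(*) AS ?count) WHERE {",
        f"  ?s ?p {target} .",
    ]
    if source_database is not None:
        lines.append(f'  FILTER(STRSTARTS(STR(?s), "{BASE}{source_database}:"))')
    lines.append("}")
    return "\n".join(lines)


print(count_links_query("geneid", "15275", source_database="pubmed"))
```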
> Provide search and searchns, which run text searches using SPARQL,
> either globally or within a particular database (these aren't
> currently linked to the RDFiser search pages, which may be accessed
> using searchns). All searches use the Virtuoso full-text search
> paradigm, i.e. bif:contains. Other SPARQL endpoints aren't supported
> yet, even with regex, because regex is reasonably slow, but it would
> be simple to construct such a query if people thought it necessary.
> The URL syntax is /search/searchTerm and
> /searchns/targetdatabase:searchTerm
>
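A Python sketch of a full-text query using Virtuoso's bif:contains, the mechanism the note says search and searchns rely on. The exact query Bio2RDF sends is not given here, so the query shape, the phrase quoting, the optional namespace filter standing in for searchns and the LIMIT are all assumptions.

```python
from typing import Optional

BASE = "http://bio2rdf.org/"


def search_query(term: str, database: Optional[str] = None, limit: int = 100) -> str:
    """Build a Virtuoso full-text SPARQL query for literals containing `term`.

    The term is wrapped in double quotes so Virtuoso treats it as a phrase.
    Quotes and backslashes are rejected rather than escaped to keep the
    sketch short. With `database` set, matches are restricted to subjects
    in that namespace (a hypothetical reading of searchns).
    """
    if not term or any(ch in term for ch in "\"'\\"):
        raise ValueError("term must be non-empty and free of quotes/backslashes")
    lines = [
        "SELECT ?s ?p ?o WHERE {",
        "  ?s ?p ?o .",
        f"  ?o bif:contains '\"{term}\"' .",
    ]
    if database is not None:
        lines.append(f'  FILTER(STRSTARTS(STR(?s), "{BASE}{database}:"))')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


print(search_query("hexokinase", database="geneid"))
```

On endpoints without bif:contains, the portable alternative would be a standard filter such as FILTER(regex(str(?o), "hexokinase", "i")), which is typically much slower because it cannot use a full-text index.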
> If anyone has any SPARQL queries on biology related databases that
> they regularly execute that can either be parameterised or turned into
> Pipes then it would be great to include them in future distributions
> for others to use.
>

absolutely!

-=Michel=-


>
> Cheers,
>
> Peter Ansell
>


-- 
Michel Dumontier
Assistant Professor of Bioinformatics
http://dumontierlab.com
Received on Friday, 20 March 2009 13:55:58 UTC