[LLD Announce Monday] Bio2RDF Release 2: Improved coverage, interoperability and provenance of Linked Data for the Life Sciences

From: M. Scott Marshall <mscottmarshall@gmail.com>
Date: Sat, 9 Feb 2013 13:30:08 +0100
Message-ID: <CACHzV2PfEjiAhxBRUS0kcqyHAVBzgvt2Pu-2ZoS8BFsfitMqcQ@mail.gmail.com>
To: HCLS <public-semweb-lifesci@w3.org>
[Information on joining the meeting is at the bottom of this message. -Scott]

Title: Bio2RDF Release 2: Improved coverage, interoperability and
provenance of Linked Data for the Life Sciences

Abstract: Bio2RDF is an open source project that uses Semantic Web
technologies to build and provide the largest network of Linked Data
for the Life Sciences. Bio2RDF defines a set of simple conventions to
create RDF(S) compatible Linked Data from a diverse set of
heterogeneously formatted sources obtained from multiple data
providers.  Here, we present the second release of the Bio2RDF project
which features up-to-date, open-source scripts, IRI normalization
through a common dataset registry, dataset provenance, data metrics,
public SPARQL endpoints, and compressed RDF files and full
text-indexed Virtuoso triple stores for download.  We have
consolidated and updated Bio2RDF scripts into a single GitHub
repository (http://github.com/bio2rdf/bio2rdf-scripts), which
facilitates collaborative development through issue tracking, forking
and pull requests. The scripts are released under an MIT license,
making them available for any use (including commercial), modification
or redistribution. Provenance regarding when and how the data were
generated is provided using the W3C Vocabulary of Interlinked Datasets
(VoID), the Provenance vocabulary (PROV) and Dublin Core vocabulary.
Additional scripts were developed to compute intra- and inter-dataset
composition and connectivity. Nineteen datasets, including 5 new
datasets and 3 aggregate datasets, are now being offered as part of
Bio2RDF Release 2. Use of a common registry ensures that all Bio2RDF
datasets adhere to strict syntactic IRI patterns, thereby increasing
the quality of generated links over previously suggested patterns.
Quantitative metrics are now computed for each dataset and range from
elementary information such as the number of triples to a more
sophisticated graph of the relations between types. While these
metrics provide an important overview of dataset contents, they are
also used to assist in SPARQL query formulation and to monitor changes
to datasets over time. Pre-computation of these summaries frees up
computational resources for more interesting scientific queries and
also enables tracking of dataset changes with time, which will help
make projections about the hardware and software requirements. We
demonstrate how multiple open source tools can be used to visualize
and explore Bio2RDF data, as well as how dataset metrics may be used
to assist querying. Bio2RDF Release 2 marks an important milestone for
this open source project, in that it was fully transferred to a new
team and development paradigm. Adoption of GitHub as a code
development platform makes it easier for new parties to contribute and
get feedback on RDF converters, and makes it possible for their
converters to be added automatically to the growing Bio2RDF network. Over the next
year we hope to offer bi-annual releases that adhere to formalized
development and release protocols.
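
As a rough illustration of the kind of dataset metrics described in
the abstract (a triple count plus a graph of the relations between
types), here is a minimal Python sketch. It is not taken from the
Bio2RDF scripts: the `ex:` identifiers, the toy triples and the
`dataset_metrics` function are all hypothetical, and a real dataset
would be read from RDF files or a SPARQL endpoint instead of an
in-memory list.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:drug1", RDF_TYPE, "ex:Drug"),
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene2", RDF_TYPE, "ex:Gene"),
    ("ex:drug1", "ex:targets", "ex:gene1"),
    ("ex:drug1", "ex:targets", "ex:gene2"),
]


def dataset_metrics(triples):
    """Return the triple count and the counts of type-to-type relations."""
    # First pass: record the declared types of every subject.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    # Second pass: for each non-type triple whose subject and object are
    # both typed, count the (subject type, predicate, object type) edge.
    type_relations = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for s_type in types.get(s, ()):
            for o_type in types.get(o, ()):
                type_relations[(s_type, p, o_type)] += 1

    return {"triples": len(triples), "type_relations": dict(type_relations)}


metrics = dataset_metrics(TRIPLES)
print(metrics["triples"])
for (s_type, pred, o_type), count in metrics["type_relations"].items():
    print(s_type, pred, o_type, count)
```

A summary like this, computed once per dataset, shows which types are
linked by which predicates, which is the sort of information that can
guide SPARQL query formulation without scanning the full data.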

Bio: Dr. Michel Dumontier is an Associate Professor of Bioinformatics
in the Department of Biology, the Institute of Biochemistry and School
of Computer Science at Carleton University in Ottawa, Canada. His
research aims to develop semantics-powered computational methods to
increase our understanding of how living systems respond to chemical
agents. At the core of the research program is the development and use
of Semantic Web technologies to formally represent and reason about
data and services so as (1) to facilitate the publishing, sharing and
discovery of scientific knowledge produced by individuals and small
collectives, (2) to enable the formulation and evaluation of scientific
hypotheses using our collective tools and knowledge and (3) to create
and make available computational methods to investigate the structure,
function and behaviour of living systems. Dr. Dumontier serves as a
co-chair for the World Wide Web Consortium Semantic Web in Health Care
and Life Sciences Interest Group (W3C HCLSIG) and is the Scientific
Director for the open-source Bio2RDF linked data for life sciences
project.

---------------------------------------------------------------------------

You have been invited to a Fuze Meeting, hosted by Michel Dumontier:

-----------------------------------------
Meeting Subject: Bio2RDF Release 2
Meeting Date:    02/11/2013
Meeting Time:    11:00 AM US/Eastern
-----------------------------------------


To join the meeting from your computer or mobile device, click or copy
and paste this URL into your browser:
https://www.fuzemeeting.com/fuze/9f356e6f/18544023

To join the audio portion of this meeting, choose your dial in method:
Dial-in Number: +16465837415
Skype: fuzeaudioplus
International Numbers: https://www.fuzemeeting.com/numbers1

When prompted enter the pin number:
Attendee Pin Number: 47479369


Having trouble joining this meeting?
Click or copy and paste this URL into your browser to visit the Fuze
Support page:
http://www.fuzemeeting.com/support
Received on Saturday, 9 February 2013 12:30:36 GMT
