Re: Deliverables from the RDB2RDF XG

From: Ivan Herman <ivan@w3.org>
Date: Mon, 26 Jan 2009 17:04:57 +0100
Message-ID: <497DDF29.6060506@w3.org>
To: ashok.malhotra@oracle.com
CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
My apologies, I should have been more precise: the meeting is a telco!
Ie, the precise answer to your question is: on zakim:-)


ashok malhotra wrote:
> Hi Ivan:
> You said ...
> - Would you or one of your colleagues be ready to come (again:-) to one
> of our next SW Coordination Group meeting to give a bit of a report and
> discuss the possible followups?
> Where is the meeting?
> All the best, Ashok
> Ivan Herman wrote:
>> Hi Ashok,
>> First of all, thanks! I actually have two questions, neither of which is
>> closely related to your original question (which Mauro already answered,
>> I believe:-)
>> - Would you or one of your colleagues be ready to come (again:-) to one
>> of our next SW Coordination Group meeting to give a bit of a report and
>> discuss the possible followups? The best date would be the 20th of
>> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>> - Did the group think of also preparing a rough draft charter for the
>> group you propose? It would make things easier to discuss both
>> internally and externally. There can be many empty slots in the charter
>> but it would give an idea to move forward. It would also give a feeling
>> on who would/could staff such a group.
>> Thanks again!
>> Cheers
>> Ivan
>> ashok malhotra wrote:
>>> Ivan, Mauro:
>>> As you know, the RDB2RDF XG is coming to a close.  We are planning two
>>> deliverables and I thought I would run them by you for early comments.
>>> 1. We have prepared a final report.  This is attached.  I am trying to
>>> get permission to put it on the W3C site.
>>> 2. We have prepared a State Of the Art Survey.  This is available as
>>> extensions to the ESW Wiki
>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file
>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>>> Both have the same content.  Is this format acceptable for an XG
>>> deliverable?
>>> ------------------------------------------------------------------------
>>> W3C <http://www.w3.org/>W3C Incubator Report
>>> <http://www.w3.org/2005/Incubator/XGR/>
>>>   W3C RDB2RDF Incubator Group Report
>>>     16 January 2009
>>> This version:
>>>     http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>>> Latest version:
>>>     http://www.w3.org/XG_Report/RDB2RDF_XG
>>> Previous version:
>>>     This is the first public version.
>>> Author:
>>>     Ashok Malhotra (editor), Oracle
>>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
>>> document is available under the W3C Document License
>>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>>> for additional information.
>>> ------------------------------------------------------------------------
>>>     Abstract
>>> This is the final report from the RDB2RDF XG. The XG recommends that the
>>> W3C initiate a WG to standardize a language for mapping Relational
>>> Database schemas into RDF and OWL.
>>>     Status of this Document
>>> /This section describes the status of this document at the time of its
>>> publication. Other documents may supersede this document. A list of
>>> current W3C publications can be found in the W3C technical reports index
>>> <http://www.w3.org/TR/> at http://www.w3.org/TR/./
>>> This is the final recommendation from the RDB2RDF XG.
>>>     Table of Contents
>>> 1 Recommendation <#recommendation>
>>>     1.1 Usecases <#usecases>
>>>         1.1.1 Integrating Databases to Research Nicotine Dependency
>>> <#biomedical>
>>>         1.1.2 Triplify: Exposing Relational Data on the Web <#Triplify>
>>>         1.1.3 Integration of Enterprise Information Systems
>>> <#enterprise>
>>>         1.1.4 Ordnance Survey Use Case <#ordnance>
>>>     1.2 Liaisons <#liaisons>
>>>     1.3 Starting Points <#IDA2UIP>
>>> 2 References <#References>
>>> ------------------------------------------------------------------------
>>>     1 Recommendation
>>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
>>> standardize a language for mapping Relational Database schemas into RDF
>>> and OWL. Such a standard will enable the vast amounts of data stored in
>>> Relational databases to be published easily and conveniently on the Web.
>>> It will also facilitate integrating data from separate Relational
>>> databases and adding semantics to Relational data.
>>> This recommendation is based on a survey of the State Of the Art
>>> conducted by the XG [StateOfArt] <#StateOfArt> as well as the usecases
>>> discussed below.
>>> The mapping language defined by the WG would facilitate the development
>>> of several types of products. It could be used to translate Relational
>>> data into RDF which could be stored in a triple store. This is sometimes
>>> called Extract-Transform-Load (ETL). Or it could be used to generate a
>>> virtual mapping that could be queried using SPARQL and the SPARQL
>>> translated to SQL queries on the underlying Relational data. Other
>>> products could be layered on top of these capabilities to query and
>>> deliver data in different ways as well as to integrate the data with
>>> other kinds of information on the Semantic Web.
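A minimal sketch of the ETL path described above, assuming a naive default mapping in which every row becomes a subject and every non-key column a predicate. The base URI, the `gene` table and the sample row are invented for illustration, and the function `rows_to_triples` is hypothetical; this is not the mapping language the report proposes.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from the report


def rows_to_triples(conn, table, key):
    """Yield (subject, predicate, object) tuples for each non-key, non-NULL cell."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # One subject per row, named after the table and the key value.
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)


# Invented sample data, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, organism TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'CHRNA4', 'Homo sapiens')")
triples = list(rows_to_triples(conn, "gene", "id"))
```

The resulting tuples could then be loaded into a triple store. A virtual mapping would keep the data in place and use the same correspondence in the other direction, to rewrite SPARQL into SQL.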
>>> The mapping language should be complete with respect to
>>> the relational algebra. It should have a human-readable syntax as well
>>> as XML and RDF representations of the syntax for purposes of discovery
>>> and machine generation.
>>> There is a strong suggestion that the mapping language be expressed in
>>> rules as defined by the W3C [RIF] <#RIF> WG. The syntax does not have to
>>> follow the RIF syntax but there should be a round-trippable mapping between
>>> the mapping language and a RIF dialect. The output of the mapping should be
>>> defined in terms of an RDFS/OWL schema.
>>> It should be possible to subset the language for simple applications
>>> such as Web 2.0. This feature of the language will be validated by
>>> creating a library of mappings for widely used apps such as Drupal,
>>> Wordpress, phpBB.
>>> The mapping language will allow customization with regard to names and
>>> data transformation. In addition, the language must be able to expose
>>> vendor specific SQL features such as full-text and spatial support and
>>> vendor-defined datatypes.
>>> The final language specification should include guidance with regard to
>>> mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>>> The language must allow for a mechanism to create identifiers for
>>> database entities. The generation of identifiers should be designed to
>>> support the implementation of the linked data principles [LinkedData]
>>> <#LinkedData>. Where possible, the language will encourage the reuse of
>>> public identifiers for long-lived entities such as persons,
>>> corporations, geo-locations, etc. See *1.2 Liaisons* <#liaisons>.
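One possible way to create identifiers for database entities, sketched under the assumption that a URI is built from a base, the table name and the primary-key values. The function `mint_uri`, the base URI and the table names are hypothetical; the report does not prescribe a scheme.

```python
from urllib.parse import quote


def mint_uri(base, table, key_values):
    """Build a URI for a row from its table name and primary-key values."""
    # Percent-encode every segment so that '/' or spaces inside a key
    # cannot change the path structure of the identifier.
    path = "/".join(quote(str(v), safe="") for v in key_values)
    return f"{base.rstrip('/')}/{quote(table, safe='')}/{path}"


simple = mint_uri("http://example.org/data/", "customer", [42])
composite = mint_uri("http://example.org/data", "order line", ["A/7", 3])
```

Because such URIs are plain HTTP URIs derived only from stable key values, they can be dereferenced and linked to, in keeping with the linked data principles cited above.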
>>> The proposed Working Group will also create a set of test cases that
>>> could be used to verify conformance.
>>>       1.1 Usecases
>>> To bootstrap exploitation of the Web as a globally accessible linked
>>> database, we need a few essentials:
>>>     * Web accessible data needs to increase in granularity and cross
>>>       linkage.
>>>     * Web applications and solutions must produce structured interlinked
>>>       data as extensions of existing functionality.
>>>     * Web users must be shielded from the underlying complexity of
>>>       injecting structured linked data into the Web.
>>>         1.1.1 Integrating Databases to Research Nicotine Dependency
>>> Complex biological queries generally require the integration of
>>> information from several sources. To understand the genetic basis of
>>> nicotine dependence, gene and pathway information needed to be
>>> integrated and three complex biological queries answered using the
>>> integrated knowledge base. The gene information source NCBI Entrez Gene,
>>> which has gene-related records of ~2 million genes, needed to be
>>> integrated with pathway information sources such as KEGG (Kyoto
>>> Encyclopedia of Genes and Genomes). Comparing results across model
>>> organisms required homology information provided by NCBI HomoloGene,
>>> which contains homology data for several completely sequenced eukaryotic
>>> organisms.
>>> An ontology-driven approach was used to integrate the two gene resources
>>> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG,
>>> Reactome and BioCyc). An OWL ontology called the Entrez Knowledge Model
>>> (EKoM) was created for the gene resources and integrated with the extant
>>> BioPAX ontology designed for pathway resources. The integrated schema
>>> was populated with data from the pathway resources, publicly available
>>> in BioPAX-compatible format, and gene resources for which a population
>>> procedure was created.
>>> SPARQL was used to formulate queries to investigate the genetic basis of
>>> nicotine dependence over the integrated knowledge base:
>>>     * Which genes participate in a large number of pathways?
>>>     * Which genes are "hub genes" from the perspective of gene interaction?
>>>     * Which genes are expressed in the brain, in the context of
>>>       neurobiology of nicotine dependence and various neurotransmitters
>>>       in the central nervous system?
>>> The approach worked well: the queries could easily identify hub
>>> genes, i.e., those genes whose gene products participate in many
>>> pathways or interact with many other gene products. See
>>> [NicotineDependence] <#> for details.
>>>         1.1.2 Triplify: Exposing Relational Data on the Web
>>> In order to make the Semantic Web useful to ordinary Web users, RDF and
>>> OWL have to be deployed on the Web on a much larger scale. Web
>>> applications such as Content Management Systems, online shops or
>>> community applications (e.g. Wikis, Blogs, Fora) already store their
>>> data in relational databases [Triplify] <#TriplifyPaper>. Providing a
>>> standardized way to map the relational data structures behind these Web
>>> applications into RDF, RDF-Schema and OWL will facilitate broad
>>> penetration and enrich the Web with RDF data and ontologies and
>>> facilitate novel semantic browsing and search applications.
>>> By supporting the long tail of Web applications and thus counteracting
>>> the centralization of the Web 2.0 applications the planned RDB2RDF
>>> standardization will help to give control over data back to end-users
>>> and thus promote a democratization of the Web.
>>> To support this usecase scenario, the mapping language should be easily
>>> implementable for lightweight Web applications and have a shallow
>>> learning curve to foster early adoption by Web developers.
>>>         1.1.3 Integration of Enterprise Information Systems
>>> Efficient information and data exchange between application systems
>>> within and across enterprises is of paramount importance in the
>>> increasingly networked and IT-dominated business atmosphere. Existing
>>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>>> Relational database backends for persistence. RDF and Linked Data can
>>> provide data exchange and integration interfaces for such application
>>> systems, which are easy to implement and use, especially in settings
>>> where a loose and flexible coupling of the systems is required.
>>> Insight can often be gained by integrating data from databases built for
>>> different purposes in separate corporate silos. For example, integrating
>>> data from a bug database with a customer database may help understand
>>> ordering behavior as a function of the bugs encountered.
>>> In Supply Chain Management (SCM), for example, it is vital to exchange
>>> product catalogs and other goods related information within a network of
>>> interconnected businesses involved in the ultimate provision of product
>>> and service packages. Such information is stored in relational databases
>>> and sometimes already exchanged electronically, but a variety of
>>> different technologies are used (e.g. proprietary files, XML files, DB
>>> dumps, Web Services etc.). Realizing a completely electronic information
>>> flow requires significant initial investments and currently limits the
>>> flexibility of businesses (e.g. with regard to changes in business
>>> partners). The envisioned RDB2RDF mapping language applied in
>>> conjunction with existing RDB based SCM systems will support the use of
>>> RDF and unique identifiers for realizing flexible
>>> information flows accompanying supply chains.
>>> The mapping language to be standardized by the proposed WG will simplify
>>> the publishing of enterprise data and information from Relational data
>>> backends and, thus, facilitate the interlinking and exchange of
>>> information between business information systems. In this scenario,
>>> on-demand transformation of relational data to RDF, scalability and
>>> completeness with regard to the relational algebra are central
>>> requirements.
>>>         1.1.4 Ordnance Survey Use Case
>>> Ordnance Survey, the national mapping agency of the UK, operates a very
>>> large geographical information system based on Oracle Spatial. The
>>> database contains topographical features, soil type and land use
>>> information. All these types of information are independently maintained
>>> and use separate terminologies. They describe the same land area but the
>>> boundaries of objects utilized for representing land use and soil type
>>> and topography do not coincide: For example, a pasture might consist of
>>> two distinct types of soil.
>>> An example of a need to integrate this information is modeling
>>> filtration of pollutants into water bodies from agricultural land. The
>>> soil type determines the degree of filtration, the land use determines
>>> the type of pollutant. Topography determines whether the field is next
>>> to a water body.
>>> An ontology exists for describing the types of objects in each database.
>>> The benefit from mapping the data to RDF is in simplifying querying and
>>> integration of the data. The very high volume of data makes an ETL
>>> approach impracticable; besides, the Oracle Spatial database offers
>>> spatial joining, which is generally not available on RDF stores.
>>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>>> land use, soil type and topography ontologies and convert them into
>>> single SQL statements, with all joining and filtering to take place at
>>> the relational database. In the process, high level concepts need to be
>>> translated into SQL conditions on data that is not readily human
>>> readable.
>>> Business questions to be answered by the use case are for example:
>>>     * What is the total length of river bank bordered by permeable soil
>>>       used for grazing along a certain river?
>>>     * What types of crops are being cultivated within 100m of water,
>>>       with total land use grouped by crop?
>>>     * What water bodies are subject to high environmental load from
>>>       agriculture, as defined by little current and extensive use of
>>>       adjacent land?
>>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>>> need to integrate data from different databases, built for different
>>> purposes. It also emphasizes the need for extensibility in the mapping
>>> language for supporting RDBMS vendor specific features. In the present
>>> case, Oracle expresses a spatial join using a special type of derived
>>> table not found in standard SQL; thus the customization need is deeper
>>> than just supporting calls to native SQL functions.
>>> The inference requirement consists primarily of expanding class
>>> membership into ANDs and ORs of conditions on the relational data. In
>>> some cases, these conditions are spatial, such as bordering on or
>>> contained in. The user should be familiar with the ontologies but should
>>> not have to know about the classification codes used in the databases.
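The expansion of class membership into conditions on the relational data can be sketched as follows. This is an illustration under assumptions: the class names, column names and classification codes are invented, the helper `to_sql` is hypothetical, and spatial conditions are left out.

```python
# Hypothetical class definitions: each ontology class maps to a boolean
# combination of column conditions. Codes and column names are invented.
CLASS_CONDITIONS = {
    "GrazingLand": ("or", [("eq", "land_use_code", "G1"),
                           ("eq", "land_use_code", "G2")]),
    "PermeableSoil": ("eq", "soil_code", "S7"),
    "PermeableGrazingLand": ("and", [("class", "GrazingLand"),
                                     ("class", "PermeableSoil")]),
}


def to_sql(expr, params):
    """Expand a class expression into a parameterized SQL condition."""
    kind = expr[0]
    if kind == "eq":
        _, column, value = expr
        params.append(value)  # values are bound as parameters, not inlined
        return f"{column} = ?"
    if kind == "class":
        # Replace the class name by its definition and keep expanding.
        return to_sql(CLASS_CONDITIONS[expr[1]], params)
    if kind in ("and", "or"):
        joiner = f" {kind.upper()} "
        return "(" + joiner.join(to_sql(e, params) for e in expr[1]) + ")"
    raise ValueError(f"unknown expression kind: {kind}")


params = []
where = to_sql(("class", "PermeableGrazingLand"), params)
```

The user names only the class; the classification codes appear solely in the generated `WHERE` clause and its bound parameters.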
>>>       1.2 Liaisons
>>> The WG must track the evolution of SPARQL and liaise with the DAWG WG as
>>> well as the OWL WG. The proposed WG will also keep track of work on
>>> assigning unique identifiers to well-known entities such as the ENS
>>> system associated with the OKKAM project [OKKAM] <#okkam> and the Common
>>> Naming Project started by Neuro Commons [Common Naming Project]
>>> <#CommonNaming>.
>>>       1.3 Starting Points
>>> The WG will take as its starting point the mapping languages developed
>>> by the [D2RQ] <#D2RQ> and [Virtuoso] <#Virtuoso> efforts.
>>>     2 References
>>> Common Naming Project
>>>     Neuro Commons Common Naming Project
>>>     <http://neurocommons.org/page/Common_Naming_Project>, Science
>>>     Commons, Sept 17, 2008. (See
>>>     http://neurocommons.org/page/Common_Naming_Project.)
>>> D2RQ
>>>     The D2RQ Platform v0.5.1, User Manual and Language Specification
>>>     <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer,
>>>     Richard Cyganiak, Jorg Garbers, Oliver Maresch (See
>>>     http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/.)
>>> RIF
>>>     W3C Rule Interchange Format Working Group
>>>     <http://www.w3.org/2005/rules/wiki/RIF_Working_Group> (See
>>>     http://www.w3.org/2005/rules/wiki/RIF_Working_Group.)
>>> LinkedData
>>>     Design Issues for Linked Data
>>>     <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee
>>>     (See http://www.w3.org/DesignIssues/LinkedData.html.)
>>> StateOfArt
>>>     Mapping Relational Data to RDF and OWL: A Literature Survey
>>>     <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb
>>>     (See http://esw.w3.org/topic/Rdb2RdfXG/.)
>>> OKKAM
>>>     An Entity Name System (ENS) for the Semantic Web
>>>     <http://www.okkam.org/>, Paolo Bouquet, Heiko Stoermer, Barbara
>>>     Bazzanella, January 2008. (See http://www.okkam.org/.)
>>> Virtuoso
>>>     Virtuoso Open-Source Edition
>>>     <http://virtuoso.openlinksw.com/wiki/main/Main/> (See
>>>     http://virtuoso.openlinksw.com/wiki/main/Main/.)
>>> Triplify
>>>     Triplify - Lightweight Linked Data Publication from Relational
>>>     Databases, submitted to WWW 2009
>>>     <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>>>     Auer, Dietzold, Lehmann, Hellmann, Aumueller (See
>>>     http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf.)
>>> NicotineDependence
>>>     An ontology-driven semantic mashup of gene and biological pathway
>>>     information: Application to the domain of nicotine dependence
>>>     <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo,
>>>     Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P.
>>>     Sheth (See http://dx.doi.org/10.1016/j.jbi.2008.02.006.)


Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Monday, 26 January 2009 16:05:36 UTC
