- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Tue, 13 Jan 2009 17:10:26 +0000
- To: <ashok.malhotra@oracle.com>
- CC: public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
All,

FYI: I've now put our XGR at the correct location [1].

Cheers,
Michael

[1] http://www.w3.org/2005/Incubator/rdb2rdf/XGR/

-- 
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730
http://sw-app.org/about.html

> From: ashok malhotra <ashok.malhotra@oracle.com>
> Organization: Oracle
> Reply-To: <ashok.malhotra@oracle.com>
> Date: Mon, 12 Jan 2009 13:20:33 -0800
> To: public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
> Subject: Revised version of final XR Report
> Resent-From: <public-xg-rdb2rdf@w3.org>
> Resent-Date: Mon, 12 Jan 2009 21:22:10 +0000
>
> See attached.
> This has two additional usecases and some other changes as discussed on
> last telcon.
>
> Thanks to Michael Hausenblas for cleaning up the XML source.
> --
> All the best, Ashok
> <http://www.w3.org/> <http://www.w3.org/2005/Incubator/XGR/>
> W3C RDB2RDF Incubator Group Report
> 12 January 2009
> This version: http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090112
> Latest version: http://www.w3.org/XG_Report/RDB2RDF_XG
> <http://www.w3.org/XG_Report/RDB2RDF_XG>
> Author: Ashok Malhotra (editor), Oracle
> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
> document is available under the W3C Document License
> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>. See
> the W3C Intellectual Rights Notice and Legal Disclaimers
> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright> for
> additional information.
>
> Abstract
> This is the final report from the RDB2RDF XG. The XG recommends that the W3C
> initiate a WG to standardize a language for mapping Relational Database
> schemas into RDF and OWL.
>
> Status of this Document
> This section describes the status of this document at the time of its
> publication. Other documents may supersede this document.
> A list of current W3C publications can be found in the W3C technical
> reports index <http://www.w3.org/TR/> at http://www.w3.org/TR/.
>
> This is the final recommendation from the RDB2RDF XG.
>
> Table of Contents
> 1 Recommendation <#recommendation>
> 1.1 Usecases <#usecases>
> 1.1.1 Integrating Databases to Research Nicotine Dependency <#biomedical>
> 1.1.2 Triplify: Exposing Relational Data on the Web <#triplify>
> 1.1.3 Integration of Enterprise Information Systems <#enterprise>
> 1.1.4 Ordnance Survey Use Case <#ordnance>
> 1.2 Liaisons <#liaisons>
> 1.3 Starting Points <#IDA5UIP>
> 2 References <#References>
>
> 1 Recommendation
> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
> standardize a language for mapping Relational Database schemas into RDF and
> OWL. Such a standard will enable the vast amounts of data stored in
> Relational databases to be published easily and conveniently on the Web. It
> will also facilitate integrating data from separate Relational databases and
> adding semantics to Relational data.
>
> This recommendation is based on a survey of the state of the art conducted
> by the XG [StateOfArt] <#StateOfArt> as well as the usecases discussed below.
>
> The mapping language defined by the WG would facilitate the development of
> several types of products. It could be used to translate Relational data into
> RDF which could be stored in a triple store. This is sometimes called
> Extract-Transform-Load (ETL). Alternatively, it could be used to generate a
> virtual mapping that is queried using SPARQL, with the SPARQL queries
> translated to SQL queries on the underlying Relational data. Other products
> could be layered on top of these capabilities to query and deliver data in
> different ways as well as to integrate the data with other kinds of
> information on the Semantic Web.
>
> The mapping language should be complete with respect to the relational
> algebra.
> It should have a human-readable syntax as well as XML and RDF
> representations of the syntax for purposes of discovery and machine
> generation.
>
> There is a strong suggestion that the mapping language be expressed in rules
> as defined by the W3C [RIF] <#RIF> WG. The syntax does not have to follow
> the [RIF] <#RIF> syntax but should be isomorphic to it. The output of the
> mapping should be defined in terms of an RDFS/OWL schema.
>
> It should be possible to subset the language for simple applications such as
> Web 2.0. This feature of the language will be validated by creating a library
> of mappings for widely used apps such as Drupal, Wordpress, phpBB.
>
> The mapping language will allow customization with regard to names and data
> transformation. In addition, the language must be able to expose vendor
> specific SQL features such as full-text and spatial support and vendor-defined
> datatypes.
>
> The final language specification should include guidance with regard to
> mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>
> The language must allow for a mechanism to create identifiers for database
> entities. The generation of identifiers should be designed to support the
> implementation of the linked data principles [LinkedData] <#LinkedData>.
> Where possible, the language will encourage the reuse of public identifiers
> for long-lived entities such as persons, corporations, geo-locations, etc.
> See 1.2 Liaisons <#liaisons>.
>
> The proposed Working Group will also create a set of test cases that could be
> used to verify conformance.
>
> 1.1 Usecases
> To bootstrap exploitation of the Web as a globally accessible linked database,
> we need a few essentials:
> * Web accessible data needs to increase in granularity and cross linkage.
> * Web applications and solutions must produce structured interlinked data as
>   extensions of existing functionality.
> * Web users must be shielded from the underlying complexity of injecting
>   structured linked data into the Web.
>
> 1.1.1 Integrating Databases to Research Nicotine Dependency
> Complex biological queries generally require the integration of information
> from several sources. To understand the genetic basis of nicotine dependence,
> gene and pathway information needed to be integrated and three complex
> biological queries answered using the integrated knowledge base. The gene
> information source NCBI Entrez Gene, which has gene-related records for ~2
> million genes, needed to be integrated with pathway information sources such
> as KEGG (Kyoto Encyclopedia of Genes and Genomes). Comparing results across
> model organisms required homology information provided by NCBI HomoloGene,
> which contains homology data for several completely sequenced eukaryotic
> organisms.
>
> An ontology-driven approach was used to integrate the two gene resources
> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG, Reactome
> and BioCyc). An OWL ontology called the Entrez Knowledge Model (EKoM) was
> created for the gene resources and integrated with the extant BioPAX ontology
> designed for pathway resources. The integrated schema was populated with data
> from the pathway resources, publicly available in BioPAX-compatible format,
> and gene resources for which a population procedure was created.
>
> SPARQL was used to formulate queries to investigate the genetic basis of
> nicotine dependence over the integrated knowledge base:
> * Which genes participate in a large number of pathways?
> * Which genes are "hub genes" from the perspective of gene interaction?
> * Which genes are expressed in the brain, in the context of neurobiology of
>   nicotine dependence and various neurotransmitters in the central nervous
>   system?
> The result was very successful.
> The queries could easily identify hub genes, i.e., those genes whose gene
> products participate in many pathways or interact with many other gene
> products. See [NicotineDependence] <#> for details.
>
> 1.1.2 Triplify: Exposing Relational Data on the Web
> In order to make the Semantic Web useful to ordinary Web users, RDF and OWL
> have to be deployed on the Web on a much larger scale. Web applications such
> as Content Management Systems, online shops or community applications (e.g.
> Wikis, Blogs, Fora) already store their data in relational databases
> [triplify] <#triplify>. Providing a standardized way to map the relational
> data structures behind these Web applications into RDF, RDF-Schema and OWL
> will facilitate broad penetration and enrich the Web with RDF data and
> ontologies and facilitate novel semantic browsing and search applications.
>
> By supporting the long tail of Web applications and thus counteracting the
> centralization of the Web 2.0 applications the planned RDB2RDF standardization
> will help to give control over data back to end-users and thus promote a
> democratization of the Web.
>
> To support this usecase scenario, the mapping language should be easily
> implementable for lightweight Web applications and have a shallow learning
> curve to foster early adoption by Web developers.
>
> 1.1.3 Integration of Enterprise Information Systems
> Efficient information and data exchange between application systems within
> and across enterprises is of paramount importance in the increasingly
> networked and IT-dominated business atmosphere. Existing Enterprise
> Information Systems such as CRM, CMS and ERP systems use Relational database
> backends for persistence. RDF and Linked Data can provide data exchange and
> integration interfaces for such application systems, which are easy to
> implement and use, especially in settings where a loose and flexible coupling
> of the systems is required.
>
> Insight can often be gained by integrating data from databases built for
> different purposes in separate corporate silos. For example, integrating data
> from a bug database with a customer database may help understand ordering
> behavior as a function of the bugs encountered.
>
> In Supply Chain Management (SCM), for example, it is vital to exchange
> product catalogs and other goods-related information within a network of
> interconnected businesses involved in the ultimate provision of product and
> service packages. Such information is stored in relational databases and
> sometimes already exchanged electronically, but a variety of different
> technologies are used (e.g. proprietary files, XML files, DB dumps, Web
> Services etc.). Realizing a completely electronic information flow requires
> significant initial investments and currently limits the flexibility of
> businesses (e.g. with regard to changes in business partners). The envisioned
> RDB2RDF mapping language applied in conjunction with existing RDB-based SCM
> systems will support the use of RDF and unique identifiers for realizing
> flexible information flows accompanying supply chains.
>
> The mapping language to be standardized by the proposed WG will simplify the
> publishing of enterprise data and information from Relational data backends
> and, thus, facilitate the interlinking and exchange of information between
> business information systems. In this scenario on-demand transformation of
> relational data to RDF, scalability and completeness with regard to the
> relational algebra are central requirements.
>
> 1.1.4 Ordnance Survey Use Case
> Ordnance Survey, the National mapping agency of the UK, operates a very large
> geographical information system based on Oracle Spatial. The database contains
> topographical features, soil type and land use information. All these types
> of information are independently maintained and use separate terminologies.
> They describe the same land area but the boundaries of objects utilized for
> representing land use and soil type and topography do not coincide: For
> example, a pasture might consist of two distinct types of soil.
>
> An example of a need to integrate this information is modeling filtration of
> pollutants into water bodies from agricultural land. The soil type determines
> the degree of filtration, the land use determines the type of pollutant.
> Topography determines whether the field is next to a water body.
>
> An ontology exists for describing the types of objects in each database. The
> benefit from mapping the data to RDF is in simplifying querying and
> integration of the data. The very high volume of data makes an ETL approach
> impracticable; besides, the Oracle Spatial database offers spatial joining
> which is generally not available on RDF stores.
>
> Thus, it is necessary to take SPARQL queries expressed in terms of the land
> use, soil type and topography ontologies and convert them into single SQL
> statements, with all joining and filtering to take place at the relational
> database. In the process, high level concepts need to be translated into SQL
> conditions on data that is not readily human readable.
>
> Business questions to be answered by the use case are for example:
> * What is the total length of river bank bordered by permeable soil used for
>   grazing along a certain river?
> * What types of crops are being cultivated within 100m of water, with total
>   land use grouped by crop?
> * What water bodies are subject to high environmental load from agriculture,
>   as defined by little current and extensive use of adjacent land?
> From the viewpoint of RDB to RDF mapping, this usecase highlights the need to
> integrate data from different databases, built for different purposes. It
> also emphasizes the need for extensibility in the mapping language for
> supporting RDBMS vendor-specific features.
> In the present case, Oracle expresses a spatial join using a special type of
> derived table not found in standard SQL, thus the customization need is
> deeper than just supporting calls to native SQL functions.
>
> The inference requirement consists primarily of expanding class membership
> into and's and or's of conditions on the relational data. In some cases,
> these conditions are spatial, such as bordering on or contained in. The user
> should be familiar with the ontologies but should not have to know about the
> classification codes used in the databases.
>
> 1.2 Liaisons
> The WG must track the evolution of SPARQL and liaise with the DAWG WG as well
> as the OWL WG. The proposed WG will also keep track of work on assigning
> unique identifiers to well-known entities such as the ENS system associated
> with the OKKAM project [OKKAM] <#okkam> and the Common Naming Project started
> by Neuro Commons [Common Naming Project] <#CommonNaming>.
>
> 1.3 Starting Points
> The WG will take as its starting point the mapping languages developed by the
> [D2RQ] <#D2RQ> and [Virtuoso] <#Virtuoso> efforts.
>
> 2 References
> Common Naming Project
>   Neuro Commons Common Naming Project
>   <http://neurocommons.org/page/Common_Naming_Project>, Science Commons,
>   Sept 17, 2008.
> D2RQ
>   The D2RQ Platform v0.5.1, User Manual and Language Specification
>   <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer, Richard
>   Cyganiak, Jorg Garbers, Oliver Maresch.
> RIF
>   W3C Rule Interchange Format Working Group
>   <http://www.w3.org/2005/rules/wiki/RIF_Working_Group>.
> LinkedData
>   Design Issues for Linked Data
>   <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee.
> StateOfArt
>   Mapping Relational Data to RDF and OWL: A Literature Survey
>   <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb.
> OKKAM
>   An Entity Name System (ENS) for the Semantic Web <http://www.okkam.org/>,
>   Paolo Bouquet, Heiko Stoermer, Barbara Bazzanella, January 2008.
> Virtuoso
>   Virtuoso Open-Source Edition
>   <http://virtuoso.openlinksw.com/wiki/main/Main/>.
> Triplify
>   Triplify - Lightweight Linked Data Publication from Relational Databases,
>   submitted to WWW 2009
>   <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>   Auer, Dietzold, Lehmann, Hellmann, Aumueller.
> NicotineDependence
>   An ontology-driven semantic mashup of gene and biological pathway
>   information: Application to the domain of nicotine dependence
>   <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo, Olivier
>   Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P. Sheth.
Received on Tuesday, 13 January 2009 17:11:09 UTC