- From: ashok malhotra <ashok.malhotra@oracle.com>
- Date: Mon, 26 Jan 2009 07:58:05 -0800
- To: Ivan Herman <ivan@w3.org>
- CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
Hi Ivan:
You said ...
- Would you or one of your colleagues be ready to come (again:-) to one
of our next SW Coordination Group meetings to give a bit of a report and
discuss the possible follow-ups?

Where is the meeting?

All the best, Ashok

Ivan Herman wrote:
> Hi Ashok,
>
> First of all, thanks! I actually have two questions, neither of which is
> closely related to your original question (which Mauro already answered,
> I believe:-)
>
> - Would you or one of your colleagues be ready to come (again:-) to one
> of our next SW Coordination Group meetings to give a bit of a report and
> discuss the possible follow-ups? The best date would be the 20th of
> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>
> - Did the group think of also preparing a rough draft charter for the
> group you propose? It would make things easier to discuss both
> internally and externally. There can be many empty slots in the charter,
> but it would give us a basis for moving forward. It would also give a
> sense of who would/could staff such a group.
>
> Thanks again!
>
> Cheers
>
> Ivan
>
> ashok malhotra wrote:
>
>> Ivan, Mauro:
>> As you know, the RDB2RDF XG is coming to a close. We are planning two
>> deliverables and I thought I would run them by you for early comments.
>>
>> 1. We have prepared a final report. This is attached. I am trying to
>> get permission to put it on the W3C site.
>> 2. We have prepared a State of the Art Survey. This is in the form of
>> extensions to the ESW Wiki
>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file at
>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>> Both have the same content. Is this format acceptable for an XG
>> deliverable?
>>
>> ------------------------------------------------------------------------
>>
>> W3C Incubator Report <http://www.w3.org/2005/Incubator/XGR/>
>>
>>
>> W3C RDB2RDF Incubator Group Report
>>
>>
>> 16 January 2009
>>
>> This version:
>> http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>> Latest version:
>> http://www.w3.org/XG_Report/RDB2RDF_XG
>> Previous version:
>> This is the first public version.
>> Author:
>> Ashok Malhotra (editor), Oracle
>>
>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
>> document is available under the W3C Document License
>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>> for additional information.
>> ------------------------------------------------------------------------
>>
>>
>> Abstract
>>
>> This is the final report from the RDB2RDF XG. The XG recommends that the
>> W3C initiate a WG to standardize a language for mapping Relational
>> Database schemas into RDF and OWL.
>>
>>
>> Status of this Document
>>
>> /This section describes the status of this document at the time of its
>> publication. Other documents may supersede this document. A list of
>> current W3C publications can be found in the W3C technical reports index
>> <http://www.w3.org/TR/> at http://www.w3.org/TR/./
>>
>> This is the final recommendation from the RDB2RDF XG.
>>
>>
>> Table of Contents
>>
>> 1 Recommendation
>> 1.1 Usecases
>> 1.1.1 Integrating Databases to Research Nicotine Dependency
>> 1.1.2 Triplify: Exposing Relational Data on the Web
>> 1.1.3 Integration of Enterprise Information Systems
>> 1.1.4 Ordnance Survey Use Case
>> 1.2 Liaisons
>> 1.3 Starting Points
>> 2 References
>>
>> ------------------------------------------------------------------------
>>
>>
>> 1 Recommendation
>>
>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
>> standardize a language for mapping Relational Database schemas into RDF
>> and OWL. Such a standard will enable the vast amounts of data stored in
>> Relational databases to be published easily and conveniently on the Web.
>> It will also facilitate integrating data from separate Relational
>> databases and adding semantics to Relational data.
>>
>> This recommendation is based on a survey of the State of the Art
>> conducted by the XG [StateOfArt] as well as on the usecases discussed
>> below.
>>
>> The mapping language defined by the WG would facilitate the development
>> of several types of products. It could be used to translate Relational
>> data into RDF, which could then be stored in a triple store. This is
>> sometimes called Extract-Transform-Load (ETL). Or it could be used to
>> generate a virtual mapping that could be queried using SPARQL, with the
>> SPARQL queries translated into SQL queries on the underlying Relational
>> data. Other products could be layered on top of these capabilities to
>> query and deliver data in different ways as well as to integrate the
>> data with other kinds of information on the Semantic Web.
>>
>> The mapping language should be complete when compared to the
>> relational algebra.
>> It should have a human-readable syntax as well
>> as XML and RDF representations of the syntax for purposes of discovery
>> and machine generation.
>>
>> There is a strong suggestion that the mapping language be expressed in
>> rules as defined by the W3C [RIF] WG. The syntax does not have to
>> follow the RIF syntax, but there should be a round-trippable mapping
>> between the mapping language and a RIF dialect. The output of the
>> mapping should be defined in terms of an RDFS/OWL schema.
>>
>> It should be possible to subset the language for simple applications,
>> such as Web 2.0 sites. This feature of the language will be validated by
>> creating a library of mappings for widely used applications such as
>> Drupal, WordPress and phpBB.
>>
>> The mapping language will allow customization with regard to names and
>> data transformation. In addition, the language must be able to expose
>> vendor-specific SQL features such as full-text and spatial support and
>> vendor-defined datatypes.
>>
>> The final language specification should include guidance with regard to
>> mapping Relational data to a subset of OWL such as OWL 2 QL or OWL 2 RL.
>>
>> The language must allow for a mechanism to create identifiers for
>> database entities. The generation of identifiers should be designed to
>> support the implementation of the linked data principles [LinkedData].
>> Where possible, the language will encourage the reuse of public
>> identifiers for long-lived entities such as persons, corporations,
>> geo-locations, etc. See *1.2 Liaisons*.
>>
>> The proposed Working Group will also create a set of test cases that
>> could be used to verify conformance.
>>
>>
>> 1.1 Usecases
>>
>> To bootstrap exploitation of the Web as a globally accessible linked
>> database, we need a few essentials:
>>
>> * Web-accessible data needs to increase in granularity and
>> cross-linkage.
>> * Web applications and solutions must produce structured interlinked
>> data as extensions of existing functionality.
>> * Web users must be shielded from the underlying complexity of
>> injecting structured linked data into the Web.
>>
>>
>> 1.1.1 Integrating Databases to Research Nicotine Dependency
>>
>> Complex biological queries generally require the integration of
>> information from several sources. To understand the genetic basis of
>> nicotine dependence, gene and pathway information needed to be
>> integrated and three complex biological queries answered using the
>> integrated knowledge base. The gene information source NCBI Entrez Gene,
>> which has gene-related records of ~2 million genes, needed to be
>> integrated with pathway information sources such as KEGG (Kyoto
>> Encyclopedia of Genes and Genomes). Comparing results across model
>> organisms required homology information provided by NCBI HomoloGene,
>> which contains homology data for several completely sequenced eukaryotic
>> organisms.
>>
>> An ontology-driven approach was used to integrate the two gene resources
>> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG,
>> Reactome and BioCyc). An OWL ontology called the Entrez Knowledge Model
>> (EKoM) was created for the gene resources and integrated with the extant
>> BioPAX ontology designed for pathway resources. The integrated schema
>> was populated with data from the pathway resources, which are publicly
>> available in BioPAX-compatible format, and from the gene resources, for
>> which a population procedure was created.
>>
>> SPARQL was used to formulate queries to investigate the genetic basis of
>> nicotine dependence over the integrated knowledge base:
>>
>> * Which genes participate in a large number of pathways?
>> * Which are the "hub genes" from the perspective of gene interaction?
>> * Which genes are expressed in the brain, in the context of the
>> neurobiology of nicotine dependence and various neurotransmitters
>> in the central nervous system?
>>
>> The approach was very successful. The queries could easily identify hub
>> genes, i.e., those genes whose gene products participate in many
>> pathways or interact with many other gene products. See
>> [NicotineDependence] for details.
>>
>>
>> 1.1.2 Triplify: Exposing Relational Data on the Web
>>
>> In order to make the Semantic Web useful to ordinary Web users, RDF and
>> OWL have to be deployed on the Web on a much larger scale. Web
>> applications such as Content Management Systems, online shops or
>> community applications (e.g. Wikis, Blogs, Fora) already store their
>> data in relational databases [Triplify]. Providing a standardized way to
>> map the relational data structures behind these Web applications into
>> RDF, RDF Schema and OWL will facilitate broad adoption, enrich the Web
>> with RDF data and ontologies, and enable novel semantic browsing and
>> search applications.
>>
>> By supporting the long tail of Web applications, and thus counteracting
>> the centralization of Web 2.0 applications, the planned RDB2RDF
>> standardization will help give control over data back to end users and
>> thus promote a democratization of the Web.
>>
>> To support this usecase scenario, the mapping language should be easily
>> implementable for lightweight Web applications and have a shallow
>> learning curve to foster early adoption by Web developers.
>>
>>
>> 1.1.3 Integration of Enterprise Information Systems
>>
>> Efficient information and data exchange between application systems
>> within and across enterprises is of paramount importance in the
>> increasingly networked and IT-dominated business environment. Existing
>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>> Relational database backends for persistence.
>> RDF and Linked Data can provide
>> easy-to-implement, easy-to-use data exchange and integration interfaces
>> for such application systems, especially in settings where a loose and
>> flexible coupling of the systems is required.
>>
>> Insight can often be gained by integrating data from databases built for
>> different purposes in separate corporate silos. For example, integrating
>> data from a bug database with a customer database may help understand
>> ordering behavior as a function of the bugs encountered.
>>
>> In Supply Chain Management (SCM), it is vital to exchange product
>> catalogs and other goods-related information within a network of
>> interconnected businesses involved in the ultimate provision of product
>> and service packages. Such information is stored in relational databases
>> and is sometimes already exchanged electronically, but a variety of
>> different technologies are used (e.g. proprietary files, XML files, DB
>> dumps, Web Services etc.). Realizing a completely electronic information
>> flow requires significant initial investments and currently limits the
>> flexibility of businesses (e.g. with regard to changes in business
>> partners). The envisioned RDB2RDF mapping language, applied in
>> conjunction with existing RDB-based SCM systems, will support the use of
>> RDF and unique identifiers for realizing flexible information flows
>> accompanying supply chains.
>>
>> The mapping language to be standardized by the proposed WG will simplify
>> the publishing of enterprise data and information from Relational data
>> backends and, thus, facilitate the interlinking and exchange of
>> information between business information systems. In this scenario,
>> on-demand transformation of relational data to RDF, scalability and
>> completeness with regard to the relational algebra are central
>> requirements.
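The on-demand transformation of relational rows into RDF, combined with the report's requirement that the mapping mint identifiers for database entities, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not part of the XG's proposal: the `customer` table, the base URI `http://example.org/db/`, and the table-plus-primary-key URI scheme are all hypothetical, and the report does not prescribe any particular mapping. Each row becomes an `rdf:type` triple plus one string-literal triple per non-NULL column, emitted as N-Triples.

```python
import sqlite3
from typing import Mapping
from urllib.parse import quote

# Hypothetical base URI; the report does not define any URI scheme.
BASE = "http://example.org/db/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: object) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_triples(table: str, pk: str, row: Mapping[str, object]) -> list[str]:
    """Map one relational row to N-Triples lines.

    The subject URI is minted from the table name and the primary-key value,
    so the same row always receives the same identifier. Every non-key,
    non-NULL column becomes one triple with a plain string literal object.
    """
    subject = f"<{BASE}{table}/{quote(str(row[pk]), safe='')}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."]
    for column, value in row.items():
        if column == pk or value is None:
            continue  # the key is already in the URI; NULL yields no triple
        triples.append(f'{subject} <{BASE}{table}#{column}> "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    # An in-memory SQLite database stands in for an enterprise RDB backend.
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("INSERT INTO customer VALUES (7, 'Acme', NULL)")
    for row in conn.execute("SELECT * FROM customer"):
        print("\n".join(row_to_triples("customer", "id", dict(row))))
```

For the sample row, the script prints a type triple and a `name` triple for `<http://example.org/db/customer/7>`; the NULL `country` column produces no triple. This covers only the ETL direction. A real mapping language would also need datatyped literals, foreign keys as links between resources, name customization, and the SPARQL-to-SQL rewriting the report calls for.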
>>
>>
>> 1.1.4 Ordnance Survey Use Case
>>
>> Ordnance Survey, the national mapping agency of the UK, operates a very
>> large geographical information system based on Oracle Spatial. The
>> database contains topographical features, soil type and land use
>> information. All these types of information are independently maintained
>> and use separate terminologies. They describe the same land area, but
>> the boundaries of the objects used to represent land use, soil type and
>> topography do not coincide: for example, a pasture might consist of two
>> distinct types of soil.
>>
>> An example of a need to integrate this information is modeling
>> filtration of pollutants into water bodies from agricultural land. The
>> soil type determines the degree of filtration; the land use determines
>> the type of pollutant; the topography determines whether the field is
>> next to a water body.
>>
>> An ontology exists for describing the types of objects in each database.
>> The benefit of mapping the data to RDF lies in simplifying querying and
>> integration of the data. The very high volume of data makes an ETL
>> approach impracticable; besides, the Oracle Spatial database offers
>> spatial joins, which are generally not available in RDF stores.
>>
>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>> land use, soil type and topography ontologies and convert them into
>> single SQL statements, with all joining and filtering taking place in
>> the relational database. In the process, high-level concepts need to be
>> translated into SQL conditions on data that is not readily
>> human-readable.
>>
>> Business questions to be answered by the use case are, for example:
>>
>> * What is the total length of river bank bordered by permeable soil
>> used for grazing along a certain river?
>> * What types of crops are being cultivated within 100 m of water, with
>> total land use grouped by crop?
>> * Which water bodies are subject to high environmental load from
>> agriculture, as defined by little current and extensive use of
>> adjacent land?
>>
>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>> need to integrate data from different databases built for different
>> purposes. It also emphasizes the need for extensibility in the mapping
>> language to support RDBMS vendor-specific features. In the present
>> case, Oracle expresses a spatial join using a special type of derived
>> table not found in standard SQL; thus, the customization need goes
>> deeper than just supporting calls to native SQL functions.
>>
>> The inference requirement consists primarily of expanding class
>> membership into ANDs and ORs of conditions on the relational data. In
>> some cases, these conditions are spatial, such as bordering on or being
>> contained in. The user should be familiar with the ontologies but should
>> not have to know about the classification codes used in the databases.
>>
>>
>> 1.2 Liaisons
>>
>> The WG must track the evolution of SPARQL and liaise with the DAWG as
>> well as the OWL WG. The proposed WG will also keep track of work on
>> assigning unique identifiers to well-known entities, such as the ENS
>> system associated with the OKKAM project [OKKAM] and the Common Naming
>> Project started by Neuro Commons [Common Naming Project].
>>
>>
>> 1.3 Starting Points
>>
>> The WG will take as its starting point the mapping languages developed
>> by the [D2RQ] and [Virtuoso] efforts.
>>
>>
>> 2 References
>>
>> Common Naming Project
>> Neuro Commons Common Naming Project, Science Commons,
>> Sept 17, 2008. (See
>> http://neurocommons.org/page/Common_Naming_Project.)
>> D2RQ
>> The D2RQ Platform v0.5.1, User Manual and Language Specification,
>> Chris Bizer, Richard Cyganiak, Jorg Garbers, Oliver Maresch. (See
>> http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/.)
>> RIF
>> W3C Rule Interchange Format Working Group. (See
>> http://www.w3.org/2005/rules/wiki/RIF_Working_Group.)
>> LinkedData
>> Design Issues for Linked Data, Tim Berners-Lee.
>> (See http://www.w3.org/DesignIssues/LinkedData.html.)
>> StateOfArt
>> Mapping Relational Data to RDF and OWL: A Literature Survey,
>> Satya Sahoo, Wolfgang Halb. (See http://esw.w3.org/topic/Rdb2RdfXG/.)
>> OKKAM
>> An Entity Name System (ENS) for the Semantic Web, Paolo Bouquet,
>> Heiko Stoermer, Barbara Bazzanella, January 2008.
>> (See http://www.okkam.org/.)
>> Virtuoso
>> Virtuoso Open-Source Edition.
>> (See http://virtuoso.openlinksw.com/wiki/main/Main/.)
>> Triplify
>> Triplify - Lightweight Linked Data Publication from Relational
>> Databases, submitted to WWW 2009, Auer, Dietzold, Lehmann, Hellmann,
>> Aumueller. (See
>> http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf.)
>> NicotineDependence
>> An ontology-driven semantic mashup of gene and biological pathway
>> information: Application to the domain of nicotine dependence,
>> Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner,
>> Amit P. Sheth. (See http://dx.doi.org/10.1016/j.jbi.2008.02.006.)
>>
>
>
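Section 1.1.4 of the report quoted above says the Ordnance Survey inference requirement "consists primarily of expanding class membership into ANDs and ORs of conditions on the relational data", so users can query with ontology classes without knowing the classification codes. The minimal Python sketch below illustrates that expansion under clearly hypothetical assumptions: the class names, the table aliases `land_use` and `soil`, and the codes `LU07`, `S12` and `S15` are all invented, and the report gives neither Ordnance Survey's actual ontologies nor their codes. Each class is defined either as an atomic SQL condition or as an AND/OR of other classes, and a class is expanded recursively into a WHERE-clause fragment.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical definitions: ("sql", fragment) is an atomic condition on
# classification codes; ("and" | "or", [classes]) combines other classes.
Definition = Tuple[str, Union[str, List[str]]]

CLASSES: Dict[str, Definition] = {
    "Pasture": ("sql", "land_use.code = 'LU07'"),
    "SandySoil": ("sql", "soil.type_code = 'S12'"),
    "GravelSoil": ("sql", "soil.type_code = 'S15'"),
    "PermeableSoil": ("or", ["SandySoil", "GravelSoil"]),
    "PermeableGrazingLand": ("and", ["Pasture", "PermeableSoil"]),
}


def expand(cls: str, defs: Dict[str, Definition] = CLASSES) -> str:
    """Expand class membership into a SQL condition, recursing through AND/OR."""
    kind, body = defs[cls]
    if kind == "sql":
        return body
    if kind in ("and", "or"):
        joiner = f" {kind.upper()} "
        # Parenthesize each compound so operator precedence is explicit.
        return "(" + joiner.join(expand(part, defs) for part in body) + ")"
    raise ValueError(f"unknown definition kind {kind!r} for class {cls!r}")


if __name__ == "__main__":
    print(expand("PermeableGrazingLand"))
```

Here `PermeableGrazingLand` expands to a conjunction of the pasture code with a disjunction of the two soil codes. The sketch assumes the definitions are acyclic. It also leaves out the spatial conditions (bordering on, contained in) that the report mentions, because in Oracle Spatial these require the vendor-specific derived-table construct the report describes rather than a plain boolean predicate.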
Received on Monday, 26 January 2009 15:59:46 UTC