- From: Ivan Herman <ivan@w3.org>
- Date: Mon, 26 Jan 2009 17:04:57 +0100
- To: ashok.malhotra@oracle.com
- CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
- Message-ID: <497DDF29.6060506@w3.org>
My apologies, I should have been more precise: the meeting is a telco! I.e., the precise answer to your question is: on zakim:-)

Ivan

ashok malhotra wrote:
> Hi Ivan:
> You said ...
>
> - Would you or one of your colleagues be ready to come (again:-) to one
> of our next SW Coordination Group meeting to give a bit of a report and
> discuss the possible followups?
> Where is the meeting?
>
> All the best, Ashok
>
> Ivan Herman wrote:
>> Hi Ashok,
>>
>> First of all, thanks! I have actually two questions, none of those are
>> closely related to your original question (that Mauro already answered,
>> I believe:-)
>>
>> - Would you or one of your colleagues be ready to come (again:-) to one
>> of our next SW Coordination Group meeting to give a bit of a report and
>> discuss the possible followups? The best date would be the 20th of
>> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>>
>> - Did the group think of also preparing a rough draft charter for the
>> group you propose? It would make things easier to discuss both
>> internally and externally. There can be many empty slots in the charter
>> but it would give an idea to move forward. It would also give a feeling
>> on who would/could staff such a group.
>>
>> Thanks again!
>>
>> Cheers
>>
>> Ivan
>>
>> ashok malhotra wrote:
>>
>>> Ivan, Mauro:
>>> As you know, the RDB2RDF XG is coming to a close. We are planning two
>>> deliverables and I thought I would run them by you for early comments.
>>>
>>> 1. We have prepared a final report. This is attached. I am trying to
>>> get permission to put it on the W3C site.
>>> 2. We have prepared a State Of the Art Survey.
>>> This is in the form of extensions to the ESW Wiki
>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file
>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>>>
>>> Both have the same content. Is this format acceptable for an XG
>>> deliverable?
>>>
>>> ------------------------------------------------------------------------
>>>
>>> W3C Incubator Report <http://www.w3.org/2005/Incubator/XGR/>
>>>
>>>
>>> W3C RDB2RDF Incubator Group Report
>>>
>>>
>>> 16 January 2009
>>>
>>> This version:
>>>     http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>>> Latest version:
>>>     http://www.w3.org/XG_Report/RDB2RDF_XG
>>> Previous version:
>>>     This is the first public version.
>>> Author:
>>>     Ashok Malhotra (editor), Oracle
>>>
>>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
>>> document is available under the W3C Document License
>>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>>> for additional information.
>>> ------------------------------------------------------------------------
>>>
>>>
>>> Abstract
>>>
>>> This is the final report from the RDB2RDF XG. The XG recommends that the
>>> W3C initiate a WG to standardize a language for mapping Relational
>>> Database schemas into RDF and OWL.
>>>
>>>
>>> Status of this Document
>>>
>>> This section describes the status of this document at the time of its
>>> publication. Other documents may supersede this document.
>>> A list of current W3C publications can be found in the W3C technical
>>> reports index at http://www.w3.org/TR/.
>>>
>>> This is the final recommendation from the RDB2RDF XG.
>>>
>>>
>>> Table of Contents
>>>
>>> 1 Recommendation
>>>    1.1 Usecases
>>>       1.1.1 Integrating Databases to Research Nicotine Dependency
>>>       1.1.2 Triplify: Exposing Relational Data on the Web
>>>       1.1.3 Integration of Enterprise Information Systems
>>>       1.1.4 Ordnance Survey Use Case
>>>    1.2 Liaisons
>>>    1.3 Starting Points
>>> 2 References
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> 1 Recommendation
>>>
>>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
>>> standardize a language for mapping Relational Database schemas into RDF
>>> and OWL. Such a standard will enable the vast amounts of data stored in
>>> Relational databases to be published easily and conveniently on the Web.
>>> It will also facilitate integrating data from separate Relational
>>> databases and adding semantics to Relational data.
>>>
>>> This recommendation is based on a survey of the state of the art
>>> conducted by the XG [StateOfArt], as well as on the usecases discussed
>>> below.
>>>
>>> The mapping language defined by the WG would facilitate the development
>>> of several types of products. It could be used to translate Relational
>>> data into RDF, which could then be stored in a triple store; this is
>>> sometimes called Extract-Transform-Load (ETL). Alternatively, it could
>>> be used to define a virtual mapping that can be queried using SPARQL,
>>> with the SPARQL queries translated into SQL queries on the underlying
>>> Relational data.
>>> Other products could be layered on top of these capabilities to query
>>> and deliver data in different ways, as well as to integrate the data
>>> with other kinds of information on the Semantic Web.
>>>
>>> The mapping language should be complete with respect to the relational
>>> algebra. It should have a human-readable syntax as well as XML and RDF
>>> representations of the syntax for purposes of discovery and machine
>>> generation.
>>>
>>> There is a strong suggestion that the mapping language be expressed in
>>> rules as defined by the W3C [RIF] WG. The syntax does not have to
>>> follow the RIF syntax, but there should be a round-trippable mapping
>>> between the mapping language and a RIF dialect. The output of the
>>> mapping should be defined in terms of an RDFS/OWL schema.
>>>
>>> It should be possible to subset the language for simple Web 2.0
>>> applications. This feature of the language will be validated by
>>> creating a library of mappings for widely used applications such as
>>> Drupal, WordPress and phpBB.
>>>
>>> The mapping language will allow customization with regard to names and
>>> data transformation. In addition, the language must be able to expose
>>> vendor-specific SQL features, such as full-text and spatial support, and
>>> vendor-defined datatypes.
>>>
>>> The final language specification should include guidance with regard to
>>> mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>>>
>>> The language must provide a mechanism for creating identifiers for
>>> database entities. The generation of identifiers should be designed to
>>> support the implementation of the linked data principles [LinkedData].
>>> Where possible, the language will encourage the reuse of public
>>> identifiers for long-lived entities such as persons, corporations,
>>> geo-locations, etc. See *1.2 Liaisons*.
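The recommendation distinguishes two approaches. The first translates Relational data into RDF for a triple store (ETL). The second is a virtual mapping queried through SPARQL. The recommendation also asks for a mechanism that mints identifiers for database entities in line with the linked data principles. As a rough illustration of the ETL direction only, the Python sketch below turns table rows into N-Triples and mints one HTTP URI per row from its primary key. Several choices here are hypothetical and made only for this example: the base namespace "http://example.org/db/", the URI pattern (table name plus key value for subjects, table name plus column name for properties), and all function names. None of this is D2RQ or Virtuoso syntax, nor anything the proposed WG has defined.

```python
from urllib.parse import quote

# Hypothetical namespace chosen for this sketch; a real mapping would make it configurable.
BASE = "http://example.org/db/"


def row_uri(table: str, pk_value) -> str:
    """Mint an HTTP URI for a row from its table name and primary-key value."""
    return f"<{BASE}{quote(table)}/{quote(str(pk_value), safe='')}>"


def prop_uri(table: str, column: str) -> str:
    """Derive a property URI from a table name and a column name."""
    return f"<{BASE}{quote(table)}#{quote(column)}>"


def literal(value) -> str:
    """Serialize a value as an N-Triples plain literal, escaping backslashes, quotes and line breaks."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def table_to_ntriples(table: str, pk: str, rows: list[dict]) -> list[str]:
    """ETL-style export: one subject per row, one triple per non-key, non-NULL column."""
    triples = []
    for row in rows:
        subject = row_uri(table, row[pk])
        for column, value in row.items():
            if column == pk or value is None:
                continue  # key is already encoded in the subject URI; NULL yields no triple
            triples.append(f"{subject} {prop_uri(table, column)} {literal(value)} .")
    return triples


if __name__ == "__main__":
    # Made-up sample rows standing in for a hypothetical "person" table.
    people = [
        {"id": 7, "name": "Ada Lovelace", "city": "London"},
        {"id": 8, "name": "Grace Hopper", "city": None},
    ]
    for line in table_to_ntriples("person", "id", people):
        print(line)
```

For the two sample rows, the script prints three triples, because the NULL city produces no triple. A virtual mapping could keep the same URI pattern but, instead of materializing triples, translate incoming SPARQL into SQL over the live tables.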
>>>
>>> The proposed Working Group will also create a set of test cases that
>>> could be used to verify conformance.
>>>
>>>
>>> 1.1 Usecases
>>>
>>> To bootstrap exploitation of the Web as a globally accessible linked
>>> database, we need a few essentials:
>>>
>>> * Web-accessible data needs to increase in granularity and
>>>   cross-linkage.
>>> * Web applications and solutions must produce structured, interlinked
>>>   data as extensions of existing functionality.
>>> * Web users must be shielded from the underlying complexity of
>>>   injecting structured linked data into the Web.
>>>
>>>
>>> 1.1.1 Integrating Databases to Research Nicotine Dependency
>>>
>>> Complex biological queries generally require the integration of
>>> information from several sources. To understand the genetic basis of
>>> nicotine dependence, gene and pathway information needed to be
>>> integrated and three complex biological queries answered using the
>>> integrated knowledge base. The gene information source NCBI Entrez Gene,
>>> which has gene-related records of ~2 million genes, needed to be
>>> integrated with pathway information sources such as KEGG (Kyoto
>>> Encyclopedia of Genes and Genomes). Comparing results across model
>>> organisms required homology information provided by NCBI HomoloGene,
>>> which contains homology data for several completely sequenced
>>> eukaryotic organisms.
>>>
>>> An ontology-driven approach was used to integrate the two gene resources
>>> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG,
>>> Reactome and BioCyc). An OWL ontology called the Entrez Knowledge Model
>>> (EKoM) was created for the gene resources and integrated with the extant
>>> BioPAX ontology designed for pathway resources. The integrated schema
>>> was populated with data from the pathway resources, which are publicly
>>> available in BioPAX-compatible format, and from the gene resources, for
>>> which a population procedure was created.
>>>
>>> SPARQL was used to formulate queries over the integrated knowledge base
>>> to investigate the genetic basis of nicotine dependence:
>>>
>>> * Which genes participate in a large number of pathways?
>>> * Which genes are "hub genes" from the perspective of gene interaction?
>>> * Which genes are expressed in the brain, in the context of the
>>>   neurobiology of nicotine dependence and various neurotransmitters
>>>   in the central nervous system?
>>>
>>> The approach was very successful: the queries could easily identify hub
>>> genes, i.e., those genes whose gene products participate in many
>>> pathways or interact with many other gene products. See
>>> [NicotineDependence] for details.
>>>
>>>
>>> 1.1.2 Triplify: Exposing Relational Data on the Web
>>>
>>> In order to make the Semantic Web useful to ordinary Web users, RDF and
>>> OWL have to be deployed on the Web on a much larger scale. Web
>>> applications such as Content Management Systems, online shops or
>>> community applications (e.g. Wikis, Blogs, Fora) already store their
>>> data in relational databases [Triplify]. Providing a standardized way to
>>> map the relational data structures behind these Web applications into
>>> RDF, RDF-Schema and OWL will facilitate broad adoption, enrich the Web
>>> with RDF data and ontologies, and enable novel semantic browsing and
>>> search applications.
>>>
>>> By supporting the long tail of Web applications, and thus counteracting
>>> the centralization of Web 2.0 applications, the planned RDB2RDF
>>> standardization will help give control over data back to end users and
>>> thus promote a democratization of the Web.
>>>
>>> To support this usecase scenario, the mapping language should be easily
>>> implementable for lightweight Web applications and have a shallow
>>> learning curve to foster early adoption by Web developers.
>>>
>>>
>>> 1.1.3 Integration of Enterprise Information Systems
>>>
>>> Efficient information and data exchange between application systems
>>> within and across enterprises is of paramount importance in the
>>> increasingly networked and IT-dominated business environment. Existing
>>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>>> Relational database backends for persistence. RDF and Linked Data can
>>> provide data exchange and integration interfaces for such application
>>> systems that are easy to implement and use, especially in settings
>>> where a loose and flexible coupling of the systems is required.
>>>
>>> Insight can often be gained by integrating data from databases built for
>>> different purposes in separate corporate silos. For example, integrating
>>> data from a bug database with a customer database may help explain
>>> customers' ordering behavior as a function of the bugs they encountered.
>>>
>>> In Supply Chain Management (SCM), for example, it is vital to exchange
>>> product catalogs and other goods-related information within a network of
>>> interconnected businesses involved in the ultimate provision of product
>>> and service packages. Such information is stored in relational databases
>>> and sometimes already exchanged electronically, but a variety of
>>> different technologies are used (e.g. proprietary files, XML files, DB
>>> dumps, Web Services). Realizing a completely electronic information
>>> flow requires significant initial investments and currently limits the
>>> flexibility of businesses (e.g. with regard to changes in business
>>> partners). Applied in conjunction with existing RDB-based SCM systems,
>>> the envisioned RDB2RDF mapping language will support the use of RDF and
>>> unique identifiers for realizing flexible information flows accompanying
>>> supply chains.
>>>
>>> The mapping language to be standardized by the proposed WG will simplify
>>> the publishing of enterprise data and information from Relational data
>>> backends and thus facilitate the interlinking and exchange of
>>> information between business information systems. In this scenario,
>>> on-demand transformation of relational data to RDF, scalability and
>>> completeness with regard to the relational algebra are central
>>> requirements.
>>>
>>>
>>> 1.1.4 Ordnance Survey Use Case
>>>
>>> Ordnance Survey, the national mapping agency of the UK, operates a very
>>> large geographical information system based on Oracle Spatial. The
>>> database contains topographical features, soil type and land use
>>> information. All these types of information are independently maintained
>>> and use separate terminologies. They describe the same land area, but
>>> the boundaries of the objects used to represent land use, soil type and
>>> topography do not coincide: for example, a pasture might span two
>>> distinct types of soil.
>>>
>>> One example of the need to integrate this information is modeling the
>>> filtration of pollutants from agricultural land into water bodies. The
>>> soil type determines the degree of filtration, the land use determines
>>> the type of pollutant, and the topography determines whether the field
>>> is next to a water body.
>>>
>>> An ontology exists for describing the types of objects in each database.
>>> The benefit of mapping the data to RDF lies in simplifying querying and
>>> integration of the data. The very high volume of data makes an ETL
>>> approach impracticable; besides, the Oracle Spatial database offers
>>> spatial joins, which are generally not available in RDF stores.
>>>
>>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>>> land use, soil type and topography ontologies and convert them into
>>> single SQL statements, with all joining and filtering taking place in
>>> the relational database.
>>> In the process, high-level concepts need to be translated into SQL
>>> conditions on data that is not readily human-readable.
>>>
>>> Examples of business questions to be answered in this use case are:
>>>
>>> * What is the total length of river bank along a certain river that is
>>>   bordered by permeable soil used for grazing?
>>> * What types of crops are being cultivated within 100 m of water, with
>>>   total land use grouped by crop?
>>> * Which water bodies are subject to high environmental load from
>>>   agriculture, as defined by little current and extensive use of
>>>   adjacent land?
>>>
>>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>>> need to integrate data from different databases, built for different
>>> purposes. It also emphasizes the need for extensibility in the mapping
>>> language to support RDBMS vendor-specific features. In the present
>>> case, Oracle expresses a spatial join using a special type of derived
>>> table not found in standard SQL, so the need for customization goes
>>> deeper than just supporting calls to native SQL functions.
>>>
>>> The inference requirement consists primarily of expanding class
>>> membership into conjunctions and disjunctions (ANDs and ORs) of
>>> conditions on the relational data. In some cases, these conditions are
>>> spatial, such as "bordering on" or "contained in". The user should be
>>> familiar with the ontologies but should not have to know about the
>>> classification codes used in the databases.
>>>
>>>
>>> 1.2 Liaisons
>>>
>>> The WG must track the evolution of SPARQL and liaise with the DAWG as
>>> well as the OWL WG.
>>> The proposed WG will also keep track of work on assigning unique
>>> identifiers to well-known entities, such as the Entity Name System (ENS)
>>> associated with the OKKAM project [OKKAM] and the Common Naming Project
>>> started by Neuro Commons [Common Naming Project].
>>>
>>>
>>> 1.3 Starting Points
>>>
>>> The WG will take as its starting point the mapping languages developed
>>> by the [D2RQ] and [Virtuoso] efforts.
>>>
>>>
>>> 2 References
>>>
>>> Common Naming Project
>>>     Neuro Commons Common Naming Project
>>>     <http://neurocommons.org/page/Common_Naming_Project>, Science
>>>     Commons, Sept 17, 2008.
>>> D2RQ
>>>     The D2RQ Platform v0.5.1, User Manual and Language Specification
>>>     <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer,
>>>     Richard Cyganiak, Jorg Garbers, Oliver Maresch.
>>> RIF
>>>     W3C Rule Interchange Format Working Group
>>>     <http://www.w3.org/2005/rules/wiki/RIF_Working_Group>.
>>> LinkedData
>>>     Design Issues for Linked Data
>>>     <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee.
>>> StateOfArt
>>>     Mapping Relational Data to RDF and OWL: A Literature Survey
>>>     <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb.
>>> OKKAM
>>>     An Entity Name System (ENS) for the Semantic Web
>>>     <http://www.okkam.org/>, Paolo Bouquet, Heiko Stoermer, Barbara
>>>     Bazzanella, January 2008.
>>> Virtuoso
>>>     Virtuoso Open-Source Edition
>>>     <http://virtuoso.openlinksw.com/wiki/main/Main/>.
>>> Triplify
>>>     Triplify - Lightweight Linked Data Publication from Relational
>>>     Databases
>>>     <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>>>     Auer, Dietzold, Lehmann, Hellmann, Aumueller. Submitted to WWW 2009.
>>> NicotineDependence
>>>     An ontology-driven semantic mashup of gene and biological pathway
>>>     information: Application to the domain of nicotine dependence
>>>     <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo,
>>>     Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P.
>>>     Sheth.
>>
>>

--
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 26 January 2009 16:05:36 UTC