- From: ashok malhotra <ashok.malhotra@oracle.com>
- Date: Mon, 26 Jan 2009 08:26:36 -0800
- To: Ivan Herman <ivan@w3.org>
- CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
OK. I will ask who wants to dial in. We can use the member-xg-rdb2rdf list
for the details.

All the best, Ashok

Ivan Herman wrote:
> Ashok,
>
> I would prefer not to copy the final setup and possible followup
> discussions to this list simply because the other (coordination group)
> mailing list is member confidential. Mixing two lists with different
> confidentiality levels is a recipe for something going wrong:-) I hope
> that is all right.
>
> The meeting is still fairly far away, i.e., we have time; would it be
> possible to tell me who would dial in besides you in a few weeks? I
> would then contact them personally.
>
> Thanks a lot again!
>
> Ivan
>
> ashok malhotra wrote:
>
>> I'll be happy to dial in. Others from the XG may want to dial in as well.
>> I will put the final report on the W3C site and send out the pointer
>> when it is done.
>>
>> Please send details of the telcon to this list.
>>
>> All the best, Ashok
>>
>> Ivan Herman wrote:
>>
>>> My apologies, I should have been more precise: the meeting is a telco!
>>> I.e., the precise answer to your question is: on zakim:-)
>>>
>>> Ivan
>>>
>>> ashok malhotra wrote:
>>>
>>>> Hi Ivan:
>>>> You said ...
>>>>
>>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>>> of our next SW Coordination Group meetings to give a bit of a report and
>>>> discuss the possible followups?
>>>> Where is the meeting?
>>>>
>>>> All the best, Ashok
>>>>
>>>> Ivan Herman wrote:
>>>>
>>>>> Hi Ashok,
>>>>>
>>>>> First of all, thanks! I actually have two questions, neither of which is
>>>>> closely related to your original question (that Mauro already answered,
>>>>> I believe:-)
>>>>>
>>>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>>>> of our next SW Coordination Group meetings to give a bit of a report and
>>>>> discuss the possible followups? The best date would be the 20th of
>>>>> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>>>>>
>>>>> - Did the group think of also preparing a rough draft charter for the
>>>>> group you propose? It would make things easier to discuss both
>>>>> internally and externally. There can be many empty slots in the charter
>>>>> but it would give an idea to move forward. It would also give a feeling
>>>>> on who would/could staff such a group.
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ivan
>>>>>
>>>>> ashok malhotra wrote:
>>>>>
>>>>>> Ivan, Mauro:
>>>>>> As you know, the RDB2RDF XG is coming to a close. We are planning two
>>>>>> deliverables and I thought I would run them by you for early comments.
>>>>>>
>>>>>> 1. We have prepared a final report. This is attached. I am trying to
>>>>>> get permission to put it on the W3C site.
>>>>>> 2. We have prepared a State Of the Art Survey. This is in the form of
>>>>>> extensions to the ESW Wiki
>>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file
>>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>>>>>> Both have the same content. Is this format acceptable for an XG
>>>>>> deliverable?
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> W3C <http://www.w3.org/> W3C Incubator Report
>>>>>> <http://www.w3.org/2005/Incubator/XGR/>
>>>>>>
>>>>>> W3C RDB2RDF Incubator Group Report
>>>>>>
>>>>>> 16 January 2009
>>>>>>
>>>>>> This version:
>>>>>>     http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>>>>>> Latest version:
>>>>>>     http://www.w3.org/XG_Report/RDB2RDF_XG
>>>>>> Previous version:
>>>>>>     This is the first public version.
>>>>>> Author:
>>>>>>     Ashok Malhotra (editor), Oracle
>>>>>>
>>>>>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved.
>>>>>> This document is available under the W3C Document License
>>>>>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>>>>>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>>>>>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>>>>>> for additional information.
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> Abstract
>>>>>>
>>>>>> This is the final report from the RDB2RDF XG. The XG recommends that the
>>>>>> W3C initiate a WG to standardize a language for mapping Relational
>>>>>> Database schemas into RDF and OWL.
>>>>>>
>>>>>> Status of this Document
>>>>>>
>>>>>> /This section describes the status of this document at the time of its
>>>>>> publication. Other documents may supersede this document. A list of
>>>>>> current W3C publications can be found in the W3C technical reports index
>>>>>> <http://www.w3.org/TR/> at http://www.w3.org/TR/./
>>>>>>
>>>>>> This is the final recommendation from the RDB2RDF XG.
>>>>>>
>>>>>> Table of Contents
>>>>>>
>>>>>> 1 Recommendation <#recommendation>
>>>>>>     1.1 Usecases <#usecases>
>>>>>>         1.1.1 Integrating Databases to Research Nicotine Dependency <#biomedical>
>>>>>>         1.1.2 Triplify: Exposing Relational Data on the Web <#Triplify>
>>>>>>         1.1.3 Integration of Enterprise Information Systems <#enterprise>
>>>>>>         1.1.4 Ordnance Survey Use Case <#ordnance>
>>>>>>     1.2 Liaisons <#liaisons>
>>>>>>     1.3 Starting Points <#IDA2UIP>
>>>>>> 2 References <#References>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> 1 Recommendation
>>>>>>
>>>>>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
>>>>>> standardize a language for mapping Relational Database schemas into RDF
>>>>>> and OWL.
>>>>>> Such a standard will enable the vast amounts of data stored in
>>>>>> Relational databases to be published easily and conveniently on the Web.
>>>>>> It will also facilitate integrating data from separate Relational
>>>>>> databases and adding semantics to Relational data.
>>>>>>
>>>>>> This recommendation is based on a survey of the State Of the Art
>>>>>> conducted by the XG [StateOfArt] <#StateOfArt> as well as the usecases
>>>>>> discussed below.
>>>>>>
>>>>>> The mapping language defined by the WG would facilitate the development
>>>>>> of several types of products. It could be used to translate Relational
>>>>>> data into RDF which could be stored in a triple store. This is sometimes
>>>>>> called Extract-Transform-Load (ETL). Or it could be used to generate a
>>>>>> virtual mapping that could be queried using SPARQL and the SPARQL
>>>>>> translated to SQL queries on the underlying Relational data. Other
>>>>>> products could be layered on top of these capabilities to query and
>>>>>> deliver data in different ways as well as to integrate the data with
>>>>>> other kinds of information on the Semantic Web.
>>>>>>
>>>>>> The mapping language should be complete with regard to the relational
>>>>>> algebra. It should have a human-readable syntax as well as XML and RDF
>>>>>> representations of the syntax for purposes of discovery and machine
>>>>>> generation.
>>>>>>
>>>>>> There is a strong suggestion that the mapping language be expressed in
>>>>>> rules as defined by the W3C [RIF] <#RIF> WG. The syntax does not have to
>>>>>> follow the RIF syntax but there should be a round-trippable mapping
>>>>>> between the mapping language and a RIF dialect. The output of the
>>>>>> mapping should be defined in terms of an RDFS/OWL schema.
>>>>>>
>>>>>> It should be possible to subset the language for simple applications
>>>>>> such as Web 2.0.
>>>>>> This feature of the language will be validated by creating a library of
>>>>>> mappings for widely used apps such as Drupal, Wordpress, phpBB.
>>>>>>
>>>>>> The mapping language will allow customization with regard to names and
>>>>>> data transformation. In addition, the language must be able to expose
>>>>>> vendor specific SQL features such as full-text and spatial support and
>>>>>> vendor-defined datatypes.
>>>>>>
>>>>>> The final language specification should include guidance with regard to
>>>>>> mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>>>>>>
>>>>>> The language must allow for a mechanism to create identifiers for
>>>>>> database entities. The generation of identifiers should be designed to
>>>>>> support the implementation of the linked data principles [LinkedData]
>>>>>> <#LinkedData>. Where possible, the language will encourage the reuse of
>>>>>> public identifiers for long-lived entities such as persons,
>>>>>> corporations, geo-locations, etc. See *1.2 Liaisons* <#liaisons>.
>>>>>>
>>>>>> The proposed Working Group will also create a set of test cases that
>>>>>> could be used to verify conformance.
>>>>>>
>>>>>> 1.1 Usecases
>>>>>>
>>>>>> To bootstrap exploitation of the Web as a globally accessible linked
>>>>>> database, we need a few essentials:
>>>>>>
>>>>>> * Web accessible data needs to increase in granularity and cross
>>>>>>   linkage.
>>>>>> * Web applications and solutions must produce structured interlinked
>>>>>>   data as extensions of existing functionality.
>>>>>> * Web users must be shielded from the underlying complexity of
>>>>>>   injecting structured linked data into the Web.
>>>>>>
>>>>>> 1.1.1 Integrating Databases to Research Nicotine Dependency
>>>>>>
>>>>>> Complex biological queries generally require the integration of
>>>>>> information from several sources.
>>>>>> To understand the genetic basis of nicotine dependence, gene and pathway
>>>>>> information needed to be integrated and three complex biological queries
>>>>>> answered using the integrated knowledge base. The gene information
>>>>>> source NCBI Entrez Gene, which has gene-related records of ~2 million
>>>>>> genes, needed to be integrated with pathway information sources, such as
>>>>>> KEGG (Kyoto Encyclopedia of Genes and Genomes). Comparing results across
>>>>>> model organisms required homology information provided by NCBI
>>>>>> HomoloGene, which contains homology data for several completely
>>>>>> sequenced eukaryotic organisms.
>>>>>>
>>>>>> An ontology-driven approach was used to integrate the two gene resources
>>>>>> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG,
>>>>>> Reactome and BioCyc). An OWL ontology called the Entrez Knowledge Model
>>>>>> (EKoM) was created for the gene resources and integrated with the extant
>>>>>> BioPAX ontology designed for pathway resources. The integrated schema
>>>>>> was populated with data from the pathway resources, publicly available
>>>>>> in BioPAX-compatible format, and gene resources for which a population
>>>>>> procedure was created.
>>>>>>
>>>>>> SPARQL was used to formulate queries to investigate the genetic basis of
>>>>>> nicotine dependence over the integrated knowledge base:
>>>>>>
>>>>>> * Which genes participate in a large number of pathways?
>>>>>> * Which genes are "hub genes" from the perspective of gene interaction?
>>>>>> * Which genes are expressed in the brain, in the context of
>>>>>>   neurobiology of nicotine dependence and various neurotransmitters
>>>>>>   in the central nervous system?
>>>>>>
>>>>>> The result was very successful. The queries could easily identify hub
>>>>>> genes, i.e., those genes whose gene products participate in many
>>>>>> pathways or interact with many other gene products.
>>>>>> See [NicotineDependence] <#> for details.
>>>>>>
>>>>>> 1.1.2 Triplify: Exposing Relational Data on the Web
>>>>>>
>>>>>> In order to make the Semantic Web useful to ordinary Web users, RDF and
>>>>>> OWL have to be deployed on the Web on a much larger scale. Web
>>>>>> applications such as Content Management Systems, online shops or
>>>>>> community applications (e.g. Wikis, Blogs, Fora) already store their
>>>>>> data in relational databases [Triplify] <#TriplifyPaper>. Providing a
>>>>>> standardized way to map the relational data structures behind these Web
>>>>>> applications into RDF, RDF-Schema and OWL will facilitate broad
>>>>>> penetration, enrich the Web with RDF data and ontologies, and
>>>>>> facilitate novel semantic browsing and search applications.
>>>>>>
>>>>>> By supporting the long tail of Web applications and thus counteracting
>>>>>> the centralization of the Web 2.0 applications, the planned RDB2RDF
>>>>>> standardization will help to give control over data back to end-users
>>>>>> and thus promote a democratization of the Web.
>>>>>>
>>>>>> To support this usecase scenario, the mapping language should be easily
>>>>>> implementable for lightweight Web applications and have a shallow
>>>>>> learning curve to foster early adoption by Web developers.
>>>>>>
>>>>>> 1.1.3 Integration of Enterprise Information Systems
>>>>>>
>>>>>> Efficient information and data exchange between application systems
>>>>>> within and across enterprises is of paramount importance in the
>>>>>> increasingly networked and IT-dominated business atmosphere. Existing
>>>>>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>>>>>> Relational database backends for persistence.
>>>>>> RDF and Linked Data can provide data exchange and integration interfaces
>>>>>> for such application systems, which are easy to implement and use,
>>>>>> especially in settings where a loose and flexible coupling of the
>>>>>> systems is required.
>>>>>>
>>>>>> Insight can often be gained by integrating data from databases built for
>>>>>> different purposes in separate corporate silos. For example, integrating
>>>>>> data from a bug database with a customer database may help understand
>>>>>> ordering behavior as a function of the bugs encountered.
>>>>>>
>>>>>> In Supply Chain Management (SCM), for example, it is vital to exchange
>>>>>> product catalogs and other goods related information within a network of
>>>>>> interconnected businesses involved in the ultimate provision of product
>>>>>> and service packages. Such information is stored in relational databases
>>>>>> and sometimes already exchanged electronically, but a variety of
>>>>>> different technologies are used (e.g. proprietary files, XML files, DB
>>>>>> dumps, Web Services etc.). Realizing a completely electronic information
>>>>>> flow requires significant initial investments and currently limits the
>>>>>> flexibility of businesses (e.g. with regard to changes in business
>>>>>> partners). The envisioned RDB2RDF mapping language applied in
>>>>>> conjunction with existing RDB based SCM systems will support the use of
>>>>>> RDF and unique identifiers for realizing flexible information flows
>>>>>> accompanying supply chains.
>>>>>>
>>>>>> The mapping language to be standardized by the proposed WG will simplify
>>>>>> the publishing of enterprise data and information from Relational data
>>>>>> backends and, thus, facilitate the interlinking and exchange of
>>>>>> information between business information systems.
>>>>>> In this scenario, on-demand transformation of relational data to RDF,
>>>>>> scalability and completeness with regard to the relational algebra are
>>>>>> central requirements.
>>>>>>
>>>>>> 1.1.4 Ordnance Survey Use Case
>>>>>>
>>>>>> Ordnance Survey, the National mapping agency of the UK, operates a very
>>>>>> large geographical information system based on Oracle Spatial. The
>>>>>> database contains topographical features, soil type and land use
>>>>>> information. All these types of information are independently maintained
>>>>>> and use separate terminologies. They describe the same land area but the
>>>>>> boundaries of objects utilized for representing land use and soil type
>>>>>> and topography do not coincide: for example, a pasture might consist of
>>>>>> two distinct types of soil.
>>>>>>
>>>>>> An example of a need to integrate this information is modeling
>>>>>> filtration of pollutants into water bodies from agricultural land. The
>>>>>> soil type determines the degree of filtration; the land use determines
>>>>>> the type of pollutant. Topography determines whether the field is next
>>>>>> to a water body.
>>>>>>
>>>>>> An ontology exists for describing the types of objects in each database.
>>>>>> The benefit from mapping the data to RDF is in simplifying querying and
>>>>>> integration of the data. The very high volume of data makes an ETL
>>>>>> approach impracticable; besides, the Oracle Spatial database offers
>>>>>> spatial joining which is generally not available on RDF stores.
>>>>>>
>>>>>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>>>>>> land use, soil type and topography ontologies and convert them into
>>>>>> single SQL statements, with all joining and filtering to take place at
>>>>>> the relational database. In the process, high level concepts need to be
>>>>>> translated into SQL conditions on data that is not readily human
>>>>>> readable.
>>>>>>
>>>>>> Business questions to be answered by the use case are for example:
>>>>>>
>>>>>> * What is the total length of river bank bordered by permeable soil
>>>>>>   used for grazing along a certain river?
>>>>>> * What types of crops are being cultivated within 100m of water,
>>>>>>   with total land use grouped by crop?
>>>>>> * What water bodies are subject to high environmental load from
>>>>>>   agriculture, as defined by little current and extensive use of
>>>>>>   adjacent land?
>>>>>>
>>>>>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>>>>>> need to integrate data from different databases, built for different
>>>>>> purposes. It also emphasizes the need for extensibility in the mapping
>>>>>> language for supporting RDBMS vendor specific features. In the present
>>>>>> case, Oracle expresses a spatial join using a special type of derived
>>>>>> table not found in standard SQL, thus the customization need is deeper
>>>>>> than just supporting calls to native SQL functions.
>>>>>>
>>>>>> The inference requirement consists primarily of expanding class
>>>>>> membership into and's and or's of conditions on the relational data. In
>>>>>> some cases, these conditions are spatial, such as bordering on or
>>>>>> contained in. The user should be familiar with the ontologies but should
>>>>>> not have to know about the classification codes used in the databases.
>>>>>>
>>>>>> 1.2 Liaisons
>>>>>>
>>>>>> The WG must track the evolution of SPARQL and liaise with the DAWG WG as
>>>>>> well as the OWL WG.
>>>>>> The proposed WG will also keep track of work on assigning unique
>>>>>> identifiers to well-known entities such as the ENS system associated
>>>>>> with the OKKAM project [OKKAM] <#okkam> and the Common Naming Project
>>>>>> started by Neuro Commons [Common Naming Project] <#CommonNaming>.
>>>>>>
>>>>>> 1.3 Starting Points
>>>>>>
>>>>>> The WG will take as its starting point the mapping languages developed
>>>>>> by the [D2RQ] <#D2RQ> and [Virtuoso] <#Virtuoso> efforts.
>>>>>>
>>>>>> 2 References
>>>>>>
>>>>>> Common Naming Project
>>>>>>     Neuro Commons Common Naming Project
>>>>>>     <http://neurocommons.org/page/Common_Naming_Project>, Science
>>>>>>     Commons, Sept 17, 2008.
>>>>>> D2RQ
>>>>>>     The D2RQ Platform v0.5.1, User Manual and Language Specification
>>>>>>     <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer,
>>>>>>     Richard Cyganiak, Jorg Garbers, Oliver Maresch.
>>>>>> RIF
>>>>>>     W3C Rule Interchange Format Working Group
>>>>>>     <http://www.w3.org/2005/rules/wiki/RIF_Working_Group>.
>>>>>> LinkedData
>>>>>>     Design Issues for Linked Data
>>>>>>     <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee.
>>>>>> StateOfArt
>>>>>>     Mapping Relational Data to RDF and OWL: A Literature Survey
>>>>>>     <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb.
>>>>>> OKKAM
>>>>>>     An Entity Name System (ENS) for the Semantic Web
>>>>>>     <http://www.okkam.org/>, Paolo Bouquet, Heiko Stoermer, Barbara
>>>>>>     Bazzanella, January 2008.
>>>>>> Virtuoso
>>>>>>     Virtuoso Open-Source Edition
>>>>>>     <http://virtuoso.openlinksw.com/wiki/main/Main/>.
>>>>>> Triplify
>>>>>>     Triplify - Lightweight Linked Data Publication from Relational
>>>>>>     Databases, submitted to WWW 2009
>>>>>>     <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>>>>>>     Auer, Dietzold, Lehmann, Hellmann, Aumueller.
>>>>>> NicotineDependence
>>>>>>     An ontology-driven semantic mashup of gene and biological pathway
>>>>>>     information: Application to the domain of nicotine dependence
>>>>>>     <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo,
>>>>>>     Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P.
>>>>>>     Sheth.
Received on Monday, 26 January 2009 16:28:18 UTC