Re: Deliverables from the RDB2RDF XG from ashok malhotra on 2009-01-26 (public-xg-rdb2rdf@w3.org from January 2009)

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Mon, 26 Jan 2009 08:26:36 -0800
To: Ivan Herman <ivan@w3.org>
CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
Message-ID: <497DE43C.3050303@oracle.com>
OK.  I will ask who wants to dial in.
We can use the member-xg-rdb2rdf list for the details.
All the best, Ashok


Ivan Herman wrote:
> Ashok,
>
> I would prefer not to copy the final setup and possible followup
> discussions to this list simply because the other (coordination group)
> mailing list is member confidential. Mixing two lists with different
> confidentiality levels is a recipe for something going wrong:-) I hope
> that is all right.
>
> The meeting is still fairly far away, ie, we have time; would it be
> possible to tell me who would dial in besides you in a few weeks? I
> would then contact them personally.
>
> Thanks a lot again!
>
> Ivan
>
> ashok malhotra wrote:
>   
>> I'll be happy to dial in.  Others from the XG may want to dial in as well.
>> I will put the final report on the W3C site and send out the pointer
>> when it is done.
>>
>> Please send details of the telcon to this list.
>>
>> All the best, Ashok
>>
>>
>> Ivan Herman wrote:
>>     
>>> My apologies, I should have been more precise: the meeting is a telco!
>>> Ie, the precise answer to your question is: on zakim:-)
>>>
>>> Ivan
>>>
>>> ashok malhotra wrote:
>>>  
>>>       
>>>> Hi Ivan:
>>>> You said ...
>>>>
>>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>>> of our next SW Coordination Group meeting to give a bit of a report and
>>>> discuss the possible followups?
>>>> Where is the meeting?
>>>>
>>>> All the best, Ashok
>>>>
>>>> Ivan Herman wrote:
>>>>    
>>>>         
>>>>> Hi Ashok,
>>>>>
>>>>> First of all, thanks! I actually have two questions, neither of which is
>>>>> closely related to your original question (which Mauro already answered,
>>>>> I believe:-)
>>>>>
>>>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>>>> of our next SW Coordination Group meeting to give a bit of a report and
>>>>> discuss the possible followups? The best date would be the 20th of
>>>>> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>>>>>
>>>>> - Did the group also think of preparing a rough draft charter for the
>>>>> group you propose? It would make things easier to discuss both
>>>>> internally and externally. There can be many empty slots in the charter
>>>>> but it would give an idea to move forward. It would also give a feeling
>>>>> on who would/could staff such a group.
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Cheers
>>>>>
>>>>> Ivan
>>>>>
>>>>> ashok malhotra wrote:
>>>>>  
>>>>>      
>>>>>           
>>>>>> Ivan, Mauro:
>>>>>> As you know, the RDB2RDF XG is coming to a close.  We are planning two
>>>>>> deliverables and I thought I would run them by you for early comments.
>>>>>>
>>>>>> 1. We have prepared a final report.  This is attached.  I am trying to
>>>>>> get permission to put it on the W3C site.
>>>>>> 2. We have prepared a State Of the Art Survey.  This is available as
>>>>>> extensions to the ESW Wiki
>>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file
>>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>>>>>>
>>>>>> Both have the same content.  Is this format acceptable for an XG
>>>>>> deliverable?
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> W3C <http://www.w3.org/> W3C Incubator Report
>>>>>> <http://www.w3.org/2005/Incubator/XGR/>
>>>>>>
>>>>>>
>>>>>>   W3C RDB2RDF Incubator Group Report
>>>>>>
>>>>>>
>>>>>>     16 January 2009
>>>>>>
>>>>>> This version:
>>>>>>     http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>>>>>> Latest version:
>>>>>>     http://www.w3.org/XG_Report/RDB2RDF_XG
>>>>>> Previous version:
>>>>>>     This is the first public version.
>>>>>> Author:
>>>>>>     Ashok Malhotra (editor), Oracle
>>>>>>
>>>>>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
>>>>>> document is available under the W3C Document License
>>>>>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>>>>>>
>>>>>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>>>>>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>>>>>> for additional information.
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>     Abstract
>>>>>>
>>>>>> This is the final report from the RDB2RDF XG. The XG recommends that the
>>>>>> W3C initiate a WG to standardize a language for mapping Relational
>>>>>> Database schemas into RDF and OWL.
>>>>>>
>>>>>>
>>>>>>     Status of this Document
>>>>>>
>>>>>> /This section describes the status of this document at the time of its
>>>>>> publication. Other documents may supersede this document. A list of
>>>>>> current W3C publications can be found in the W3C technical reports index
>>>>>> <http://www.w3.org/TR/> at http://www.w3.org/TR/./
>>>>>>
>>>>>> This is the final recommendation from the RDB2RDF XG.
>>>>>>
>>>>>>
>>>>>>     Table of Contents
>>>>>>
>>>>>> 1 Recommendation <#recommendation>
>>>>>>     1.1 Usecases <#usecases>
>>>>>>         1.1.1 Integrating Databases to Research Nicotine Dependency <#biomedical>
>>>>>>         1.1.2 Triplify: Exposing Relational Data on the Web <#Triplify>
>>>>>>         1.1.3 Integration of Enterprise Information Systems <#enterprise>
>>>>>>         1.1.4 Ordnance Survey Use Case <#ordnance>
>>>>>>     1.2 Liaisons <#liaisons>
>>>>>>     1.3 Starting Points <#IDA2UIP>
>>>>>> 2 References <#References>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>>     1 Recommendation
>>>>>>
>>>>>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
>>>>>> standardize a language for mapping Relational Database schemas into RDF
>>>>>> and OWL. Such a standard will enable the vast amounts of data stored in
>>>>>> Relational databases to be published easily and conveniently on the Web.
>>>>>> It will also facilitate integrating data from separate Relational
>>>>>> databases and adding semantics to Relational data.
>>>>>>
>>>>>> This recommendation is based on a survey of the State Of the Art
>>>>>> conducted by the XG [StateOfArt] <#StateOfArt> as well as the usecases
>>>>>> discussed below.
>>>>>>
>>>>>> The mapping language defined by the WG would facilitate the development
>>>>>> of several types of products. It could be used to translate Relational
>>>>>> data into RDF which could be stored in a triple store. This is sometimes
>>>>>> called Extract-Transform-Load (ETL). Or it could be used to generate a
>>>>>> virtual mapping that could be queried using SPARQL and the SPARQL
>>>>>> translated to SQL queries on the underlying Relational data. Other
>>>>>> products could be layered on top of these capabilities to query and
>>>>>> deliver data in different ways as well as to integrate the data with
>>>>>> other kinds of information on the Semantic Web.
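
A minimal sketch of the ETL route described above, for illustration only. It assumes Python with the standard-library sqlite3 module. The `employee` table, the `http://example.org/` base URI and the `rows_to_triples` helper are hypothetical and do not come from the report. Each row becomes one subject URI and each non-NULL, non-key column becomes a predicate with a plain literal value; literal escaping and datatypes are deliberately left out.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not from the report


def rows_to_triples(conn, table, key):
    """Yield (subject, predicate, object) strings for every row of `table`.

    `table` and `key` are assumed to come from trusted configuration,
    since they are interpolated into the SQL text as identifiers.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # One subject URI per row, built from the table name and key value.
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the subject; NULL yields no triple
            predicate = f"<{BASE}{table}#{column}>"
            # Simplification: no escaping or datatype handling for literals.
            yield subject, predicate, f'"{value}"'


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.execute("INSERT INTO employee VALUES (1, 'Alice', 'R&D')")
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', NULL)")
    return list(rows_to_triples(conn, "employee", "id"))


if __name__ == "__main__":
    for s, p, o in demo():
        print(s, p, o, ".")
```

The virtual-mapping route would keep the data in place and rewrite queries instead; this sketch covers only the extract-and-load side.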
>>>>>>
>>>>>> The mapping language should be complete with regard to the relational
>>>>>> algebra. It should have a human-readable syntax as well as XML and RDF
>>>>>> representations of the syntax for purposes of discovery and machine
>>>>>> generation.
>>>>>>
>>>>>> There is a strong suggestion that the mapping language be expressed in
>>>>>> rules as defined by the W3C [RIF] <#RIF> WG. The syntax does not have to
>>>>>> follow the RIF syntax but there should be a round-trippable mapping
>>>>>> between the mapping language and a RIF dialect. The output of the
>>>>>> mapping should be defined in terms of an RDFS/OWL schema.
>>>>>>
>>>>>> It should be possible to subset the language for simple applications
>>>>>> such as Web 2.0. This feature of the language will be validated by
>>>>>> creating a library of mappings for widely used apps such as Drupal,
>>>>>> Wordpress, phpBB.
>>>>>>
>>>>>> The mapping language will allow customization with regard to names and
>>>>>> data transformation. In addition, the language must be able to expose
>>>>>> vendor specific SQL features such as full-text and spatial support and
>>>>>> vendor-defined datatypes.
>>>>>>
>>>>>> The final language specification should include guidance with regard to
>>>>>> mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>>>>>>
>>>>>> The language must allow for a mechanism to create identifiers for
>>>>>> database entities. The generation of identifiers should be designed to
>>>>>> support the implementation of the linked data principles [LinkedData]
>>>>>> <#LinkedData>. Where possible, the language will encourage the reuse of
>>>>>> public identifiers for long-lived entities such as persons,
>>>>>> corporations, geo-locations, etc. See *1.2 Liaisons* <#liaisons>.
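
One possible way to generate identifiers for database entities, sketched in Python as an assumption rather than anything the report prescribes. The `mint_uri` helper, the base URI and the table names are hypothetical. The idea is to build an HTTP URI from the table name and the primary-key values, percent-encoding each segment so that key values cannot alter the path structure.

```python
from urllib.parse import quote


def mint_uri(base, table, key_values):
    """Build an HTTP URI for one row from its table name and key values.

    `key_values` is a sequence so that composite primary keys are supported.
    """
    if not key_values:
        raise ValueError("at least one key value is required")
    # safe="" makes quote() encode "/" as well, so a key such as "A/7"
    # stays inside a single path segment.
    segments = [quote(str(table), safe="")]
    segments += [quote(str(value), safe="") for value in key_values]
    return base.rstrip("/") + "/" + "/".join(segments)


if __name__ == "__main__":
    print(mint_uri("http://example.org/data/", "customer", [42]))
    print(mint_uri("http://example.org/data", "order line", ["A/7", 3]))
```

Reusing public identifiers for long-lived entities, as the paragraph above suggests, would replace the minted URI with an externally maintained one where such an identifier is known.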
>>>>>>
>>>>>> The proposed Working Group will also create a set of test cases that
>>>>>> could be used to verify conformance.
>>>>>>
>>>>>>
>>>>>>       1.1 Usecases
>>>>>>
>>>>>> To bootstrap exploitation of the Web as a globally accessible linked
>>>>>> database, we need a few essentials:
>>>>>>
>>>>>>     * Web accessible data needs to increase in granularity and cross
>>>>>>       linkage.
>>>>>>     * Web applications and solutions must produce structured interlinked
>>>>>>       data as extensions of existing functionality.
>>>>>>     * Web users must be shielded from the underlying complexity of
>>>>>>       injecting structured linked data into the Web.
>>>>>>
>>>>>>
>>>>>>         1.1.1 Integrating Databases to Research Nicotine Dependency
>>>>>>
>>>>>> Complex biological queries generally require the integration of
>>>>>> information from several sources. To understand the genetic basis of
>>>>>> nicotine dependence, gene and pathway information needed to be
>>>>>> integrated and three complex biological queries answered using the
>>>>>> integrated knowledge base. The gene information source NCBI Entrez Gene,
>>>>>> which has gene-related records for ~2 million genes, needed to be
>>>>>> integrated with pathway information sources such as KEGG (Kyoto
>>>>>> Encyclopedia of Genes and Genomes). Comparing results across model
>>>>>> organisms required homology information provided by NCBI HomoloGene,
>>>>>> which contains homology data for several completely sequenced eukaryotic
>>>>>> organisms.
>>>>>>
>>>>>> An ontology-driven approach was used to integrate the two gene resources
>>>>>> (Entrez Gene and HomoloGene) and the three pathway resources (KEGG,
>>>>>> Reactome and BioCyc). An OWL ontology called the Entrez Knowledge Model
>>>>>> (EKoM) was created for the gene resources and integrated with the extant
>>>>>> BioPAX ontology designed for pathway resources. The integrated schema
>>>>>> was populated with data from the pathway resources, publicly available
>>>>>> in BioPAX-compatible format, and gene resources for which a population
>>>>>> procedure was created.
>>>>>>
>>>>>> SPARQL was used to formulate queries to investigate the genetic basis of
>>>>>> nicotine dependence over the integrated knowledge base:
>>>>>>
>>>>>>     * Which genes participate in a large number of pathways?
>>>>>>     * Which genes are "hub genes" from the perspective of gene interaction?
>>>>>>     * Which genes are expressed in the brain, in the context of
>>>>>>       neurobiology of nicotine dependence and various neurotransmitters
>>>>>>       in the central nervous system?
>>>>>>
>>>>>> The result was very successful. The queries could easily identify hub
>>>>>> genes, i.e., those genes whose gene products participate in many
>>>>>> pathways or interact with many other gene products. See
>>>>>> [NicotineDependence] <#> for details.
>>>>>>
>>>>>>
>>>>>>         1.1.2 Triplify: Exposing Relational Data on the Web
>>>>>>
>>>>>> In order to make the Semantic Web useful to ordinary Web users, RDF and
>>>>>> OWL have to be deployed on the Web on a much larger scale. Web
>>>>>> applications such as Content Management Systems, online shops or
>>>>>> community applications (e.g. Wikis, Blogs, Fora) already store their
>>>>>> data in relational databases [Triplify] <#TriplifyPaper>. Providing a
>>>>>> standardized way to map the relational data structures behind these Web
>>>>>> applications into RDF, RDF-Schema and OWL will facilitate broad
>>>>>> penetration, enrich the Web with RDF data and ontologies, and enable
>>>>>> novel semantic browsing and search applications.
>>>>>>
>>>>>> By supporting the long tail of Web applications and thus counteracting
>>>>>> the centralization of Web 2.0 applications, the planned RDB2RDF
>>>>>> standardization will help to give control over data back to end-users
>>>>>> and thus promote a democratization of the Web.
>>>>>>
>>>>>> To support this usecase scenario, the mapping language should be easily
>>>>>> implementable for lightweight Web applications and have a shallow
>>>>>> learning curve to foster early adoption by Web developers.
>>>>>>
>>>>>>
>>>>>>         1.1.3 Integration of Enterprise Information Systems
>>>>>>
>>>>>> Efficient information and data exchange between application systems
>>>>>> within and across enterprises is of paramount importance in the
>>>>>> increasingly networked and IT-dominated business atmosphere. Existing
>>>>>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>>>>>> Relational database backends for persistence. RDF and Linked Data can
>>>>>> provide data exchange and integration interfaces for such application
>>>>>> systems, which are easy to implement and use, especially in settings
>>>>>> where a loose and flexible coupling of the systems is required.
>>>>>>
>>>>>> Insight can often be gained by integrating data from databases built for
>>>>>> different purposes in separate corporate silos. For example, integrating
>>>>>> data from a bug database with a customer database may help understand
>>>>>> ordering behavior as a function of the bugs encountered.
>>>>>>
>>>>>> In Supply Chain Management (SCM), for example, it is vital to exchange
>>>>>> product catalogs and other goods related information within a network of
>>>>>> interconnected businesses involved in the ultimate provision of product
>>>>>> and service packages. Such information is stored in relational databases
>>>>>> and sometimes already exchanged electronically, but a variety of
>>>>>> different technologies are used (e.g. proprietary files, XML files, DB
>>>>>> dumps, Web Services etc.). Realizing a completely electronic information
>>>>>> flow requires significant initial investments and currently limits the
>>>>>> flexibility of businesses (e.g. with regard to changes in business
>>>>>> partners). The envisioned RDB2RDF mapping language applied in
>>>>>> conjunction with existing RDB based SCM systems will support the use of
>>>>>> RDF and unique identifiers for realizing flexible information flows
>>>>>> accompanying supply chains.
>>>>>>
>>>>>> The mapping language to be standardized by the proposed WG will simplify
>>>>>> the publishing of enterprise data and information from Relational data
>>>>>> backends and, thus, facilitate the interlinking and exchange of
>>>>>> information between business information systems. In this scenario,
>>>>>> on-demand transformation of relational data to RDF, scalability and
>>>>>> completeness with regard to the relational algebra are central
>>>>>> requirements.
>>>>>>
>>>>>>
>>>>>>         1.1.4 Ordnance Survey Use Case
>>>>>>
>>>>>> Ordnance Survey, the national mapping agency of the UK, operates a very
>>>>>> large geographical information system based on Oracle Spatial. The
>>>>>> database contains topographical features, soil type and land use
>>>>>> information. All these types of information are independently maintained
>>>>>> and use separate terminologies. They describe the same land area but the
>>>>>> boundaries of objects utilized for representing land use and soil type
>>>>>> and topography do not coincide: for example, a pasture might consist of
>>>>>> two distinct types of soil.
>>>>>>
>>>>>> An example of a need to integrate this information is modeling
>>>>>> filtration of pollutants into water bodies from agricultural land. The
>>>>>> soil type determines the degree of filtration, the land use determines
>>>>>> the type of pollutant. Topography determines whether the field is next
>>>>>> to a water body.
>>>>>>
>>>>>> An ontology exists for describing the types of objects in each database.
>>>>>> The benefit from mapping the data to RDF is in simplifying querying and
>>>>>> integration of the data. The very high volume of data makes an ETL
>>>>>> approach impracticable. Besides, the Oracle Spatial database offers
>>>>>> spatial joining, which is generally not available on RDF stores.
>>>>>>
>>>>>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>>>>>> land use, soil type and topography ontologies and convert them into
>>>>>> single SQL statements, with all joining and filtering to take place at
>>>>>> the relational database. In the process, high level concepts need to be
>>>>>> translated into SQL conditions on data that is not readily human
>>>>>> readable.
>>>>>>
>>>>>> Business questions to be answered by the use case are for example:
>>>>>>
>>>>>>     * What is the total length of river bank bordered by permeable soil
>>>>>>       used for grazing along a certain river?
>>>>>>     * What types of crops are being cultivated within 100m of water,
>>>>>>       with total land use grouped by crop?
>>>>>>     * What water bodies are subject to high environmental load from
>>>>>>       agriculture, as defined by little current and extensive use of
>>>>>>       adjacent land?
>>>>>>
>>>>>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>>>>>> need to integrate data from different databases, built for different
>>>>>> purposes. It also emphasizes the need for extensibility in the mapping
>>>>>> language for supporting RDBMS vendor specific features. In the present
>>>>>> case, Oracle expresses a spatial join using a special type of derived
>>>>>> table not found in standard SQL, so the customization need is deeper
>>>>>> than just supporting calls to native SQL functions.
>>>>>>
>>>>>> The inference requirement consists primarily of expanding class
>>>>>> membership into and's and or's of conditions on the relational data. In
>>>>>> some cases, these conditions are spatial, such as bordering on or
>>>>>> contained in. The user should be familiar with the ontologies but should
>>>>>> not have to know about the classification codes used in the databases.
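
The expansion of class membership into and's and or's of conditions can be sketched roughly as follows. This is a toy illustration in Python with sqlite3, not the Ordnance Survey setup. The class names, the `parcel` table, its columns and the classification codes are all invented for the example, and spatial conditions are left out. Each ontology class maps to a list of alternatives (OR), each alternative being a list of column/code pairs that must all hold (AND), and the result is a single parameterized SQL statement.

```python
import sqlite3

# Hypothetical mapping from ontology classes to coded conditions.
# Outer list = alternatives (OR); inner list = pairs that must all hold (AND).
CLASS_CONDITIONS = {
    "PermeableSoil": [[("soil_code", "S12")], [("soil_code", "S14")]],
    "GrazingLand": [[("use_code", "U3"), ("active", 1)]],
}


def class_to_sql(class_name):
    """Return (where_clause, params) expressing membership in `class_name`."""
    parts, params = [], []
    for conjunction in CLASS_CONDITIONS[class_name]:
        # Column names come from the trusted mapping; values are bound as parameters.
        parts.append("(" + " AND ".join(f"{col} = ?" for col, _ in conjunction) + ")")
        params.extend(value for _, value in conjunction)
    return "(" + " OR ".join(parts) + ")", params


def parcels_in(conn, class_names):
    """Return ids of parcels belonging to all given classes, using one SQL statement."""
    clauses, params = [], []
    for name in class_names:
        clause, clause_params = class_to_sql(name)
        clauses.append(clause)
        params.extend(clause_params)
    sql = "SELECT id FROM parcel WHERE " + " AND ".join(clauses) + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parcel "
        "(id INTEGER PRIMARY KEY, soil_code TEXT, use_code TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO parcel VALUES (?, ?, ?, ?)",
        [(1, "S12", "U3", 1), (2, "S14", "U3", 0), (3, "S99", "U3", 1), (4, "S14", "U3", 1)],
    )
    return parcels_in(conn, ["PermeableSoil", "GrazingLand"])


if __name__ == "__main__":
    print(demo())
```

The caller names only the classes; the codes stay inside the mapping, which is the separation the paragraph above asks for.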
>>>>>>
>>>>>>
>>>>>>       1.2 Liaisons
>>>>>>
>>>>>> The WG must track the evolution of SPARQL and liaise with the DAWG as
>>>>>> well as the OWL WG. The proposed WG will also keep track of work on
>>>>>> assigning unique identifiers to well-known entities such as the ENS
>>>>>> system associated with the OKKAM project [OKKAM] <#okkam> and the Common
>>>>>> Naming Project started by Neuro Commons [Common Naming Project]
>>>>>> <#CommonNaming>.
>>>>>>
>>>>>>
>>>>>>       1.3 Starting Points
>>>>>>
>>>>>> The WG will take as its starting point the mapping languages developed
>>>>>> by the [D2RQ] <#D2RQ> and [Virtuoso] <#Virtuoso> efforts.
>>>>>>
>>>>>>
>>>>>>     2 References
>>>>>>
>>>>>> Common Naming Project
>>>>>>     Neuro Commons Common Naming Project
>>>>>>     <http://neurocommons.org/page/Common_Naming_Project>, Science
>>>>>>     Commons, Sept 17, 2008. (See
>>>>>>     http://neurocommons.org/page/Common_Naming_Project.)
>>>>>> D2RQ
>>>>>>     The D2RQ Platform v0.5.1, User Manual and Language Specification
>>>>>>     <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer,
>>>>>>     Richard Cyganiak, Jorg Garbers, Oliver Maresch (See
>>>>>>     http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/.)
>>>>>> RIF
>>>>>>     W3C Rule Interchange Format Working Group
>>>>>>     <http://www.w3.org/2005/rules/wiki/RIF_Working_Group> (See
>>>>>>     http://www.w3.org/2005/rules/wiki/RIF_Working_Group.)
>>>>>> LinkedData
>>>>>>     Design Issues for Linked Data
>>>>>>     <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee
>>>>>>     (See http://www.w3.org/DesignIssues/LinkedData.html.)
>>>>>> StateOfArt
>>>>>>     Mapping Relational Data to RDF and OWL: A Literature Survey
>>>>>>     <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb
>>>>>>     (See http://esw.w3.org/topic/Rdb2RdfXG/.)
>>>>>> OKKAM
>>>>>>     An Entity Name System (ENS) for the Semantic Web
>>>>>>     <http://www.okkam.org/>, Paolo Bouquet, Heiko Stoermer, Barbara
>>>>>>     Bazzanella, January 2008. (See http://www.okkam.org/.)
>>>>>> Virtuoso
>>>>>>     Virtuoso Open-Source Edition
>>>>>>     <http://virtuoso.openlinksw.com/wiki/main/Main/> (See
>>>>>>     http://virtuoso.openlinksw.com/wiki/main/Main/.)
>>>>>> Triplify
>>>>>>     Triplify - Lightweight Linked Data Publication from Relational
>>>>>>     Databases, submitted to WWW 2009
>>>>>>     <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>>>>>>     Auer, Dietzold, Lehmann, Hellmann, Aumueller (See
>>>>>>     http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf.)
>>>>>> NicotineDependence
>>>>>>     An ontology-driven semantic mashup of gene and biological pathway
>>>>>>     information: Application to the domain of nicotine dependence
>>>>>>     <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo,
>>>>>>     Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P.
>>>>>>     Sheth (See http://dx.doi.org/10.1016/j.jbi.2008.02.006.)
>>>>>>             
>>>>>>             
>>>>>         
>>>>>           
>>>   
>>>       
>
>
Received on Monday, 26 January 2009 16:28:18 UTC