Re: Deliverables from the RDB2RDF XG from Ivan Herman on 2009-01-26 (public-xg-rdb2rdf@w3.org from January 2009)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 26 Jan 2009 17:24:14 +0100
To: ashok.malhotra@oracle.com
CC: Mauro Nunez <mauro@w3.org>, public-xg-rdb2rdf <public-xg-rdb2rdf@w3.org>
Message-ID: <497DE3AE.3040101@w3.org>
Ashok,

I would prefer not to copy the final setup and possible followup
discussions to this list simply because the other (coordination group)
mailing list is member confidential. Mixing two lists with different
confidentiality levels is a recipe for something going wrong:-) I hope
that is all right.

The meeting is still fairly far away, ie, we have time; would it be
possible to tell me who would dial in besides you in a few weeks? I
would then contact them personally.

Thanks a lot again!

Ivan

ashok malhotra wrote:
> I'll be happy to dial in.  Others from the XG may want to dial in as well.
> I will put the final report on the W3C site and send out the pointer
> when it is done.
> 
> Please send details of the telcon to this list.
> 
> All the best, Ashok
> 
> 
> Ivan Herman wrote:
>> My apologies, I should have been more precise: the meeting is a telco!
>> Ie, the precise answer to your question is: on zakim:-)
>>
>> Ivan
>>
>> ashok malhotra wrote:
>>  
>>> Hi Ivan:
>>> You said ...
>>>
>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>> of our next SW Coordination Group meeting to give a bit of a report and
>>> discuss the possible followups?
>>> Where is the meeting?
>>>
>>> All the best, Ashok
>>>
>>> Ivan Herman wrote:
>>>    
>>>> Hi Ashok,
>>>>
>>>> First of all, thanks! I have actually two questions, none of those are
>>>> closely related to your original question (that Mauro already answered,
>>>> I believe:-)
>>>>
>>>> - Would you or one of your colleagues be ready to come (again:-) to one
>>>> of our next SW Coordination Group meeting to give a bit of a report and
>>>> discuss the possible followups? The best date would be the 20th of
>>>> February, Friday, at 16:00 Amsterdam time (I guess 10:00 Boston time)?
>>>>
>>>> - Did the group also think of preparing a rough draft charter for the
>>>> group you propose? It would make things easier to discuss both
>>>> internally and externally. There can be many empty slots in the charter
>>>> but it would give an idea to move forward. It would also give a feeling
>>>> on who would/could staff such a group.
>>>>
>>>> Thanks again!
>>>>
>>>> Cheers
>>>>
>>>> Ivan
>>>>
>>>> ashok malhotra wrote:
>>>>  
>>>>      
>>>>> Ivan, Mauro:
>>>>> As you know, the RDB2RDF XG is coming to a close.  We are planning two
>>>>> deliverables and I thought I would run them by you for early comments.
>>>>>
>>>>> 1. We have prepared a final report.  This is attached.  I am trying to
>>>>> get permission to put it on the W3C site.
>>>>> 2. We have prepared a State Of the Art Survey.  This is in the form of
>>>>> extensions to the ESW  Wiki
>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt or as a PDF file
>>>>> http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt?action=AttachFile&do=get&target=RDB2RDF_SurveyReport.pdf
>>>>>
>>>>> Both have the same content.  Is this format acceptable for an XG
>>>>> deliverable?
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> W3C Incubator Report <http://www.w3.org/2005/Incubator/XGR/>
>>>>>
>>>>>
>>>>>   W3C RDB2RDF Incubator Group Report
>>>>>
>>>>>
>>>>>     16 January 2009
>>>>>
>>>>> This version:
>>>>>     http://www.w3.org/XG_Report/2009/RDB2RDF_XG-20090116
>>>>> Latest version:
>>>>>     http://www.w3.org/XG_Report/RDB2RDF_XG
>>>>> Previous version:
>>>>>     This is the first public version.
>>>>> Author:
>>>>>     Ashok Malhotra (editor), Oracle
>>>>>
>>>>> Copyright © 2008 W3C <http://www.w3c.org>. All rights reserved. This
>>>>> document is available under the W3C Document License
>>>>> <http://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231>.
>>>>>
>>>>> See the W3C Intellectual Rights Notice and Legal Disclaimers
>>>>> <http://www.w3.org/Consortium/Legal/2002/ipr-notice-20021231#Copyright>
>>>>> for additional information.
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>     Abstract
>>>>>
>>>>> This is the final report from the RDB2RDF XG. The XG recommends that
>>>>> the W3C initiate a WG to standardize a language for mapping Relational
>>>>> Database schemas into RDF and OWL.
>>>>>
>>>>>
>>>>>     Status of this Document
>>>>>
>>>>> /This section describes the status of this document at the time of its
>>>>> publication. Other documents may supersede this document. A list of
>>>>> current W3C publications can be found in the W3C technical reports
>>>>> index <http://www.w3.org/TR/> at http://www.w3.org/TR/./
>>>>>
>>>>> This is the final recommendation from the RDB2RDF XG.
>>>>>
>>>>>
>>>>>     Table of Contents
>>>>>
>>>>> 1 Recommendation <#recommendation>
>>>>>     1.1 Usecases <#usecases>
>>>>>         1.1.1 Integrating Databases to Research Nicotine Dependency <#biomedical>
>>>>>         1.1.2 Triplify: Exposing Relational Data on the Web <#Triplify>
>>>>>         1.1.3 Integration of Enterprise Information Systems <#enterprise>
>>>>>         1.1.4 Ordnance Survey Use Case <#ordnance>
>>>>>     1.2 Liaisons <#liaisons>
>>>>>     1.3 Starting Points <#IDA2UIP>
>>>>> 2 References <#References>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>     1 Recommendation
>>>>>
>>>>> The RDB2RDF XG recommends that the W3C initiate a Working Group (WG)
>>>>> to standardize a language for mapping Relational Database schemas into
>>>>> RDF and OWL. Such a standard will enable the vast amounts of data
>>>>> stored in Relational databases to be published easily and conveniently
>>>>> on the Web. It will also facilitate integrating data from separate
>>>>> Relational databases and adding semantics to Relational data.
>>>>>
>>>>> This recommendation is based on a survey of the State Of the Art
>>>>> conducted by the XG [StateOfArt] <#StateOfArt> as well as the usecases
>>>>> discussed below.
>>>>>
>>>>> The mapping language defined by the WG would facilitate the
>>>>> development of several types of products. It could be used to
>>>>> translate Relational data into RDF which could be stored in a triple
>>>>> store. This is sometimes called Extract-Transform-Load (ETL). Or it
>>>>> could be used to generate a virtual mapping that could be queried
>>>>> using SPARQL and the SPARQL translated to SQL queries on the
>>>>> underlying Relational data. Other products could be layered on top of
>>>>> these capabilities to query and deliver data in different ways as well
>>>>> as to integrate the data with other kinds of information on the
>>>>> Semantic Web.
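The ETL path described above can be illustrated with a minimal sketch, assuming a direct table-to-triples mapping. It is not the mapping language the report proposes: the base IRI, the `person` table, and the `table_to_triples` helper are all hypothetical, and SQLite stands in for whatever relational database would actually be used.

```python
import sqlite3

# Hypothetical base IRI; the report does not prescribe any IRI scheme.
BASE = "http://example.org/"

def table_to_triples(conn, table, key, columns):
    """Yield one (subject, predicate, object) triple per non-NULL cell.

    Table and column names are trusted identifiers here; values are emitted
    as plain literals with only minimal escaping.
    """
    selected = ", ".join([key] + columns)
    for row in conn.execute(f"SELECT {selected} FROM {table} ORDER BY {key}"):
        subject = f"<{BASE}{table}/{row[0]}>"
        for column, value in zip(columns, row[1:]):
            if value is None:
                continue  # a NULL cell simply produces no triple
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            yield (subject, f"<{BASE}{table}#{column}>", f'"{literal}"')

# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", "Boston"), (2, "Bob", None)])

triples = list(table_to_triples(conn, "person", "id", ["name", "city"]))
for s, p, o in triples:
    print(s, p, o, ".")
```

The virtual-mapping alternative would keep the data in the database and rewrite each incoming SPARQL query into SQL instead of materializing the triples up front.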
>>>>>
>>>>> The mapping language should be complete with respect to the relational
>>>>> algebra. It should have a human-readable syntax as well as XML and RDF
>>>>> representations of the syntax for purposes of discovery and machine
>>>>> generation.
>>>>>
>>>>> There is a strong suggestion that the mapping language be expressed in
>>>>> rules as defined by the W3C [RIF] <#RIF> WG. The syntax does not have
>>>>> to follow the RIF syntax but there should be a round-trippable mapping
>>>>> between the mapping language and a RIF dialect. The output of the
>>>>> mapping should be defined in terms of an RDFS/OWL schema.
>>>>>
>>>>> It should be possible to subset the language for simple applications
>>>>> such as Web 2.0. This feature of the language will be validated by
>>>>> creating a library of mappings for widely used apps such as Drupal,
>>>>> Wordpress, phpBB.
>>>>>
>>>>> The mapping language will allow customization with regard to names and
>>>>> data transformation. In addition, the language must be able to expose
>>>>> vendor specific SQL features such as full-text and spatial support and
>>>>> vendor-defined datatypes.
>>>>>
>>>>> The final language specification should include guidance with regard
>>>>> to mapping Relational data to a subset of OWL such as OWL/QL or OWL/RL.
>>>>>
>>>>> The language must allow for a mechanism to create identifiers for
>>>>> database entities. The generation of identifiers should be designed to
>>>>> support the implementation of the linked data principles [LinkedData]
>>>>> <#LinkedData>. Where possible, the language will encourage the reuse
>>>>> of public identifiers for long-lived entities such as persons,
>>>>> corporations, geo-locations, etc. See *1.2 Liaisons* <#liaisons>.
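One possible shape for such an identifier mechanism is sketched below, assuming identifiers are HTTP IRIs built from a table name and its primary-key values. The `mint_iri` helper, the base IRI, and the `column=value` path layout are assumptions made for illustration; the report does not specify any of them.

```python
from urllib.parse import quote

def mint_iri(base, table, key):
    """Build an IRI from a table name and a mapping of key column -> value.

    Every component is percent-encoded so that characters such as '/' or
    spaces in key values cannot change the structure of the IRI.
    """
    parts = ";".join(
        f"{quote(str(column), safe='')}={quote(str(value), safe='')}"
        for column, value in key.items()
    )
    return f"{base.rstrip('/')}/{quote(table, safe='')}/{parts}"

# Composite primary key (hypothetical table and columns).
iri = mint_iri("http://example.org/db/", "order_line",
               {"order_id": 42, "line_no": 3})

# Key value containing reserved characters.
escaped = mint_iri("http://example.org/db", "person", {"name": "A/B c"})
```

Because the IRI is derived only from the key, the same row always yields the same identifier, which is what makes it dereferenceable and linkable in the linked data sense.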
>>>>>
>>>>> The proposed Working Group will also create a set of test cases that
>>>>> could be used to verify conformance.
>>>>>
>>>>>
>>>>>       1.1 Usecases
>>>>>
>>>>> To bootstrap exploitation of the Web as a globally accessible linked
>>>>> database, we need a few essentials:
>>>>>
>>>>>     * Web accessible data needs to increase in granularity and cross
>>>>>       linkage.
>>>>>     * Web applications and solutions must produce structured
>>>>>       interlinked data as extensions of existing functionality.
>>>>>     * Web users must be shielded from the underlying complexity of
>>>>>       injecting structured linked data into the Web.
>>>>>
>>>>>
>>>>>         1.1.1 Integrating Databases to Research Nicotine Dependency
>>>>>
>>>>> Complex biological queries generally require the integration of
>>>>> information from several sources. To understand the genetic basis of
>>>>> nicotine dependence, gene and pathway information needed to be
>>>>> integrated and three complex biological queries answered using the
>>>>> integrated knowledge base. The gene information source NCBI Entrez
>>>>> Gene, which has gene-related records of ~2 million genes, needed to be
>>>>> integrated with pathway information sources such as KEGG (Kyoto
>>>>> Encyclopedia of Genes and Genomes). Comparing results across model
>>>>> organisms required homology information provided by NCBI HomoloGene,
>>>>> which contains homology data for several completely sequenced
>>>>> eukaryotic organisms.
>>>>>
>>>>> An ontology-driven approach was used to integrate the two gene
>>>>> resources (Entrez Gene and HomoloGene) and the three pathway resources
>>>>> (KEGG, Reactome and BioCyc). An OWL ontology called the Entrez
>>>>> Knowledge Model (EKoM) was created for the gene resources and
>>>>> integrated with the extant BioPAX ontology designed for pathway
>>>>> resources. The integrated schema was populated with data from the
>>>>> pathway resources, publicly available in BioPAX-compatible format, and
>>>>> gene resources for which a population procedure was created.
>>>>>
>>>>> SPARQL was used to formulate queries to investigate the genetic basis
>>>>> of nicotine dependence over the integrated knowledge base:
>>>>>
>>>>>     * Which genes participate in a large number of pathways?
>>>>>     * Which genes are "hub genes" from the perspective of gene
>>>>>       interaction?
>>>>>     * Which genes are expressed in the brain, in the context of the
>>>>>       neurobiology of nicotine dependence and various
>>>>>       neurotransmitters in the central nervous system?
>>>>>
>>>>> The result was very successful. The queries could easily identify hub
>>>>> genes, i.e., those genes whose gene products participate in many
>>>>> pathways or interact with many other gene products. See
>>>>> [NicotineDependence] <#> for details.
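To make the "hub gene" notion concrete, here is a minimal sketch that counts distinct pathways per gene over a handful of triples. The sample triples, the `ex:participatesIn` predicate, and the `hub_genes` helper are invented for illustration; the actual study ran SPARQL queries over the EKoM and BioPAX ontologies, whose vocabulary is not reproduced here.

```python
from collections import defaultdict

# Invented sample data: (gene, predicate, pathway) triples.
triples = [
    ("gene:A", "ex:participatesIn", "pathway:1"),
    ("gene:A", "ex:participatesIn", "pathway:2"),
    ("gene:A", "ex:participatesIn", "pathway:3"),
    ("gene:B", "ex:participatesIn", "pathway:1"),
    ("gene:C", "ex:participatesIn", "pathway:2"),
    ("gene:C", "ex:participatesIn", "pathway:3"),
]

def hub_genes(triples, min_pathways):
    """Return (gene, pathway_count) pairs with at least min_pathways
    distinct pathways, most connected first, ties broken by gene name."""
    pathways = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "ex:participatesIn":
            pathways[subject].add(obj)
    ranked = sorted(pathways.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(found)) for gene, found in ranked
            if len(found) >= min_pathways]

hubs = hub_genes(triples, 2)
print(hubs)
```

In SPARQL the same idea would be a grouped count over the participation pattern; the Python version only shows the aggregation that such a query performs.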
>>>>>
>>>>>
>>>>>         1.1.2 Triplify: Exposing Relational Data on the Web
>>>>>
>>>>> In order to make the Semantic Web useful to ordinary Web users, RDF
>>>>> and OWL have to be deployed on the Web on a much larger scale. Web
>>>>> applications such as Content Management Systems, online shops or
>>>>> community applications (e.g. Wikis, Blogs, Fora) already store their
>>>>> data in relational databases [Triplify] <#TriplifyPaper>. Providing a
>>>>> standardized way to map the relational data structures behind these
>>>>> Web applications into RDF, RDF-Schema and OWL will facilitate broad
>>>>> penetration, enrich the Web with RDF data and ontologies, and
>>>>> facilitate novel semantic browsing and search applications.
>>>>>
>>>>> By supporting the long tail of Web applications, and thus
>>>>> counteracting the centralization of Web 2.0 applications, the planned
>>>>> RDB2RDF standardization will help to give control over data back to
>>>>> end-users and thus promote a democratization of the Web.
>>>>>
>>>>> To support this usecase scenario, the mapping language should be
>>>>> easily implementable for lightweight Web applications and have a
>>>>> shallow learning curve to foster early adoption by Web developers.
>>>>>
>>>>>
>>>>>         1.1.3 Integration of Enterprise Information Systems
>>>>>
>>>>> Efficient information and data exchange between application systems
>>>>> within and across enterprises is of paramount importance in the
>>>>> increasingly networked and IT-dominated business atmosphere. Existing
>>>>> Enterprise Information Systems such as CRM, CMS and ERP systems use
>>>>> Relational database backends for persistence. RDF and Linked Data can
>>>>> provide data exchange and integration interfaces for such application
>>>>> systems, which are easy to implement and use, especially in settings
>>>>> where a loose and flexible coupling of the systems is required.
>>>>>
>>>>> Insight can often be gained by integrating data from databases built
>>>>> for different purposes in separate corporate silos. For example,
>>>>> integrating data from a bug database with a customer database may help
>>>>> understand ordering behavior as a function of the bugs encountered.
>>>>>
>>>>> In Supply Chain Management (SCM), for example, it is vital to exchange
>>>>> product catalogs and other goods related information within a network
>>>>> of interconnected businesses involved in the ultimate provision of
>>>>> product and service packages. Such information is stored in relational
>>>>> databases and sometimes already exchanged electronically, but a
>>>>> variety of different technologies are used (e.g. proprietary files,
>>>>> XML files, DB dumps, Web Services etc.). Realizing a completely
>>>>> electronic information flow requires significant initial investments
>>>>> and currently limits the flexibility of businesses (e.g. with regard
>>>>> to changes in business partners). The envisioned RDB2RDF mapping
>>>>> language applied in conjunction with existing RDB based SCM systems
>>>>> will support the use of RDF and unique identifiers for realizing
>>>>> flexible information flows accompanying supply chains.
>>>>>
>>>>> The mapping language to be standardized by the proposed WG will
>>>>> simplify the publishing of enterprise data and information from
>>>>> Relational data backends and, thus, facilitate the interlinking and
>>>>> exchange of information between business information systems. In this
>>>>> scenario, on-demand transformation of relational data to RDF,
>>>>> scalability and completeness with regard to the relational algebra are
>>>>> central requirements.
>>>>>
>>>>>
>>>>>         1.1.4 Ordnance Survey Use Case
>>>>>
>>>>> Ordnance Survey, the National mapping agency of the UK, operates a
>>>>> very large geographical information system based on Oracle Spatial.
>>>>> The database contains topographical features, soil type and land use
>>>>> information. All these types of information are independently
>>>>> maintained and use separate terminologies. They describe the same land
>>>>> area but the boundaries of objects utilized for representing land use
>>>>> and soil type and topography do not coincide: for example, a pasture
>>>>> might consist of two distinct types of soil.
>>>>>
>>>>> An example of a need to integrate this information is modeling
>>>>> filtration of pollutants into water bodies from agricultural land. The
>>>>> soil type determines the degree of filtration, the land use determines
>>>>> the type of pollutant. Topography determines whether the field is next
>>>>> to a water body.
>>>>>
>>>>> An ontology exists for describing the types of objects in each
>>>>> database. The benefit from mapping the data to RDF is in simplifying
>>>>> querying and integration of the data. The very high volume of data
>>>>> makes an ETL approach impracticable; besides, the Oracle Spatial
>>>>> database offers spatial joining which is generally not available on
>>>>> RDF stores.
>>>>>
>>>>> Thus, it is necessary to take SPARQL queries expressed in terms of the
>>>>> land use, soil type and topography ontologies and convert them into
>>>>> single SQL statements, with all joining and filtering to take place at
>>>>> the relational database. In the process, high level concepts need to
>>>>> be translated into SQL conditions on data that is not readily human
>>>>> readable.
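The translation step described above can be sketched as follows, assuming each ontology class maps to a condition on a classification code in one table. The class names, table names, codes, and the `classes_to_sql` helper are all invented; SQLite stands in for Oracle Spatial, and spatial joins are deliberately left out.

```python
import sqlite3

# Hypothetical mapping: ontology class -> (table, column, classification codes).
CLASS_CONDITIONS = {
    "ex:Pasture": ("land_use", "code", ["LU17"]),
    "ex:PermeableSoil": ("soil", "code", ["S3", "S4"]),
}

def classes_to_sql(class_names):
    """Return (sql, params) for one SQL statement selecting the parcels that
    belong to every given class. Assumes each class lives in its own table
    and that all tables share a parcel_id column."""
    tables, where, params = [], [], []
    for name in class_names:
        table, column, codes = CLASS_CONDITIONS[name]
        tables.append(table)
        marks = ", ".join("?" for _ in codes)
        where.append(f"{table}.{column} IN ({marks})")
        params.extend(codes)
    first = tables[0]
    joins = "".join(
        f" JOIN {t} ON {t}.parcel_id = {first}.parcel_id" for t in tables[1:]
    )
    sql = (f"SELECT {first}.parcel_id FROM {first}{joins} WHERE "
           + " AND ".join(where) + f" ORDER BY {first}.parcel_id")
    return sql, params

# Invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE land_use (parcel_id INTEGER, code TEXT)")
conn.execute("CREATE TABLE soil (parcel_id INTEGER, code TEXT)")
conn.executemany("INSERT INTO land_use VALUES (?, ?)",
                 [(1, "LU17"), (2, "LU17"), (3, "LU02")])
conn.executemany("INSERT INTO soil VALUES (?, ?)",
                 [(1, "S3"), (2, "S1"), (3, "S4")])

sql, params = classes_to_sql(["ex:Pasture", "ex:PermeableSoil"])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The user asks only in terms of the classes; the codes appear solely in the generated SQL, and all joining and filtering happens inside the database in a single statement.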
>>>>>
>>>>> Business questions to be answered by the use case are for example:
>>>>>
>>>>>     * What is the total length of river bank bordered by permeable
>>>>>       soil used for grazing along a certain river?
>>>>>     * What types of crops are being cultivated within 100m of water,
>>>>>       with total land use grouped by crop?
>>>>>     * What water bodies are subject to high environmental load from
>>>>>       agriculture, as defined by little current and extensive use of
>>>>>       adjacent land?
>>>>>
>>>>> From the viewpoint of RDB to RDF mapping, this usecase highlights the
>>>>> need to integrate data from different databases, built for different
>>>>> purposes. It also emphasizes the need for extensibility in the mapping
>>>>> language for supporting RDBMS vendor specific features. In the present
>>>>> case, Oracle expresses a spatial join using a special type of derived
>>>>> table not found in standard SQL; thus the customization need is deeper
>>>>> than just supporting calls to native SQL functions.
>>>>>
>>>>> The inference requirement consists primarily of expanding class
>>>>> membership into and's and or's of conditions on the relational data.
>>>>> In some cases, these conditions are spatial, such as bordering on or
>>>>> contained in. The user should be familiar with the ontologies but
>>>>> should not have to know about the classification codes used in the
>>>>> databases.
>>>>>
>>>>>
>>>>>       1.2 Liaisons
>>>>>
>>>>> The WG must track the evolution of SPARQL and liaise with the DAWG as
>>>>> well as the OWL WG. The proposed WG will also keep track of work on
>>>>> assigning unique identifiers to well-known entities such as the ENS
>>>>> system associated with the OKKAM project [OKKAM] <#okkam> and the
>>>>> Common Naming Project started by Neuro Commons [Common Naming Project]
>>>>> <#CommonNaming>.
>>>>>
>>>>>
>>>>>       1.3 Starting Points
>>>>>
>>>>> The WG will take as its starting point the mapping languages developed
>>>>> by the [D2RQ] <#D2RQ> and [Virtuoso] <#Virtuoso> efforts.
>>>>>
>>>>>
>>>>>     2 References
>>>>>
>>>>> Common Naming Project
>>>>>     Neuro Commons Common Naming Project
>>>>>     <http://neurocommons.org/page/Common_Naming_Project>, Science
>>>>>     Commons, Sept 17, 2008. (See
>>>>>     http://neurocommons.org/page/Common_Naming_Project.)
>>>>> D2RQ
>>>>>     The D2RQ Platform v0.5.1, User Manual and Language Specification
>>>>>     <http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/>, Chris Bizer,
>>>>>     Richard Cyganiak, Jorg Garbers, Oliver Maresch (See
>>>>>     http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/.)
>>>>> RIF
>>>>>     W3C Rule Interchange Format Working Group
>>>>>     <http://www.w3.org/2005/rules/wiki/RIF_Working_Group> (See
>>>>>     http://www.w3.org/2005/rules/wiki/RIF_Working_Group.)
>>>>> LinkedData
>>>>>     Design Issues for Linked Data
>>>>>     <http://www.w3.org/DesignIssues/LinkedData.html>, Tim Berners-Lee
>>>>>     (See http://www.w3.org/DesignIssues/LinkedData.html.)
>>>>> StateOfArt
>>>>>     Mapping Relational Data to RDF and OWL: A Literature Survey
>>>>>     <http://esw.w3.org/topic/Rdb2RdfXG/>, Satya Sahoo, Wolfgang Halb
>>>>>     (See http://esw.w3.org/topic/Rdb2RdfXG/.)
>>>>> OKKAM
>>>>>     An Entity Name System (ENS) for the Semantic Web
>>>>>     <http://www.okkam.org/>, Paolo Bouquet, Heiko Stoermer, Barbara
>>>>>     Bazzanella, January 2008. (See http://www.okkam.org/.)
>>>>> Virtuoso
>>>>>     Virtuoso Open-Source Edition
>>>>>     <http://virtuoso.openlinksw.com/wiki/main/Main/> (See
>>>>>     http://virtuoso.openlinksw.com/wiki/main/Main/.)
>>>>> Triplify
>>>>>     Triplify - Lightweight Linked Data Publication from Relational
>>>>>     Databases, submitted to WWW 2009
>>>>>     <http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf>,
>>>>>     Auer, Dietzold, Lehmann, Hellmann, Aumueller (See
>>>>>     http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf.)
>>>>> NicotineDependence
>>>>>     An ontology-driven semantic mashup of gene and biological pathway
>>>>>     information: Application to the domain of nicotine dependence
>>>>>     <http://dx.doi.org/10.1016/j.jbi.2008.02.006>, Satya S. Sahoo,
>>>>>     Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner and Amit P.
>>>>>     Sheth (See http://dx.doi.org/10.1016/j.jbi.2008.02.006.)
>>>>>             
>>>>         
>>
>>   

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 26 January 2009 16:24:45 UTC