RE: Requirements for Relational to RDF mapping document comment... from Ezzat, Ahmed on 2009-01-06 (public-xg-rdb2rdf@w3.org from January 2009)

From: Ezzat, Ahmed <Ahmed.Ezzat@hp.com>
Date: Tue, 6 Jan 2009 19:54:18 +0000
To: "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, Li L Ma <malli@cn.ibm.com>
CC: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>, "public-xg-rdb2rdf-request@w3.org" <public-xg-rdb2rdf-request@w3.org>
Message-ID: <3B7AE9BA67C72B4891EF21842246A21C404A219C25@GVW1097EXB.americas.hpqcorp.net>
Hello Li Ma and Ashok,

At the requirement level, we should avoid specifying how we can implement things.
I can see a case for using RDF & RDB2RDF as critical components to solve the MDM problem in the context of data integration.  The advantage over traditional MDM would be that it is simpler and more efficient from a semantics point of view; I assume that is what Li Ma meant.

What Li Ma and Kingsley suggested separately is reasonable.  There are companies that provide solutions to this problem today, Metatomix being one example.

What is critical is to be careful not to imply that RDF or RDB2RDF solves these problems; rather, it is a critical technology for enabling higher-level solutions, and it is along these lines that I think MDM and data integration come into the picture. MDM is not an issue orthogonal to data integration.

The other aspect, which I hope will not be lost, is the importance of the on-demand approach, which should be treated as a first-class citizen.

Ahmed

Ahmed K. Ezzat, Ph.D.
HP Fellow, Business Intelligence Software Division
Hewlett-Packard Corporation
11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
Office:   Email: Ahmed.Ezzat@hp.com  Off: 408-447-6380  Fax: 1408796-5427  Cell: 408-504-2603
Personal: Email: AhmedEzzat@aol.com  Tel: 408-253-5062  Fax: 408-253-6271


-----Original Message-----
From: ashok malhotra [mailto:ashok.malhotra@oracle.com]
Sent: Tuesday, January 06, 2009 7:11 AM
To: Li L Ma
Cc: Ezzat, Ahmed; public-xg-rdb2rdf@w3.org; public-xg-rdb2rdf-request@w3.org
Subject: Re: Requirements for Relational to RDF mapping document comment...

Hello Li Ma:
You said...

> Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services.

I agree. It would be good to add a use case that enables rules and reasoning.
Could you amplify your discussion below with an example of a rule or
some sort of reasoning that is enabled by the mapping to the Semantic
Web? We can then add this as a third use case.

Thanks!

All the best, Ashok


Li L Ma wrote:
>
> Hi Ezzat and all,
>
> Happy New Year!
>
> I agree with your comments on RDF for MDM integration. So far, I also
> have not seen an effective use of RDF for master data integration.
> Here, I'd like to share our research work on using linked data
> techniques for master data management. The following picture shows our
> high-level ideas. We created a core ontology from an MDM logical model,
> as well as a mapping (defined with D2RQ) between the created ontology
> and the MDM data stored in relational databases. That means we can have
> an RDF view of existing master data. Furthermore, the published master
> data can be linked/mapped to domain ontologies by rules. Once master
> data is mapped to ontologies, users can define their own business
> rules using classes and properties defined in the core MDM and domain
> ontologies, and issue SPARQL queries that include the defined rules
> against MDM databases. Our SeDA engine, which takes as input a SPARQL
> query, a D2RQ mapping, ontologies, and user-defined business rules, can
> translate a SPARQL query into a single SQL query to retrieve master
> data. In summary, using some linked data technologies (mainly mapping
> and reasoning), we provided advanced analytics services over
> centralized master data, but did NOT focus on the integration problem
> in MDM. An interesting problem to explore in the future is the use of
> linked data technologies for registry-based MDM implementations.
>
> *Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services. So, I think reasoning is also
> important for both the ETL and on-demand mapping approaches.*
>
>
>
> As for "the Union Bomb", I think Orri proposed it from an implementation
> perspective. Compared with the ETL approach, on-demand mapping has
> higher requirements for performance and scalability. If a mapping can
> provide clues for query optimization, SPARQL-to-SQL engines can
> generate more efficient SQL statements.
>
> Best Regards,
>
> Li MA, Ph.D
> Manager, Semantic Technologies
> IBM China Research Lab
> TEL: 86-10-58748078
> T/L: 11905 ext. 8078
> FAX: 86-10-58748731
> E-Mail: MaLLi@cn.ibm.com
> Homepage: http://www.research.ibm.com/people/m/mali
>
>
> *"Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>*
> Sent by: public-xg-rdb2rdf-request@w3.org
>
> 2009-01-03 11:31
>
>
> To
>       "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
> cc
>
> Subject
>       Requirements for Relational to RDF mapping document comment...
>
> Hello,
>
> Good paper. Below are a couple of comments:
>
> 1. _Motivation_:
> >> RDF offers a systematic ontology and query language which can be
> used against the mapped data, without concern of the semantic
> heterogeneity inherent in independently arisen relational databases.
>
> AKE> What about all the issues that arise in mapping an RDBMS schema to
> the application/domain ontology, as well as in reconciling the
> different local ontologies? Even with only RDBMS data sources, having
> a unified, consistent, and complete single data view (MDM) is a lot of
> work. RDF does not help with any of these issues. The author might
> want to rewrite/soften this sentence.
>
> 2. _Relative Desirability of Mapping and ETL_:
> >> We expect cases favoring ETL to be characterized by:
>
>     * Large number of heterogeneous sources of data
>     * Complex application logic needed for transforming the data
>     * RDF reasoning being performed on the mapped data
>     * Queries with variable in class or predicate positions
>
>
> AKE> I agree with some of these and not with others, and there are
> missing bullets. For example, I am not sure why ETL (the dump
> approach) is preferred with a large number of data sources. I suspect
> a main factor is scalability: if the aggregate data size from the
> large number of data sources is hard to accommodate in a single RDF
> store, it might force you to query the data sources directly rather
> than translate all of them into an RDF store and then query it. A
> *second* factor is the dynamic nature of the data sources; with highly
> dynamic content, I suspect querying the data sources is better.
>
> 3. Mapping on Demand:
> >> The Union Bomb:
>
> AKE> I am not sure why this is a problem under "Mapping on Demand". It
> seems to be the same issue for both ETL and on-demand mapping.
> P.S. Most EIS servers (CRM like Vignette or ERP like SAP) do not
> expose a SQL interface. I assume you meant this conceptually, in the
> context of multiple databases.
>
> 4. Criteria of Success:
> >> At the end.... There should exist at least two interoperable
> implementations of the mapping language providing at least ETL. Aside
> from this, implementers are encouraged to support on-demand mapping.
>
> AKE> No, we need to support both approaches as first-class citizens. I
> view on-demand as the more critical of the two, because keeping the
> cost of translation minimal is critical. Second, with either ETL or
> on-demand, we need a proof of concept or a recommendation for how to
> reconcile RDF sub-graphs from multiple data sources into a single
> domain ontology.
> Regards,
>
> Ahmed
>
>
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Business Intelligence Software Division
> Hewlett-Packard Corporation
> 11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
> Office: Email: Ahmed.Ezzat@hp.com
> Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
> Personal: Email: AhmedEzzat@aol.com
> Tel: 408-253-5062 Fax: 408-253-6271
>
>
>
>
Received on Tuesday, 6 January 2009 19:55:50 UTC