Re: Requirements for Relational to RDF mapping document comment... from Li L Ma on 2009-01-06 (public-xg-rdb2rdf@w3.org from January 2009)

From: Li L Ma <malli@cn.ibm.com>
Date: Tue, 6 Jan 2009 14:10:15 +0800
To: "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>
Cc: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>, public-xg-rdb2rdf-request@w3.org
Message-ID: <OFD3773092.17400335-ON48257536.0014FAAA-48257536.0021C0B4@cn.ibm.com>
Hi Ezzat and all,

Happy New Year!

I agree with your comments on RDF for MDM integration. So far, I have not 
seen an effective use of RDF for master data integration either. Here, I'd 
like to share our research work on using linked data techniques for master 
data management. The attached picture shows our high-level ideas. We 
created a core ontology from an MDM logical model, as well as a mapping 
(defined in D2RQ) between that ontology and the MDM data stored in 
relational databases. That gives us an RDF view of existing master data. 
Furthermore, the published master data can be linked/mapped to domain 
ontologies by rules. Once master data is mapped to ontologies, users can 
define their own business rules using classes and properties defined in 
the core MDM and domain ontologies, and issue SPARQL queries that include 
those rules against the MDM databases. Our SeDA engine, which takes as 
input a SPARQL query, a D2RQ mapping, ontologies and user-defined business 
rules, translates a SPARQL query into a single SQL query to retrieve 
master data. In summary, using some linked data technologies (mainly 
mapping and reasoning), we provide advanced analytics services over 
centralized master data, but do NOT focus on the integration problem in 
MDM. An interesting problem to explore in the future is the use of linked 
data technologies for registry-based MDM implementations.

Besides integration, I think another significant value of RDB2RDF mapping 
for enterprise data management is to enable ontology and rule reasoning 
for analytics services. So I think reasoning is also important for both 
the ETL and on-demand mapping approaches.



For "the Union Bomb", I think Orri proposed it from an implementation 
perspective. Compared with the ETL approach, on-demand mapping has higher 
requirements for performance and scalability. If a mapping can provide 
clues for query optimization, SPARQL-to-SQL engines can generate more 
efficient SQL statements. 
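
To make the point about mapping clues concrete, here is one reading of the 
union problem, sketched in Python: when a triple pattern has a variable in 
the predicate position, a naive translator emits one SELECT per mapped 
column and unions them all, while a hint in the mapping lets it prune 
branches. The tables, properties, the per-property class annotation and 
the `translate_any_predicate` helper are hypothetical, chosen only for 
illustration.

```python
# Hypothetical mapping: property -> (table, key column, value column, class).
MAPPING = {
    "mdm:name":  ("customer", "id", "name",  "mdm:Customer"),
    "mdm:city":  ("customer", "id", "city",  "mdm:Customer"),
    "mdm:title": ("product",  "id", "title", "mdm:Product"),
}


def translate_any_predicate(subject_class=None):
    """Build SQL for the pattern `?s ?p ?o`.

    If subject_class is given (i.e. ?s is known to be of that class),
    branches for properties of other classes are dropped.
    """
    branches = []
    for prop, (table, key, column, cls) in MAPPING.items():
        if subject_class is not None and cls != subject_class:
            continue  # pruned using the class hint in the mapping
        branches.append(
            f"SELECT {key} AS s, '{prop}' AS p, "
            f"CAST({column} AS VARCHAR) AS o FROM {table}"
        )
    return "\nUNION ALL\n".join(branches)
```

Without a hint, the query grows by one branch for every mapped column in 
every table; with `subject_class="mdm:Product"` only the `product` branch 
remains.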

Best Regards,

Li MA, Ph.D
Manager, Semantic Technologies
IBM China Research Lab
TEL:   86-10-58748078 
T/L:   11905 ext. 8078
FAX:   86-10-58748731
E-Mail:   MaLLi@cn.ibm.com
Homepage: http://www.research.ibm.com/people/m/mali




"Ezzat, Ahmed" <Ahmed.Ezzat@hp.com> 
Sent by: public-xg-rdb2rdf-request@w3.org
2009-01-03 11:31

To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
cc:
Subject: Requirements for Relational to RDF mapping document comment...
 
Hello,
 
Good paper.  Below are a couple of comments:
 
1.      Motivation:
>> RDF offers a systematic ontology and query language which can be used 
against the mapped data, without concern of the semantic heterogeneity 
inherent in independently arisen relational databases.
 
AKE>  What about all the issues in mapping an RDBMS schema to the 
application/domain ontology, as well as reconciling the different local 
ontologies?  Even with only RDBMS data sources, building a unified, 
consistent and complete single view of the data (MDM) is a lot of work.  
RDF does not help with any of these issues.  The author might want to 
rewrite/soften this sentence.
 
2.      Relative Desirability of Mapping and ETL:
>> We expect cases favoring ETL to be characterized by:
- Large number of heterogeneous sources of data
- Complex application logic needed for transforming the data
- RDF reasoning being performed on the mapped data
- Queries with variables in class or predicate positions
 
AKE> I agree with some of these, not with others, and there are missing 
bullets.  For example, I am not sure why ETL (the dump approach) is 
preferred with a large number of data sources.  I suspect a main factor is 
scalability: if the overall aggregate data size from the large number of 
data sources is hard to accommodate in a single RDF store, that might 
force you to query the data sources directly rather than translating all 
of them into an RDF store and then querying it.  A second factor is the 
dynamic nature of the data sources; with highly dynamic content I suspect 
querying the data sources is better.
 
3.      Mapping on Demand:
>> The Union Bomb: 
 
AKE> I am not sure why this is a problem under “Mapping on Demand”.  It 
seems to be the same issue for both ETL and on-demand mapping.
P.S. Most EIS servers (CRM like Vignette or ERP like SAP) do not expose 
an SQL interface.  I assume you meant this conceptually, in the context of 
multiple databases.
 
4.      Criteria of Success:
>> At the end…. There should exist at least two interoperable 
implementations of the mapping language providing at least ETL.  Aside 
from this, implementers are encouraged to support on-demand mapping.
 
AKE> No, we need to support both approaches as first-class citizens.  I 
view on-demand mapping as more critical than ETL, since keeping the cost 
of translation minimal matters most.  Second, with either ETL or on-demand 
we need a proof of concept or a recommendation for how to reconcile RDF 
sub-graphs from multiple data sources into a single domain ontology.
Regards,
 
Ahmed
 
 
Ahmed K. Ezzat, Ph.D.
HP Fellow, Business Intelligence Software Division
Hewlett-Packard Corporation 
11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691 
Office: Email: Ahmed.Ezzat@hp.com Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
Personal: Email: AhmedEzzat@aol.com Tel: 408-253-5062 Fax: 408-253-6271
Attachments

image/gif attachment: 01-part
Received on Tuesday, 6 January 2009 06:13:31 UTC