- From: Li L Ma <malli@cn.ibm.com>
- Date: Tue, 6 Jan 2009 14:10:15 +0800
- To: "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>
- Cc: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>, public-xg-rdb2rdf-request@w3.org
- Message-ID: <OFD3773092.17400335-ON48257536.0014FAAA-48257536.0021C0B4@cn.ibm.com>
Hi Ezzat and all,

Happy New Year!

I agree with your comments on RDF for MDM integration. So far, I have not seen an effective use of RDF for master data integration either. Here, I'd like to share our research work on using linked data techniques for master data management. The following picture shows our high-level ideas.

We created a core ontology from an MDM logical model, as well as a mapping (defined in D2RQ) between that ontology and the MDM data stored in relational databases. That gives us an RDF view of the existing master data. Furthermore, the published master data can be linked/mapped to domain ontologies by rules. Once master data is mapped to ontologies, users can define their own business rules using the classes and properties defined in the core MDM and domain ontologies, and issue SPARQL queries that include those rules against the MDM databases. Our SeDA engine, which takes as input a SPARQL query, the D2RQ mapping, the ontologies and the user-defined business rules, can translate a SPARQL query into a single SQL query to retrieve master data.

In summary, using some linked data technologies (mainly mapping and reasoning), we provided advanced analytics services over centralized master data, but we did NOT focus on the integration problem in MDM. An interesting problem to explore in the future is the use of linked data technologies for registry-based MDM implementations.

Besides integration, I think another significant value of RDB2RDF mapping for enterprise data management is that it enables ontology and rule reasoning for analytics services. So I think reasoning is important for both the ETL and the on-demand mapping approaches.

As for "the Union Bomb", I think Orri proposed it from an implementation perspective. Compared with the ETL approach, on-demand mapping has higher requirements for performance and scalability. If a mapping can provide clues for query optimization, SPARQL-to-SQL engines can generate more efficient SQL statements.
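To make the idea of translating a SPARQL query into a single SQL query a bit more concrete, here is a minimal sketch in Python. It is not the SeDA implementation and does not use real D2RQ syntax: the `MAPPING` dictionary, the `mdm:` terms, the `customer` table and the `translate` function are all hypothetical names chosen for illustration. It handles only triple patterns that share one subject variable and map to one table, with no reasoning or rules.

```python
import sqlite3

# Hypothetical mapping (NOT D2RQ syntax): ontology terms -> relational schema.
MAPPING = {
    "classes": {"mdm:Customer": {"table": "customer"}},
    "properties": {
        "mdm:name": {"table": "customer", "column": "name"},
        "mdm:city": {"table": "customer", "column": "city"},
    },
}


def translate(patterns, select_vars):
    """Translate triple patterns with one shared subject into one SQL query.

    patterns: list of (subject, predicate, object); variables start with '?'.
    Returns (sql, params), where params are bound to '?' placeholders.
    """
    if len({s for s, _, _ in patterns}) != 1:
        raise ValueError("sketch supports a single subject variable only")
    table, columns, where, params = None, {}, [], []
    for _, pred, obj in patterns:
        if pred == "rdf:type":
            current = MAPPING["classes"][obj]["table"]
        else:
            prop = MAPPING["properties"][pred]
            current = prop["table"]
            if obj.startswith("?"):
                # A variable object binds to a column; a NULL means "no triple".
                columns[obj] = prop["column"]
                where.append(f"{prop['column']} IS NOT NULL")
            else:
                # A constant object becomes an equality filter.
                where.append(f"{prop['column']} = ?")
                params.append(obj)
        if table is None:
            table = current
        elif table != current:
            raise ValueError("sketch supports a single table only")
    select = ", ".join(f"{columns[v]} AS {v[1:]}" for v in select_vars)
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def demo():
    """Run the translated query against a small in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Li", "Beijing"), (2, "Ahmed", "Cupertino"), (3, None, "Beijing")],
    )
    # Roughly: SELECT ?name WHERE { ?c a mdm:Customer ; mdm:name ?name ; mdm:city "Beijing" }
    patterns = [
        ("?c", "rdf:type", "mdm:Customer"),
        ("?c", "mdm:name", "?name"),
        ("?c", "mdm:city", "Beijing"),
    ]
    sql, params = translate(patterns, ["?name"])
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

Because all three patterns collapse into one `SELECT` over one table, no UNION or self-join is generated. A real engine would also have to handle joins across tables, variables in class or predicate positions, and rule expansion, which is where the mapping's optimization clues would matter.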
Best Regards,

Li MA, Ph.D
Manager, Semantic Technologies
IBM China Research Lab
TEL: 86-10-58748078 T/L: 11905 ext. 8078
FAX: 86-10-58748731
E-Mail: MaLLi@cn.ibm.com
Homepage: http://www.research.ibm.com/people/m/mali

"Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>
Sent by: public-xg-rdb2rdf-request@w3.org
2009-01-03 11:31
To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
cc:
Subject: Requirements for Relational to RDF mapping document comment...

Hello,

Good paper. Below are a couple of comments:

1. Motivation:

>> RDF offers a systematic ontology and query language which can be used against the mapped data, without concern of the semantic heterogeneity inherent in independently arisen relational databases.

AKE> What about all the issues in mapping an RDBMS schema to the application/domain ontology, as well as reconciling the different local ontologies? Even with only RDBMS data sources, having a unified, consistent and complete single data view (MDM) is a lot of work. RDF does not help with any of these issues. The author might want to rewrite/soften this sentence.

2. Relative Desirability of Mapping and ETL:

>> We expect cases favoring ETL to be characterized by:
>>   - Large number of heterogeneous sources of data
>>   - Complex application logic needed for transforming the data
>>   - RDF reasoning being performed on the mapped data
>>   - Queries with variables in class or predicate positions

AKE> I agree on some, not on others, and there are missing bullets. For example, I am not sure why ETL (the dump approach) is preferred with a large number of data sources. I suspect a main factor is scalability: if the overall aggregate data size from the large number of data sources is hard to accommodate in a single RDF store, it might force you to query the data sources rather than translate all of them into an RDF store and then query that. A second factor is the dynamic nature of the data sources; with highly dynamic content, I suspect querying the data sources is better.

3. Mapping on Demand:

>> The Union Bomb:

AKE> I am not sure why this is a problem under "Mapping on Demand". It seems to be the same issue for both ETL and on-demand mapping. P.S. Most EIS servers (CRM like Vignette or ERP like SAP) do not expose an SQL interface. I assume you meant this conceptually, in the context of multiple databases.

4. Criteria of Success:

>> At the end… There should exist at least two interoperable implementations of the mapping language providing at least ETL. Aside from this, implementers are encouraged to support on-demand mapping.

AKE> No, we need to support both approaches as first-class citizens. I view on-demand mapping as the more critical of the two, because having minimal translation cost matters more than what the ETL approach offers. Second, with either ETL or on-demand mapping, we need a proof of concept or a recommendation for how to reconcile RDF sub-graphs from multiple data sources into a single domain ontology.

Regards,

Ahmed

Ahmed K. Ezzat, Ph.D.
HP Fellow, Business Intelligence Software Division
Hewlett-Packard Corporation
11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
Office: Email: Ahmed.Ezzat@hp.com Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
Personal: Email: AhmedEzzat@aol.com Tel: 408-253-5062 Fax: 408-253-6271
Attachments
- image/gif attachment: 01-part
Received on Tuesday, 6 January 2009 06:13:31 UTC