- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Tue, 06 Jan 2009 09:13:43 -0500
- To: Li L Ma <malli@cn.ibm.com>
- CC: "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>, "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>, public-xg-rdb2rdf-request@w3.org
On 1/6/09 1:10 AM, Li L Ma wrote:
>
> Hi Ezzat and all,
>
> Happy New Year!
>
> I agreed with your comments on RDF for MDM integration. So far, I also
> did not see the effective use of RDF for master data integration.
> Here, I'd like to share our research work on using linked data
> techniques for master data management. The following picture shows our
> high level ideas. We created a core ontology from a MDM logical model,
> as well as a mapping (defined by D2RQ) between the created ontology
> and MDM data stored in relational databases. That means we can have an
> RDF view to existing master data. Furthermore, the published master
> data could be linked/mapped to domain ontologies by rules. Once master
> data is mapped to ontologies, users can define their own business
> rules using classes and properties defined in core MDM and domain
> ontologies and issue SPARQL queries including defined rules to MDM
> databases. Our developed SeDA engine, which takes as input SPARQL
> query, D2RQ mapping, ontologies and user-defined business rules, can
> translate a SPARQL query to a single SQL query to retrieve master
> data. In summary, using some linked data technologies (mainly mapping
> and reasoning), we provided advanced analytics services over
> centralized master data, but NOT focusing on the integration problem
> in MDM. An interesting problem to explore in the future is to use
> linked data technologies for the registry based MDM implementation.
>
> *Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services. So, I think reasoning is also
> important for both ETL and on demand mapping approaches.*

Li,

Nicely articulated.

Platform independent Entity-Attribute-Value + Classes & Relationships + HTTP Identifiers (aka. RDF based Linked Data) is a killer solution for MDM. Rdb2RDF mapping is the cost-effective route for implementation.

If we interlink the value propositions of MDM and Rdb2Rdf we have a no-brainer style foundation for answering the question: What does Rdb2Rdf deliver over and above SQL?

Ashok: As you recall, we had a quick chat about MDM at the last Semantic Web Gathering at MIT. And if you go back to my demo links, all I am showing is how you can demonstrate MDM in the very simplest terms using existing demo databases for all the major RDBMS engines.

>
> For "the Union Bomb", I think Orri proposed it from implementation
> perspective. Compared with ETL approach, on demand mapping has higher
> requirements for performance and scalability. If a mapping can provide
> clues for query optimization, SPARQL-to-SQL engines can generate more
> efficient SQL statements.

Yes, which is basically what I believe we [OpenLink] have been trying to articulate for a very long time :-)

Kingsley

>
> Best Regards,
>
> Li MA, Ph.D
> Manager, Semantic Technologies
> IBM China Research Lab
> TEL: 86-10-58748078
> T/L: 11905 ext. 8078
> FAX: 86-10-58748731
> E-Mail: MaLLi@cn.ibm.com
> Homepage: http://www.research.ibm.com/people/m/mali
>
>
> *"Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>*
> Sent by: public-xg-rdb2rdf-request@w3.org
>
> 2009-01-03 11:31
>
> To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
> cc:
> Subject: Requirements for Relational to RDF mapping document comment...
>
>
> Hello,
>
> Good paper. Below are a couple of comments:
>
> 1. _Motivation_:
> >> RDF offers a systematic ontology and query language which can be
> used against the mapped data, without concern of the semantic
> heterogeneity inherent in independently arisen relational databases.
>
> AKE> How about all kinds of issues between mapping rdbms schema to the
> application/domain ontology? As well as reconciling the different
> local ontologies. Even with only RDBMS data sources, having unified
> consistent and complete single data view (MDM) is a lot of work. RDF
> does not help in any of these issues.
> The author might want to rewrite/soften this sentence.
>
> 2. _Relative Desirability of Mapping and ETL_:
> >> We expect cases favoring ETL to be characterized by:
>
> * Large number of heterogeneous sources of data
> * Complex application logic needed for transforming the data
> * RDF reasoning being performed on the mapped data
> * Queries with variable in class or predicate positions
>
>
> AKE> Agree on some, not on some, and there are missing bullets. For
> example, I am not sure why ETL (dump approach) is preferred with large
> number of data sources? I suspect a main factor is scalability, i.e.,
> if the overall aggregate data size from the large number of data
> sources is hard to accommodate in a single RDF store; it might force
> you to query the data sources rather than translating all of them
> into an RDF store then querying. *Second* factor is the dynamic nature
> of the data sources; with highly dynamic content I suspect querying
> the data sources is better.
>
> 3. Mapping on Demand:
> >> The Union Bomb:
>
> AKE> I am not sure why this is a problem under "Mapping on Demand?" It
> seems to be the same issues between ETL and on-demand mapping.
> P.S. Most EIS (CRM like Vignette or ERP like SAP) servers do not
> expose SQL interface. I assume you meant conceptually in the context
> of multiple databases.
>
> 4. Criteria of Success:
> >> At the end... There should exist at least two interoperable
> implementations of the mapping language providing at least ETL. Aside
> from this, implementers are encouraged to support on-demand mapping.
>
> AKE> No, we need to support both approaches as 1st class citizens. I
> view on-demand as more critical as having minimal cost for translation
> is more critical than the ETL approach. Second, with either ETL or
> on-demand we need to have a proof of concept or recommendation for how
> to reconcile RDF sub-graphs out of multiple data sources into a single
> domain ontology.
> Regards,
>
> Ahmed
>
>
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Business Intelligence Software Division
> Hewlett-Packard Corporation
> 11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
> Office: Email: Ahmed.Ezzat@hp.com <mailto:Ahmed.Ezzat@hp.com>
> Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
> Personal: Email: AhmedEzzat@aol.com <mailto:AhmedEzzat@aol.com>
> Tel: 408-253-5062 Fax: 408-253-6271
>
>

--

Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Received on Tuesday, 6 January 2009 14:14:23 UTC