Re: Requirements for Relational to RDF mapping document comment...

On 1/6/09 1:10 AM, Li L Ma wrote:
>
> Hi Ezzat and all,
>
> Happy New Year!
>
> I agree with your comments on RDF for MDM integration. So far, I have
> not seen an effective use of RDF for master data integration either.
> Here, I'd like to share our research work on using linked data
> techniques for master data management. The following picture shows our
> high-level ideas. We created a core ontology from an MDM logical model,
> as well as a mapping (defined by D2RQ) between the created ontology
> and MDM data stored in relational databases. That means we can have an
> RDF view to existing master data. Furthermore, the published master
> data could be linked/mapped to domain ontologies by rules. Once master
> data is mapped to ontologies, users can define their own business
> rules using classes and properties defined in core MDM and domain
> ontologies and issue SPARQL queries including defined rules to MDM
> databases. Our SeDA engine, which takes as input a SPARQL query, a
> D2RQ mapping, ontologies and user-defined business rules, can
> translate a SPARQL query into a single SQL query to retrieve master
> data. In summary, using some linked data technologies (mainly mapping
> and reasoning), we provided advanced analytics services over
> centralized master data, but did NOT focus on the integration problem
> in MDM. An interesting problem to explore in the future is the use of
> linked data technologies for registry-based MDM implementations.
>
> *Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services. So, I think reasoning is also
> important for both ETL and on demand mapping approaches.*
Li,

Nicely articulated.

Platform-independent Entity-Attribute-Value + Classes & Relationships +
HTTP Identifiers (aka RDF-based Linked Data) is a killer solution for MDM.

Rdb2RDF mapping is the cost-effective route for implementation.
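
To make the Entity-Attribute-Value + HTTP Identifiers idea concrete, here is a minimal sketch of how one relational row might be exposed as triples. The base URI, the table name, the column names and the `row_to_triples` helper are all hypothetical and chosen only for illustration; this is not the behaviour of any particular mapping product.

```python
from typing import Dict, List, Tuple

# Hypothetical base URI; not taken from any real deployment.
BASE = "http://example.com/mdm"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def row_to_triples(table: str, key_column: str, row: Dict[str, object]) -> List[Triple]:
    """Turn one relational row into (entity, attribute, value) triples."""
    # The primary key becomes part of the entity's HTTP identifier.
    subject = f"{BASE}/{table}/{row[key_column]}"
    # The table name plays the role of the class.
    triples: List[Triple] = [(subject, RDF_TYPE, f"{BASE}/schema/{table}")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject URI; NULLs yield no triple
        triples.append((subject, f"{BASE}/schema/{table}#{column}", str(value)))
    return triples


# Hypothetical master-data row.
customer = {"id": 42, "name": "Acme Corp", "country": "US", "fax": None}
for triple in row_to_triples("customer", "id", customer):
    print(triple)
```

Every entity and every attribute ends up with an HTTP identifier, which is what lets records from separate databases be linked to one another.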

If we interlink the value propositions of MDM and Rdb2Rdf we have a
no-brainer style foundation for answering the question:
What does Rdb2Rdf deliver over and above SQL?

Ashok: As you recall, we had a quick chat about MDM at the last Semantic
Web Gathering at MIT. And if you go back to my demo links, all I am
showing is how you can demonstrate MDM in the very simplest terms using
existing demo databases for all the major RDBMS engines.


>
>
>
> For "the Union Bomb", I think Orri proposed it from an implementation
> perspective. Compared with the ETL approach, on-demand mapping has higher
> requirements for performance and scalability. If a mapping can provide
> clues for query optimization, SPARQL-to-SQL engines can generate more
> efficient SQL statements.
Yes, which is basically what I believe we [OpenLink] have been trying to
articulate for a very long time :-)
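
As a rough illustration of the point about mapping clues, the sketch below translates a single triple pattern to SQL. The `MAPPING` table, its predicate URIs and the `pattern_to_sql` helper are hypothetical and are not taken from D2RQ, SeDA or Virtuoso. With a variable in the predicate position, every mapped column contributes a branch to the union; once the predicate is known, the query collapses to one SELECT.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping: predicate URI -> (table, key column, value column).
MAPPING: Dict[str, Tuple[str, str, str]] = {
    "http://example.com/schema#name": ("customer", "id", "name"),
    "http://example.com/schema#country": ("customer", "id", "country"),
    "http://example.com/schema#total": ("orders", "id", "total"),
}


def pattern_to_sql(predicate: Optional[str]) -> str:
    """Translate the triple pattern (?s, predicate, ?o) to SQL.

    predicate=None stands for a variable in the predicate position.
    An unmapped predicate raises KeyError in this sketch.
    """
    # A variable predicate could match any mapped column, so all are candidates.
    candidates = [predicate] if predicate is not None else sorted(MAPPING)
    branches: List[str] = []
    for p in candidates:
        table, key, column = MAPPING[p]
        branches.append(
            f"SELECT '{table}/' || {key} AS s, '{p}' AS p, "
            f"CAST({column} AS TEXT) AS o FROM {table} WHERE {column} IS NOT NULL"
        )
    return "\nUNION ALL\n".join(branches)


print(pattern_to_sql("http://example.com/schema#name"))  # a single SELECT
print(pattern_to_sql(None))  # one UNION ALL branch per mapped column
```

The number of branches grows with the number of mapped columns, so anything in the mapping that lets the engine rule out branches early pays off directly in the generated SQL.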

Kingsley
>
> Best Regards,
>
> Li MA, Ph.D
> Manager, Semantic Technologies
> IBM China Research Lab
> TEL: 86-10-58748078
> T/L: 11905 ext. 8078
> FAX: 86-10-58748731
> E-Mail: MaLLi@cn.ibm.com
> Homepage: http://www.research.ibm.com/people/m/mali
>
>
> *"Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>*
> Sent by: public-xg-rdb2rdf-request@w3.org
> Date: 2009-01-03 11:31
> To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
> Subject: Requirements for Relational to RDF mapping document comment...
>
> Hello,
>
> Good paper. Below are a couple of comments:
>
> 1. _Motivation_:
> >> RDF offers a systematic ontology and query language which can be
> used against the mapped data, without concern of the semantic
> heterogeneity inherent in independently arisen relational databases.
>
> AKE> What about all the issues in mapping an RDBMS schema to the
> application/domain ontology, as well as reconciling the different
> local ontologies? Even with only RDBMS data sources, having a unified,
> consistent and complete single data view (MDM) is a lot of work. RDF
> does not help with any of these issues. The author might want to
> rewrite/soften this sentence.
>
> 2. _Relative Desirability of Mapping and ETL_:
> >> We expect cases favoring ETL to be characterized by:
>
>     * Large number of heterogeneous sources of data
>     * Complex application logic needed for transforming the data
>     * RDF reasoning being performed on the mapped data
>     * Queries with variables in class or predicate positions
>
>
> AKE> I agree with some of these and not with others, and there are
> missing bullets. For example, I am not sure why ETL (the dump approach)
> is preferred with a large number of data sources. I suspect a main
> factor is scalability, i.e., if the overall aggregate data size from
> the large number of data sources is hard to accommodate in a single
> RDF store, it might force you to query the data sources rather than
> translate all of them into an RDF store and then query. A *second*
> factor is the dynamic nature of the data sources; with highly dynamic
> content I suspect querying the data sources is better.
>
> 3. Mapping on Demand:
> >> The Union Bomb:
>
> AKE> I am not sure why this is a problem under "Mapping on Demand". It
> seems to be the same issue for both ETL and on-demand mapping.
> P.S. Most EIS (CRM like Vignette or ERP like SAP) servers do not
> expose a SQL interface. I assume you meant conceptually in the context
> of multiple databases.
>
> 4. Criteria of Success:
> >> At the end... There should exist at least two interoperable
> implementations of the mapping language providing at least ETL. Aside
> from this, implementers are encouraged to support on-demand mapping.
>
> AKE> No, we need to support both approaches as first-class citizens. I
> view on-demand as more critical than the ETL approach, because having
> a minimal cost for translation is critical. Second, with either ETL or
> on-demand we need to have a proof of concept or recommendation for how
> to reconcile RDF sub-graphs out of multiple data sources into a single
> domain ontology.
> Regards,
>
> Ahmed
>
>
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Business Intelligence Software Division
> Hewlett-Packard Corporation
> 11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
> Office: Email: Ahmed.Ezzat@hp.com
> Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
> Personal: Email: AhmedEzzat@aol.com
> Tel: 408-253-5062 Fax: 408-253-6271
>
>
>
>


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Tuesday, 6 January 2009 14:14:23 UTC