Re: Requirements for Relational to RDF mapping document comment...

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Tue, 06 Jan 2009 07:10:50 -0800
Message-ID: <4963747A.5080502@oracle.com>
To: Li L Ma <malli@cn.ibm.com>
CC: "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>, "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>, public-xg-rdb2rdf-request@w3.org

Hello Li Ma:
You said...

> Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services.

I agree. It would be good to add a use case that enables rules and reasoning.
Could you amplify your discussion below with an example of a rule or
some sort of reasoning that is enabled by the mapping to the Semantic
Web? We can then add this as a third use case.
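
As an illustration of the kind of example meant here, the following is a minimal sketch and not anything taken from SeDA or D2RQ: a few relational rows are exposed as triples, and one rule derives facts that no column stores. The table contents, the `ex:` names, and the `ex:partOf` facts are all hypothetical.

```python
# Hypothetical rows from a relational CUSTOMER table (not from this thread).
ROWS = [
    {"id": 1, "name": "Acme", "city": "Beijing"},
    {"id": 2, "name": "Globex", "city": "Cupertino"},
]

# Hypothetical domain-ontology facts that live outside the database.
DOMAIN_FACTS = {
    ("ex:city/Beijing", "ex:partOf", "ex:country/China"),
    ("ex:city/Cupertino", "ex:partOf", "ex:country/USA"),
}


def rows_to_triples(rows):
    """Expose each row as RDF-style (subject, predicate, object) tuples."""
    triples = set()
    for row in rows:
        subject = f"ex:customer/{row['id']}"
        triples.add((subject, "rdf:type", "ex:Customer"))
        triples.add((subject, "ex:name", row["name"]))
        triples.add((subject, "ex:locatedIn", f"ex:city/{row['city']}"))
    return triples


def infer_located_in(triples):
    """Apply one rule until nothing new is derived:
    (?x ex:locatedIn ?a) and (?a ex:partOf ?b) => (?x ex:locatedIn ?b)
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        part_of = [(s, o) for (s, p, o) in inferred if p == "ex:partOf"]
        located = [(s, o) for (s, p, o) in inferred if p == "ex:locatedIn"]
        for who, place in located:
            for inner, outer in part_of:
                new_fact = (who, "ex:locatedIn", outer)
                if place == inner and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


closure = infer_located_in(rows_to_triples(ROWS) | DOMAIN_FACTS)
for fact in sorted(closure):
    print(fact)
```

A query for customers located in `ex:country/China` now has an answer even though the table only records cities, which is the sort of reasoning a third use case could describe.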

Thanks!

All the best, Ashok


Li L Ma wrote:
>
> Hi Ezzat and all,
>
> Happy New Year!
>
> I agree with your comments on RDF for MDM integration. So far, I also
> have not seen effective use of RDF for master data integration.
> Here, I'd like to share our research work on using linked data
> techniques for master data management. The following picture shows our
> high-level ideas. We created a core ontology from an MDM logical model,
> as well as a mapping (defined by D2RQ) between the created ontology
> and MDM data stored in relational databases. That means we can have an
> RDF view to existing master data. Furthermore, the published master
> data could be linked/mapped to domain ontologies by rules. Once master
> data is mapped to ontologies, users can define their own business
> rules using classes and properties defined in core MDM and domain
> ontologies and issue SPARQL queries including defined rules to MDM
> databases. Our SeDA engine, which takes as input a SPARQL query, a
> D2RQ mapping, ontologies, and user-defined business rules, can
> translate a SPARQL query into a single SQL query to retrieve master
> data. In summary, using some linked data technologies (mainly mapping
> and reasoning), we provide advanced analytics services over
> centralized master data, but do NOT focus on the integration problem
> in MDM. An interesting problem to explore in the future is to use
> linked data technologies for registry-based MDM implementations.
>
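The translation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SeDA engine and not D2RQ syntax: `CLASS_MAP` is a hypothetical stand-in for a mapping, the table and column names are invented, and only a single class with simple equality filters is handled.

```python
# Hypothetical mapping from an ontology class to a relational table.
# A real D2RQ mapping is written in its own mapping language; this dict
# only stands in for the information such a mapping carries.
CLASS_MAP = {
    "ex:Customer": {
        "table": "CUSTOMER",
        "key": "ID",
        "properties": {"ex:name": "NAME", "ex:city": "CITY"},
    },
}


def pattern_to_sql(class_iri, properties, filters=None, class_map=CLASS_MAP):
    """Translate "instances of class_iri with these properties" into one
    parameterized SQL query, returned as (sql, params)."""
    entry = class_map[class_iri]
    columns = [entry["key"]] + [entry["properties"][p] for p in properties]
    sql = f"SELECT {', '.join(columns)} FROM {entry['table']}"
    params = []
    if filters:
        conditions = []
        for prop, value in filters.items():
            # Property names come from the mapping; values stay as parameters.
            conditions.append(f"{entry['properties'][prop]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = pattern_to_sql("ex:Customer", ["ex:name"], {"ex:city": "Beijing"})
print(sql)
print(params)
```

The point of the sketch is only that the mapping lets a graph-shaped request be answered by one SQL statement against the existing tables, without first copying the data into an RDF store.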
> *Besides integration, I think, another significant value of RDB2RDF
> mapping for enterprise data management is to enable ontology and rule
> reasoning for analytics services. So, I think reasoning is also
> important for both ETL and on demand mapping approaches.*
>
>
>
> For "the Union Bomb", I think Orri proposed it from an implementation
> perspective. Compared with the ETL approach, on-demand mapping has
> higher requirements for performance and scalability. If a mapping can
> provide clues for query optimization, SPARQL-to-SQL engines can
> generate more efficient SQL statements.
>
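One way to picture the point about optimization clues is sketched below. It assumes a reading of "the Union Bomb" that this thread does not spell out: when a query leaves the class or predicate unbound, a naive translator emits one SQL branch per mapped column and unions them all. `TABLE_MAP`, the tables, and the columns are hypothetical.

```python
# Hypothetical mapping: each table maps to one class, each column to one property.
TABLE_MAP = {
    "CUSTOMER": {
        "class": "ex:Customer",
        "key": "ID",
        "columns": {"ex:name": "NAME", "ex:city": "CITY"},
    },
    "PRODUCT": {
        "class": "ex:Product",
        "key": "ID",
        "columns": {"ex:name": "NAME", "ex:price": "PRICE"},
    },
}


def union_branches(table_map, predicate=None, subject_class=None):
    """Return one SELECT per (table, column) that could match the pattern
    ?s ?p ?o. A known predicate or subject class prunes the branches."""
    branches = []
    for table, entry in table_map.items():
        if subject_class is not None and entry["class"] != subject_class:
            continue  # the mapping tells us this table cannot match
        for prop, column in entry["columns"].items():
            if predicate is not None and prop != predicate:
                continue  # this column maps to a different property
            branches.append(
                f"SELECT {entry['key']} AS s, '{prop}' AS p, {column} AS o FROM {table}"
            )
    return branches


def to_sql(branches):
    """Combine the surviving branches into a single statement."""
    return "\nUNION ALL\n".join(branches)


print(len(union_branches(TABLE_MAP)))                               # unconstrained pattern
print(len(union_branches(TABLE_MAP, subject_class="ex:Customer")))  # class is known
print(to_sql(union_branches(TABLE_MAP, predicate="ex:price")))      # predicate is known
```

With only two small tables the unconstrained pattern already needs four branches; the count grows with every mapped column, which is why any hint that removes branches matters more for on-demand mapping than for ETL.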
> Best Regards,
>
> Li MA, Ph.D
> Manager, Semantic Technologies
> IBM China Research Lab
> TEL: 86-10-58748078
> T/L: 11905 ext. 8078
> FAX: 86-10-58748731
> E-Mail: MaLLi@cn.ibm.com
> Homepage: http://www.research.ibm.com/people/m/mali
>
>
> "Ezzat, Ahmed" <Ahmed.Ezzat@hp.com>
> Sent by: public-xg-rdb2rdf-request@w3.org
> 2009-01-03 11:31
>
> To: "public-xg-rdb2rdf@w3.org" <public-xg-rdb2rdf@w3.org>
> cc:
> Subject: Requirements for Relational to RDF mapping document comment...
>
> Hello,
>
> Good paper. Below are a couple of comments:
>
> 1. _Motivation_:
> >> RDF offers a systematic ontology and query language which can be
> used against the mapped data, without concern of the semantic
> heterogeneity inherent in independently arisen relational databases.
>
> AKE> What about all the kinds of issues in mapping an RDBMS schema to
> the application/domain ontology, as well as reconciling the different
> local ontologies? Even with only RDBMS data sources, building a unified,
> consistent, and complete single data view (MDM) is a lot of work. RDF
> does not help with any of these issues. The author might want to
> rewrite/soften this sentence.
>
> 2. _Relative Desirability of Mapping and ETL_:
> >> We expect cases favoring ETL to be characterized by:
>
>     * Large number of heterogeneous sources of data
>     * Complex application logic needed for transforming the data
>     * RDF reasoning being performed on the mapped data
>     * Queries with variables in class or predicate positions
>
>
> AKE> I agree with some of these but not others, and some bullets are
> missing. For example, I am not sure why ETL (the dump approach) is
> preferred with a large number of data sources. I suspect one main
> factor is scalability: if the aggregate data size across the many
> sources is hard to accommodate in a single RDF store, that might force
> you to query the data sources directly rather than translate all of
> them into an RDF store and then query it. A *second* factor is the
> dynamic nature of the data sources; with highly dynamic content I
> suspect querying the data sources is better.
>
> 3. Mapping on Demand:
> >> The Union Bomb:
>
> AKE> I am not sure why this is a problem specifically under "Mapping
> on Demand". It seems to be the same issue for both ETL and on-demand
> mapping.
> P.S. Most EIS servers (CRM like Vignette or ERP like SAP) do not
> expose an SQL interface. I assume you meant conceptually in the context
> of multiple databases.
>
> 4. Criteria of Success:
> >> At the end... There should exist at least two interoperable
> implementations of the mapping language providing at least ETL. Aside
> from this, implementers are encouraged to support on-demand mapping.
>
> AKE> No, we need to support both approaches as first-class citizens. I
> view on-demand as more critical, since having minimal cost for
> translation matters more there than in the ETL approach. Second, with
> either ETL or on-demand we need to have a proof of concept or a
> recommendation for how to reconcile RDF sub-graphs from multiple data
> sources into a single domain ontology.
> Regards,
>
> Ahmed
>
>
> Ahmed K. Ezzat, Ph.D.
> HP Fellow, Business Intelligence Software Division
> Hewlett-Packard Corporation
> 11000 Wolf Road, Bldg 42 Upper, MS 4502, Cupertino, CA 95014-0691
> Office: Email: Ahmed.Ezzat@hp.com
> Tel: 408-447-6380 Fax: 1408796-5427 Cell: 408-504-2603
> Personal: Email: AhmedEzzat@aol.com
> Tel: 408-253-5062 Fax: 408-253-6271
>
>
>
>
Received on Tuesday, 6 January 2009 15:15:46 GMT