[Use Case] FZI-2 Enterprise Information Integration from Markus Krötzsch on 2005-12-06 (public-rif-wg@w3.org from December 2005)

From: Markus Krötzsch <mak@aifb.uni-karlsruhe.de>
Date: Tue, 6 Dec 2005 20:58:38 +0100
To: public-rif-wg@w3.org
Message-Id: <200512062058.49645.mak@aifb.uni-karlsruhe.de>
** FZI-2 Enterprise Information Integration

A use case in cooperation with Software AG and ontoprise GmbH


The use case description is already quite detailed (see below). Here is 
a short abstract:

For the integration of data that resides in autonomous data sources, 
Software AG uses ontologies. Data source ontologies describe the data 
sources themselves. Business ontologies provide an integrated view of 
the data. F-Logic rules are used to describe mappings between data 
objects in data source or business ontologies. Furthermore, F-Logic is 
used as the query language.

The use case shows that F-Logic rules are perfectly suited to describe 
the mappings between objects and their properties. Some of these mapping 
rules can be generated automatically from the data sources' metadata. 
Some patterns recur frequently in user-defined mapping rules, for 
instance rules which establish inverse object relations or rules which 
create new object relations based on the objects' property values.



---- The following is the elaborate description of the use case ----


== Data Integration using Semantic Technology ==

(input for W3C RIF-WG, http://www.w3.org/2005/rules/)

Dr. Michael Gesmann, Software AG, Germany
Prof. Dr. Jürgen Angele, ontoprise GmbH, Germany
Dr. Pascal Hitzler, FZI Karlsruhe, Germany
Markus Krötzsch, FZI Karlsruhe, Germany

=== Abstract ===
 
For the integration of data that resides in autonomous data sources, Software 
AG uses ontologies. Data source ontologies describe the data sources 
themselves. Business ontologies provide an integrated view of the data. 
F-Logic rules are used to describe mappings between data objects in data 
source or business ontologies. Furthermore, F-Logic is used as the query 
language.

The use case shows that F-Logic rules are perfectly suited to describe the 
mappings between objects and their properties. Some of these mapping rules 
can be generated automatically from the data sources' metadata. Some patterns 
recur frequently in user-defined mapping rules, for instance rules which 
establish inverse object relations or rules which create new object relations 
based on the objects' property values. In our first project, access to 
information is still typical data retrieval rather than knowledge 
inference. Therefore, much of the effort in this project went into query 
functionality and, even more, into performance.

=== Introduction ===

Data that is essential to a company's business often resides in a variety of 
data sources. The reasons for this are manifold, e.g. load distribution or 
independent development of business processes. But distributing data can 
lead to inconsistencies, which are a problem when developing new business. 
Consolidating the scattered data and giving applications a shared picture of 
all existing data are therefore important challenges. The integration of 
such distributed data is the task of 
Software AG's "Enterprise Information Integrator" (EII) 
[http://www.softwareag.com/corporate/Solutions/integration/Info_integration].
 
EII is based on ontologies. On the one hand, data source ontologies can be 
generated from the metadata of underlying data sources. Currently, SQL 
databases, Software AG's Adabas databases, and web services are the supported 
types of data sources. On the other hand, more business-oriented ontologies 
can be developed. These business ontologies make use of other business 
ontologies or can directly use data source ontologies. F-Logic rules describe 
how objects in different ontologies are related to each other.

Within Software AG, EII was used in a first project whose goal was to 
integrate data from a support information system with data from a customer 
information system. The support information system maintains, for example, 
information about customers, their contact information, and active or closed 
support requests. The customer information system contains information about 
customers, contracts etc. While one of these systems stores its data in an 
Adabas database, the other system uses an SQL server. The integrated data 
view is exposed in a browser-based application to various parties inside the 
company.
 
For this system we have a dozen source ontologies describing some SQL and some 
Adabas tables. There is only one business ontology, which gives users a single 
view of customer data, their contracts and support requests. In this paper we 
present some examples of how we used rules within our ontologies, and derive 
some requirements and use cases for rule languages to be used in such a 
project.

=== Data source import ===

The system needs to be open in the sense that it allows for extensions which 
provide access to external data sources. In EII, so-called built-in 
predicates implement this.

It is feasible to have a single rule for every data source, e.g. for every 
table in a database. However, a system that implements access to external 
data sources only via such single-table access rules will often not achieve 
sufficient performance. The reason is that the resulting access operations 
do not make use of the data source's query capabilities, such as join 
operations.
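
To illustrate the difference, here is a minimal Python sketch (not part of 
EII). It uses an in-memory SQLite database as a stand-in for one SQL data 
source; the table layout, the sample rows and all function names are 
assumptions made up for this illustration.

```python
import sqlite3

def make_db():
    # Hypothetical stand-in for one SQL data source with two tables.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customer (cuid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE contract (coid INTEGER PRIMARY KEY, customer INTEGER);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO contract VALUES (10, 1), (11, 1), (12, 2);
    """)
    return con

def contracts_single_table(con, name):
    # One access per table: both tables are fetched completely and the
    # join is evaluated outside the data source.
    customers = con.execute("SELECT cuid, name FROM customer").fetchall()
    contracts = con.execute("SELECT coid, customer FROM contract").fetchall()
    ids = {cuid for cuid, n in customers if n == name}
    return sorted(coid for coid, cust in contracts if cust in ids)

def contracts_pushed_down(con, name):
    # One access: the data source evaluates the join and the selection,
    # so it can use its own indexes and only matching rows are returned.
    rows = con.execute(
        "SELECT c.coid FROM contract c "
        "JOIN customer u ON c.customer = u.cuid "
        "WHERE u.name = ? ORDER BY c.coid",
        (name,),
    ).fetchall()
    return [coid for (coid,) in rows]
```

Both functions return the same contract ids, but the first one transfers 
every row of both tables before filtering.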

Import from data sources is easy as long as the order of the result objects 
and the order of property values within a row are not significant. This holds 
for imports from SQL and mostly also for imports from Adabas. For data 
sources like web services, which expect and return XML documents, this is no 
longer true. For chaining of web services, i.e. when the result of one or 
more web services is the input for another web service, it is necessary to 
maintain the structure of the original result documents. Preserving the 
structure leads to complex rules.

=== Object and property mapping ===

It is very easy to define that an object in one model representing the data 
source is also an object in another model representing the business model. 
For example a support request in the support information system is also a 
support object in the business model.

An example in rule terms -- we use F-Logic syntax throughout:

FORALL X c(tablename, X):Contract[contractId->X]@Source <- 
          accessToSource(connectionInfo, tablename, rowid, X).
FORALL X X:Contract@Business <- X:Contract@Source.

If the underlying data from the external sources contains such information, it 
is also easy to state that two objects are the same. For example, a customer 
in the support information system and a customer in the customer information 
system represent the same object if these customers have the same name and 
address. A customer might have a surrogate value as a unique key in both 
systems, but typically these values are not viable object identifiers across 
independent data sources. 

An example in rule terms:

FORALL X c(tablename1, X):Contract1[contractId1->X]@Source1 <- 
       accessToSource(connectionInfo1, tablename1, rowid1, X).
FORALL X c(tablename2, X):Contract2[contractId2->X]@Source2 <- 
       accessToSource(connectionInfo2, tablename2, rowid2, X).
FORALL X, Y c(Contract, Y):Contract@Business <- 
       X:Contract1[contractId1->Y]@Source1.
FORALL X, Y c(Contract, Y):Contract@Business <- 
       X:Contract2[contractId2->Y]@Source2.
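
The effect of the last two rules can be sketched in Python (an illustration 
only, not how EII evaluates rules). The business object identifier is built 
from the contract id alone, so rows from both sources that carry the same id 
end up as one business object. The function name and the dict-based row 
format are assumptions made for this sketch.

```python
def business_contracts(source1_rows, source2_rows):
    # Each row is a dict; the keys mirror the properties contractId1 and
    # contractId2 used in the rules above.
    objects = set()
    for row in source1_rows:
        # Corresponds to c(Contract, Y) derived from Source1.
        objects.add(("Contract", row["contractId1"]))
    for row in source2_rows:
        # Corresponds to c(Contract, Y) derived from Source2; an id that
        # was already seen in Source1 yields the same object term.
        objects.add(("Contract", row["contractId2"]))
    return objects
```

Because the identifier term does not mention the table name, the set 
contains one entry per distinct contract id, regardless of how many sources 
report it.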
   
As for objects, it is also easy to specify that a property in one ontology 
maps to a property in another ontology, so that all values of the property in 
the first ontology are also values of the property in the second ontology.

An example in rule terms:

FORALL X, Y
   c(tablename, X):Contract[contractId->X;
                            contractDate->Y]@Source 
   <- 
   accessToSource(connectionInfo, tablename, 
                  rowid, X, datefield, Y).
FORALL X, Y
   X:Contract[date->Y]@Business 
   <- 
   X:Contract[contractDate->Y]@Source.

These simple types of mapping are essential for specifying business 
ontologies on top of data source ontologies or other business ontologies.

=== Property value mapping ===

Often, similar data is represented in one way in one database and in a 
different way in another. For example:

* data that is encoded in a single column or field might be scattered across 
multiple attributes in another database (comma-separated name versus 
firstname and lastname, encoding of some numeric or boolean bits into a 
single bit array)
* data with different representation (time and date values as a number, as XML 
types, as SQL values)

In all these cases it is very helpful to have an extension mechanism that 
allows adding functions which implement the necessary transformations.

An example in rule terms:
   
   FORALL A, X 
   A[Contract_End_Date_Formatted->X]@Business
   <-
   EXISTS B (A: Contract[Contract_End_Date->B]@Business
   and natdate2string(B,X)).

where natdate2string() is a predicate that transforms a date from one 
representation into another.
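
A transformation of this kind could look as follows in Python. This is only 
a sketch: the original does not state the source or target format, so we 
assume here that the source stores the date as an integer of the form 
YYYYMMDD and that the target is an ISO 8601 string. The helper 
add_formatted_end_date() and the dict-based object format are likewise 
hypothetical.

```python
from datetime import datetime

def natdate2string(value):
    # Assumption: the source value is an integer such as 20051206
    # (YYYYMMDD); the result is an ISO 8601 date string.
    return datetime.strptime(str(value), "%Y%m%d").date().isoformat()

def add_formatted_end_date(contract):
    # Mirrors the rule above: derive Contract_End_Date_Formatted from
    # Contract_End_Date without modifying the input object.
    derived = dict(contract)
    derived["Contract_End_Date_Formatted"] = natdate2string(
        contract["Contract_End_Date"]
    )
    return derived
```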

=== Object references and more metadata ===

Every functional model needs to describe relations between objects. Object 
properties are used to express these relationships in a model. The value of 
an object property is the identifier of the object it references.

These properties and property values are similar to foreign keys in relational 
databases. The foreign key information that is provided with the schema 
description should be used when generating the data source model. 

An example in rule terms:

FORALL X 
   c(CUSTOMER, X):CUSTOMER[cuid->X]@Source 
   <- 
   accessToSource(connectionInfo, CUSTOMER, cid, X). 
FORALL X,Y
   c(CONTRACT, X):CONTRACT[coid->X;customer->Y]@Source 
   <- 
   accessToSource(connectionInfo, CONTRACT, cid, X, customer, Y).
FORALL X,Y 
   X[forCustomer->c(CUSTOMER, Y)]
   <-
   X:CONTRACT[customer->Y]@Source.

The inverse reference could be generated as well. But because schemas provide 
no name for the inverse (while SQL databases do provide a name for the 
foreign key constraint), this is currently left to application development.

... some rules for Customer and Contract in business ontology ...
FORALL X,Y
   X[hasContracts->>Y]@Business
   <-
   Y:Contract[forCustomer->X]@Business.
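
In Python terms, the inverse rule above amounts to inverting a single-valued 
mapping into a set-valued one. The following sketch is only an illustration; 
the function name and the use of a dict from contract id to customer id are 
assumptions.

```python
from collections import defaultdict

def has_contracts(for_customer):
    # for_customer maps a contract id to its customer id (single-valued,
    # like forCustomer->). The result maps each customer id to the set of
    # its contract ids (set-valued, like hasContracts->>).
    inverse = defaultdict(set)
    for contract, customer in for_customer.items():
        inverse[customer].add(contract)
    return dict(inverse)
```

A customer without any contract does not appear in the result, just as the 
rule derives no hasContracts value for such a customer.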
   
Even N:M relationships, which relational systems have to implement by means 
of two 1:N foreign key relations, can be expressed directly. The rules are, 
however, a little more complex.

=== Queries ===

Learning a new language is always a substantial investment, in particular if 
it involves learning a new programming paradigm. Having different languages 
for modelling and for querying ontologies bears the potential for impedance 
mismatches and causes additional costs. Therefore, the rule language and the 
query language should ideally be the same.

However, as with queries in database applications, the queries in our project 
only need to return result data. It was not a goal to find all explanations 
of why the returned results are valid, nor was it a goal to get all variable 
bindings that lead to a result.

Using the language as a query language leads to some typical database query 
language requirements, for example:

* User-defined projections on the query result should be possible. Object 
relationships should be contained in the result. E.g. if a customer has 
multiple contracts, each with contract items, then the query result should 
state which contract item belongs to which contract within a single result 
per customer.
* Aggregations over data should be possible (although not yet used within the 
project).
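
The two requirements can be made concrete with a small Python sketch. It 
assumes that the underlying sources deliver flat (customer, contract, item) 
tuples, as a relational join would; the function names and the nested dict 
format are made up for this illustration and do not describe the EII result 
format.

```python
def nest(rows):
    # rows: flat (customer, contract, item) tuples.
    # Result: one entry per customer that keeps the information which
    # item belongs to which contract.
    result = {}
    for customer, contract, item in rows:
        result.setdefault(customer, {}).setdefault(contract, []).append(item)
    return result

def items_per_customer(nested):
    # A simple aggregation: number of contract items per customer.
    return {
        customer: sum(len(items) for items in contracts.values())
        for customer, contracts in nested.items()
    }
```

With flat variable bindings the application would have to regroup the rows 
itself; a nested result delivers the grouping as part of the answer.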

=== Performance ===

Because the integrated view is used within an application where users expect 
immediate or at least fast answers even for complex requests, the performance 
of rule and query processing is a very important requirement. If the system 
does not respond fast enough (e.g. in less than 2 seconds), the whole 
functionality will not be accepted by its users. This means that systems like 
the one described can only accept rule languages that allow for efficient 
processing.

Not surprisingly, experience with the system has shown that efficient 
processing also depends to a great extent on an optimized rule execution 
order and on caching of intermediate results. The problems that showed up 
here are very similar to many query optimization problems in database systems.

For integration of data sources it is important to consider which data is 
stored in which system. It is not sufficient to treat each table as an 
independent data source. For performance reasons it is indispensable to make 
use of existing indexes, uniqueness of values, or of join capabilities etc.

The current implementation of EII answers queries all at once. As in other 
data-intensive applications, it would sometimes be more convenient to have a 
streaming or cursor result which delivers first results quickly and further 
results on demand.
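
The difference between the two styles can be sketched with a Python 
generator. This only illustrates the idea and says nothing about how EII 
works internally; all function names are hypothetical.

```python
from itertools import islice

def answer_all(rows, matches):
    # All-at-once: the complete result is built before anything is returned.
    return [row for row in rows if matches(row)]

def answer_cursor(rows, matches):
    # Cursor style: each result is handed out as soon as it is found, and
    # the source is only read further when the caller asks for more.
    for row in rows:
        if matches(row):
            yield row

def first_n(cursor, n):
    # Fetch at most n results from a cursor without exhausting it.
    return list(islice(cursor, n))
```

With the cursor variant, a caller that displays only the first page of 
results causes only as many source rows to be read as are needed to find 
them.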

=== Summary ===

To summarize the previously described observations:

An EII data model consists of ontologies. Within our first project, the access 
to information in these data models is still typical data retrieval rather 
than knowledge inference. Therefore, many requirements expressed here are 
typical requirements for querying in data-intensive applications (cursor, 
performance, query functionality).

Ontologies and rules can describe data that is located in autonomous data 
sources. Furthermore, rules can explain relationships between data. Rules are 
the first choice to express semantics that is not immediately available 
within the data. Rules within the ontologies make it possible to express 
semantics that would otherwise have to be evaluated in queries or within an 
application.

With an increasing number of web services, many of which simply expose data, 
we also expect a need to support data integration for such web services. 
Because web services expect and expose structured data, a rule language 
should directly support this.



-- 
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
mak@aifb.uni-karlsruhe.de        phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/     fax +49 (0)721 693  717
Received on Tuesday, 6 December 2005 20:00:25 UTC