- From: Markus Krötzsch <mak@aifb.uni-karlsruhe.de>
- Date: Tue, 6 Dec 2005 20:58:38 +0100
- To: public-rif-wg@w3.org
- Message-Id: <200512062058.49645.mak@aifb.uni-karlsruhe.de>
** FZI-2 Enterprise Information Integration

A use case in cooperation with Software AG and ontoprise GmbH

The use case description is already quite detailed (see below). Here is a short abstract:

For the integration of data that resides in autonomous data sources, Software AG uses ontologies. Data source ontologies describe the data sources themselves. Business ontologies provide an integrated view of the data. F-Logic rules are used to describe mappings between data objects in data source or business ontologies. Furthermore, F-Logic is used as the query language.

The use case shows that F-Logic rules are perfectly suited to describe the mappings between objects and their properties. Some of these mapping rules can be generated automatically from the data sources' metadata. Some patterns recur frequently in user-defined mapping rules, for instance rules that establish inverse object relations or rules that create new object relations based on the objects' property values.

---- The following is the detailed description of the use case ----

== Data Integration using Semantic Technology ==
(input for W3C RIF-WG, http://www.w3.org/2005/rules/)

Dr. Michael Gesmann, Software AG, Germany
Prof. Dr. Jürgen Angele, ontoprise GmbH, Germany
Dr. Pascal Hitzler, FZI Karlsruhe, Germany
Markus Krötzsch, FZI Karlsruhe, Germany

=== Abstract ===

For the integration of data that resides in autonomous data sources, Software AG uses ontologies. Data source ontologies describe the data sources themselves. Business ontologies provide an integrated view of the data. F-Logic rules are used to describe mappings between data objects in data source or business ontologies. Furthermore, F-Logic is used as the query language.

The use case shows that F-Logic rules are perfectly suited to describe the mappings between objects and their properties. Some of these mapping rules can be generated automatically from the data sources' metadata.
Some patterns recur frequently in user-defined mapping rules, for instance rules that establish inverse object relations or rules that create new object relations based on the objects' property values.

Within our first project, access to information is still typical data retrieval and not so much knowledge inference. Therefore, a lot of effort in this project concentrated on query functionality and even more on performance.

=== Introduction ===

Data that is essential for a company's business often resides in a variety of data sources. The reasons for this are manifold, e.g. load distribution or independent development of business processes. But data distribution can lead to inconsistent data, which is a problem in the development of new business. Thus, consolidating the scattered data and giving applications a shared picture of all existing data is an important challenge.

The integration of such distributed data is the task of Software AG's "Enterprise Information Integrator" (EII) [http://www.softwareag.com/corporate/Solutions/integration/Info_integration]. EII is based on ontologies. On the one hand, data source ontologies can be generated from the metadata of the underlying data sources. Currently, SQL databases, Software AG's Adabas databases, and web services are supported types of data sources. On the other hand, more business-oriented ontologies can be developed. These business ontologies make use of other business ontologies or can directly use data source ontologies. F-Logic rules describe how objects in different ontologies are related to each other.

Within Software AG, EII was used for a first project whose mission was to integrate data that resides partly in a support information system and partly in a customer information system. The support information system maintains, for example, information about customers, their contact information, and active or closed support requests.
The customer information system contains information about customers, contracts etc. While one of these systems stores its data in an Adabas database, the other system uses an SQL server. The integrated data view is exposed in a browser-based application to various parties inside the company.

For the system we have a dozen source ontologies describing some SQL and some Adabas tables. There is only one business ontology, which gives users a single view of customer data, their contracts, and support requests.

In this paper we present some examples of how we used rules within our ontologies and derive some requirements and use cases for rule languages to be used in such a project.

=== Data source import ===

The system needs to be open in the sense that it allows for extensions that provide access to external data sources. In EII, so-called built-in predicates implement this. It is feasible to have a single rule for every data source, e.g. for every table in a database. However, a system that implements access to external data sources only via such single-table access rules will often not achieve sufficient performance. The reason is that the resulting access operations do not make use of the data source's query capabilities, such as join operations.

Import from data sources is easy as long as the order of the individual result objects and the order of property values within a row are not significant. This holds for imports from SQL and mostly also for imports from Adabas. For data sources like web services, which expect and return XML documents, it is no longer true. For chaining of web services, i.e. when the result of one or more web services is the input for another web service, it is necessary to maintain the structure of the original result documents. Preserving the structure leads to complex rules.

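
The point made above about single-table access rules and join capabilities can be illustrated with a small Python sketch. It is not EII code: the in-memory SQLite database, the table and column names, and both function names are assumptions chosen for illustration. The first function accesses every table separately and joins outside the data source; the second delegates the join to the data source.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for one SQL data source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (cuid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE contract (coid INTEGER PRIMARY KEY, customer INTEGER);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO contract VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    return conn


def contracts_via_single_table_access(conn):
    """One access per table; the join is computed outside the data source."""
    customers = conn.execute("SELECT cuid, name FROM customer").fetchall()
    contracts = conn.execute("SELECT coid, customer FROM contract").fetchall()
    names = dict(customers)  # cuid -> name
    return sorted((names[cust], coid) for coid, cust in contracts if cust in names)


def contracts_via_pushed_down_join(conn):
    """One access; the data source evaluates the join itself."""
    rows = conn.execute(
        "SELECT cu.name, co.coid FROM contract co "
        "JOIN customer cu ON cu.cuid = co.customer"
    ).fetchall()
    return sorted(rows)


conn = make_source()
# Both strategies return the same pairs; they differ in where the join is done.
assert contracts_via_single_table_access(conn) == contracts_via_pushed_down_join(conn)
```

Both variants produce the same result. The first one transfers every row of both tables before joining, which is the behaviour the text warns about for larger tables.
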
=== Object and property mapping ===

It is very easy to define that an object in one model representing the data source is also an object in another model representing the business model. For example, a support request in the support information system is also a support object in the business model. An example in rule terms (we use F-Logic syntax throughout):

  FORALL X c(tablename, X):Contract[contractId->X]@Source <-
      accessToSource(connectionInfo, tablename, rowid, X).
  FORALL X X:Contract@Business <- X:Contract@Source.

If the underlying data from the external sources contains the necessary information, it is also easy to describe that two objects are the same. For example, a customer in the support information system and a customer in the customer information system represent the same object if these customers have the same name and address. A customer might have a surrogate value as a unique key in both systems, but typically these values are not a viable object identifier across independent data sources. An example in rule terms:

  FORALL X c(tablename1, X):Contract1[contractId1->X]@Source1 <-
      accessToSource(connectionInfo1, tablename1, rowid1, X).
  FORALL X c(tablename2, X):Contract2[contractId2->X]@Source2 <-
      accessToSource(connectionInfo2, tablename2, rowid2, X).
  FORALL X, Y c(Contract, Y):Contract@Business <- X:Contract1[contractId1->Y]@Source1.
  FORALL X, Y c(Contract, Y):Contract@Business <- X:Contract2[contractId2->Y]@Source2.

As for objects, it is also easy to specify that a property in one ontology maps to a property in another ontology, so that all values of the property in the first ontology are also values of the property in the second ontology. An example in rule terms:

  FORALL X, Y c(tablename, X):Contract[contractId->X; contractDate->Y]@Source <-
      accessToSource(connectionInfo, tablename, rowid, X, datefield, Y).
  FORALL X, Y X:Contract[date->Y]@Business <- X:Contract[contractDate->Y]@Source.

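
The mapping rules in this section can also be read operationally. The following Python sketch is only an illustration of that reading, not of how EII evaluates F-Logic: the function names, the dictionary layout, and the sample rows are assumptions, and for brevity it combines the identifier rule c(Contract, Y) with the contractDate-to-date property mapping.

```python
def import_source(table, rows):
    """Source rule: each row becomes an object identified by c(table, contractId)."""
    return {
        (table, row["contractId"]): {
            "contractId": row["contractId"],
            "contractDate": row["contractDate"],
        }
        for row in rows
    }


def to_business(*sources):
    """Business rules: objects from any source that share a contractId collapse
    into one object c("Contract", id), and contractDate is exposed as date.
    Assumption of this sketch: if several sources carry a date, the last one wins."""
    business = {}
    for source in sources:
        for obj in source.values():
            oid = ("Contract", obj["contractId"])
            business.setdefault(oid, {})["date"] = obj["contractDate"]
    return business


source1 = import_source("table1", [{"contractId": 7, "contractDate": "2005-12-06"}])
source2 = import_source(
    "table2",
    [
        {"contractId": 7, "contractDate": "2005-12-06"},
        {"contractId": 8, "contractDate": "2005-11-01"},
    ],
)
business = to_business(source1, source2)
# Contract 7 occurs in both sources but yields a single business object.
assert len(business) == 2
```

Using the pair (table or class name, key value) as the identifier mirrors the function term c(..., ...) in the rules: objects are identified by a constructed term rather than by a source-specific row id.
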
These simple types of mapping are essential for the specification of business ontologies on top of data source or other business ontologies.

=== Property value mapping ===

Often, similar data that is represented in one way in one database is represented differently in another database. For example:

* data that is encoded in a single column or field might be scattered across multiple attributes in another database (comma-separated name versus firstname and lastname, encoding of several numeric or boolean values into a single bit array)
* data with different representations (time and date values as a number, as XML types, as SQL values)

In all these cases it is very helpful to have an extension mechanism that allows for adding functions that implement the necessary transformations. An example in rule terms:

  FORALL A, X A[Contract_End_Date_Formatted->X]@Business <-
      EXISTS B (A:Contract[Contract_End_Date->B]@Business and natdate2string(B,X)).

where natdate2string() is a predicate that transforms a date from one representation into another.

=== Object references and more metadata ===

Every functional model needs to describe relations between objects. Object properties are used to express these relationships in a model. The values of object properties are object identifiers, which reference the object that carries the identifier. These properties and property values are similar to foreign keys in relational databases. The foreign key information that is provided with the schema description should be used during generation of the data source model. An example in rule terms:

  FORALL X c(CUSTOMER, X):CUSTOMER[cuid->X]@Source <-
      accessToSource(connectionInfo, CUSTOMER, cid, X).
  FORALL X,Y c(CONTRACT, X):CONTRACT[coid->X;customer->Y]@Source <-
      accessToSource(connectionInfo, CONTRACT, cid, X, customer, Y).
  FORALL X,Y X[forCustomer->c(CUSTOMER, Y)] <- X:CONTRACT[customer->Y]@Source.

Also, the inverse reference could be generated.
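
As a rough illustration of the foreign-key rules above and of what generating the inverse reference would involve, here is a minimal Python sketch. The function names and sample rows are hypothetical, and the name under which the inverse relation is returned is chosen by hand in this sketch.

```python
from collections import defaultdict


def resolve_references(contract_rows):
    """Forward rule: contract c(CONTRACT, X) gets forCustomer -> c(CUSTOMER, Y)."""
    return {
        ("CONTRACT", row["coid"]): ("CUSTOMER", row["customer"])
        for row in contract_rows
    }


def invert(for_customer):
    """Inverse relation: collect, per customer, the contracts that reference it."""
    has_contracts = defaultdict(set)
    for contract_id, customer_id in for_customer.items():
        has_contracts[customer_id].add(contract_id)
    return dict(has_contracts)


rows = [
    {"coid": 10, "customer": 1},
    {"coid": 11, "customer": 1},
    {"coid": 12, "customer": 2},
]
for_customer = resolve_references(rows)
has_contracts = invert(for_customer)
# Customer 1 is referenced by two contracts, so the inverse is set-valued.
assert has_contracts[("CUSTOMER", 1)] == {("CONTRACT", 10), ("CONTRACT", 11)}
```

The forward reference is single-valued per contract, while the inverse is set-valued per customer.
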
But because there is no inverse name in schemas (while there is a name for the foreign key constraint in SQL databases), this is currently postponed to application development.

  ... some rules for Customer and Contract in business ontology ...
  FORALL X,Y X[hasContracts->>Y]@Business <- Y:Contract[forCustomer->X]@Business.

Even N:M relationships, which relational systems implement by two 1:N foreign key relations, can be expressed directly. But the rules are a little more complex.

=== Queries ===

Learning a new language is always a substantial investment, in particular if it involves learning a new programming paradigm. Having different languages for modelling and for querying ontologies bears the potential for impedance mismatches and causes additional costs. Therefore, the rule language and the query language should ideally be the same.

However, like queries in database applications, the queries in our project are only meant to return result information. It was not the goal to find all explanations of why the returned results are valid, nor was it a goal to get all variable bindings that lead to a result. Using the language as a query language leads to some typical database query language requirements, for example:

* User-defined projections on the query result should be possible. Object relationships should be contained in the result. E.g. if one customer has multiple contracts, each having contract items, the query result should contain the information which contract item belongs to which contract, within a single result per customer.
* Aggregations over data should be possible (although not yet used within the project).

=== Performance ===

Because the integrated view is used within an application where users expect immediate or at least fast answers even for complex requests, the performance of rule and query processing is a very important requirement. If the responsiveness of the system is not sufficient (e.g.
response in less than 2 seconds), the whole functionality will not be accepted by its users. This means that systems like the one described can only accept rule languages that allow for efficient processing.

Not surprisingly, experience with the system has shown that efficient processing also depends to a great extent on an optimized rule execution order and on caching of intermediate results. The problems that showed up here are very similar to many query optimization problems in database systems.

For the integration of data sources it is important to consider which data is stored in which system. It is not sufficient to treat each table as an independent data source. For performance reasons it is indispensable to make use of existing indexes, uniqueness of values, join capabilities etc.

The current implementation of EII answers queries all at once. As in other data-intensive applications, it would sometimes be more convenient to have a streaming or cursor result that delivers the first results quickly and further results on demand.

=== Summary ===

To summarize the observations described above: An EII data model consists of ontologies. Within our first project, access to information in these data models is still typical data retrieval and not so much knowledge inference. Therefore, many requirements expressed here are typical requirements for querying in data-intensive applications (cursor, performance, query functionality).

Ontologies and rules can describe data that is located in autonomous data sources. Furthermore, rules can explain relationships between data. Rules are the first choice to express semantics that is not immediately available within the data. Rules within the ontologies make it possible to express semantics that would otherwise have to be evaluated in queries or within an application.

With an increasing number of web services, many of which simply expose data, we also expect the need to support data integration for such web services.
Because web services expect and expose structured data, a rule language should directly support this.

-- 
Markus Krötzsch
Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe
mak@aifb.uni-karlsruhe.de        phone +49 (0)721 608 7362
www.aifb.uni-karlsruhe.de/WBS/   fax   +49 (0)721 693 717

Received on Tuesday, 6 December 2005 20:00:25 UTC