[ACTION-331] Mappings from existing data models vs defining a new data modelling language

All,

In completion of my action-331, I will try to clarify the issue of 
mappings from existing data models vs defining a new data modelling 
language.

I will use the term "data schema" (as in XML schema, relational 
database schema, etc.), in an attempt to avoid confusion between the RIF 
data model and application-specific instances of data schemas.

I assume that the data to which the interchanged rules apply are not 
included in the RIF document, the general cases being that either each 
party uses the interchanged rules with its own data, a common data 
source is shared by different means (e.g. Web service access), or a data 
document is interchanged separately (the reason for interchanging the 
rules and data separately, in the latter case, being to separate concerns).

The problem is thus to make sure that everybody applies the rules to the 
data in a consistent way, that is, that, given the same dataset, every 
consumer of a RIF document will give the same interpretation to the 
rules and to their parts (terms, literals, etc.).

The solution is that the parties in an interchange must agree on a 
common data schema and on how to interpret it (in the sense that a data 
schema defines an application's vocabulary - terms, relations, etc. - and 
the datasets provide the interpretations).

Based on that agreement, each party knows how to map the data schema 
onto their own data structure (or onto the shared data structure), and 
is able to apply the rules to the data in a consistent way, provided 
there is a fixed mapping between the schema language used to specify the 
agreed-on data schema and the RIF data model.
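
To make this concrete, here is a minimal sketch in Python. Everything in 
it is hypothetical and only for illustration: the "Customer" schema, the 
two parties' storage formats, the mapping functions and the rule itself 
are assumptions of mine, not anything defined by RIF. It only shows that 
a rule written against the agreed-on vocabulary yields the same result 
for each party once each has mapped the schema onto its own data 
structure.

```python
from dataclasses import dataclass


# Hypothetical agreed-on data schema: the shared vocabulary.
@dataclass(frozen=True)
class Customer:
    name: str
    yearly_spend: float  # in euros, by agreement between the parties


# Party A (assumed) stores customers as dicts with its own key names.
def from_party_a(record: dict) -> Customer:
    return Customer(name=record["cust_name"],
                    yearly_spend=record["spend_eur"])


# Party B (assumed) stores customers as (name, spend in cents) tuples.
def from_party_b(row: tuple) -> Customer:
    name, spend_cents = row
    return Customer(name=name, yearly_spend=spend_cents / 100)


# An illustrative rule, written only against the shared schema.
def is_gold(customer: Customer) -> bool:
    return customer.yearly_spend > 1000


a = from_party_a({"cust_name": "Ann", "spend_eur": 1500.0})
b = from_party_b(("Ann", 150000))
```

Both parties end up with the same Customer value, so the rule is 
interpreted identically on either side, whatever the native storage 
looks like.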

The question is now: should RIF define a data schema language, so that 
parties in an interchange can use RIF to specify the common data schema 
they agree on?

The benefit would be to avoid one step in the translation from/to RIF 
to/from one's own rule language, as the data models would map onto each 
other without an intermediary.

Without entering a theoretical discussion, nor a discussion about 
principles or scope, the main drawback of this approach would be 
practical: in many cases where rules need to be interchanged, if not 
most, the parties must interchange data for other reasons as well, or 
use them in a consistent way for reasons other than consistent 
interpretation of rules, and thus need a shared data schema 
independently of their use of RIF for interchanging rules.

Many such shared data schemas already exist, and Web languages have been 
designed specifically for the purpose of specifying them (e.g. XML 
Schema). In these cases, the user will be reluctant to redefine the 
shared schema in a different schema language, so that RIF will have to 
provide a mapping from its data model onto the main schema languages 
anyway.

In the case where no shared schema exists, the parties will have to 
specify one and, given that RIF will provide a mapping to the main 
schema languages, they will have little motivation to choose to specify 
that schema directly in RIF.

In conclusion, I think that specifying a schema language within RIF 
would add an unnecessary burden to our already heavy workload, and that 
we should focus instead on specifying how the RIF data model maps onto 
existing and widely deployed data schema languages.

See you in Hawthorne.

Christian

Received on Wednesday, 26 September 2007 16:07:25 UTC