Identifying Data for Interchange [was: Component-based Schema Design]

Hi Folks,

Premise: Data needs to be sent from a sender to a recipient.  That is,
data needs to be interchanged.

Goal: identify the defining characteristics of data that is "suitable"
for being interchanged.  Conversely, identify the defining
characteristics of data that is "not suitable" for being interchanged.

Below I introduce my characterization of the qualities of good
"interchange data". My characterization is far from complete.  I invite
your input.

Let's jump right in with an example to demonstrate how certain data is
not suitable for interchange. Then I will show an example of data that
is suitable for interchange. I will then attempt to distill a set of
characteristics of good interchange data.

Example. Suppose that you are on board an aircraft with a system that
periodically transmits information to a ground station.  Included in
this information is:

    - Distance to Destination Airport
    - Distance to Navigation Aid
    - Distance to Emergency Airport

What data should be sent to the ground station? Your first instinct
might be to send "Distance-to-X" data.  For example:

    <distance to="destination airport">590</distance>
    <distance to="navigation aid">140</distance>
    <distance to="emergency airport">75</distance>

However, distance data is a poor choice of interchange data.  Let's see
why.

Sending "distance" discards much of the information that recipients of
the data could potentially want.  A distance is a single derived number:
it cannot be turned back into the position it was computed from.  For
example, it doesn't allow recipients to compute things like heading,
location relative to another aircraft, etc.

Thus, distance is poor interchange data.  (Note: distance could be good
as "internal data".  But when the data goes "outside", it is a poor
choice of interchange data.)

Position is good interchange data.  If the position were sent to the
ground station, then the recipient could compute the distances itself
(given the positions of the airports and the navigation aid).
Additionally, the position data could be used for other things, such as
determining the heading of the aircraft from successive positions, or
determining how close the aircraft is to another aircraft.

Thus, position is good interchange data.
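
To make this concrete, here is a minimal Python sketch of how a
recipient could derive the three distances from a position.  It assumes
a spherical Earth (haversine formula, mean radius 6,371 km) rather than
the WGS-84 ellipsoid, so the results are only approximate.  The function
name and all of the coordinates are made up for illustration.

```python
import math

# Assumption: mean Earth radius on a sphere, not the WGS-84 ellipsoid.
EARTH_RADIUS_M = 6_371_000.0


def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in meters between two points (decimal degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical positions as (latitude, longitude) in decimal degrees.
aircraft = (40.0, -75.0)
waypoints = {
    "destination airport": (42.36, -71.01),
    "navigation aid": (40.5, -74.0),
    "emergency airport": (40.2, -75.5),
}

# The recipient derives every "Distance-to-X" value from the one position.
distances = {
    name: great_circle_distance_m(*aircraft, *pos)
    for name, pos in waypoints.items()
}

for name, meters in distances.items():
    print(f"distance to {name}: {meters / 1000:.1f} km")
```

The point is that the sender transmits one position and the recipient
decides which distances (if any) it cares about.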

Let's now look at the characteristics of good interchange data.

INTERCHANGE DATA SHOULD BE "HIGH VALUE DATA"

High value data is data that applications can add value to.

Examples of adding value to data:
   - performing calculations on the data to generate other data
   - using the data in other contexts

We saw in the above example that position data enables recipient
applications to calculate the heading of the aircraft, calculate the
distance from the aircraft to another aircraft, etc.  That is, the
position data may be used to generate other (useful) data.  Additionally,
the position data could be plugged into other applications, for example,
a map application.  Position data is high value data.
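
As one illustration of "performing calculations on the data to generate
other data", here is a rough Python sketch that derives a bearing from
two successive position reports.  It uses the standard initial-bearing
formula on a sphere; the function name and the sample fixes are my own
assumptions.  Strictly speaking this gives the track between the two
fixes, which only approximates the heading if wind is ignored.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees (0 = north, 90 = east) from point 1 to point 2.

    Inputs are decimal degrees; a spherical Earth is assumed.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    # atan2 returns (-180, 180]; normalize to [0, 360).
    return math.degrees(math.atan2(y, x)) % 360.0


# Two hypothetical successive position reports from the same aircraft.
earlier_fix = (40.00, -75.00)
later_fix = (40.10, -74.80)

track = initial_bearing_deg(*earlier_fix, *later_fix)
print(f"approximate track: {track:.1f} degrees")
```

None of this would be possible if only "Distance-to-X" values had been
sent.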

...

Hopefully I have convinced you that position is a good interchange
object.  I have not yet indicated how position should be represented. 
Let's look at that.

How we represent position depends on what coordinate reference system we
use.  There are many different coordinate reference systems.
Should we allow position to be interchanged in various different
representations (i.e., using different coordinate reference systems)?

No!  Doing so would make each application very complex, as it would
require each application to be "multilingual".  That is, it would
require each application to be able to convert from one coordinate
reference system to another.  With N coordinate reference systems there
are on the order of N^2 possible conversions.  This is the N^2
conversion problem.  Avoid making each application multilingual!
Remember, complexity kills!

Instead, define precisely one, unambiguous "interchange standard".  This
becomes the "lingua franca" interchange format.
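
A back-of-the-envelope Python sketch of why this helps: count the
one-way converters needed when every representation must convert
directly to every other, versus when everything goes through a single
interchange standard.  The function names are mine, and the counting
assumes the interchange standard is itself one of the N representations.

```python
def direct_converters(n):
    """One-way converters needed if each of n representations converts to every other."""
    return n * (n - 1)


def hub_converters(n):
    """One-way converters needed if one of the n representations is the interchange standard.

    Each of the other n - 1 representations needs one converter to the
    standard and one converter from it.
    """
    return 2 * (n - 1)


for n in (2, 5, 10):
    print(f"N={n}: direct={direct_converters(n)}, via standard={hub_converters(n)}")
```

With 10 representations that is 90 converters against 18, and the gap
widens as N grows.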

Example. What should be the "lingua franca" coordinate reference
system?  That, of course, depends on the situation, but here's an
approach to deciding what the coordinate reference system interchange
standard should be:  it should be a universal standard, implementable
on devices ranging from $3 microprocessors to $3B weapon systems, and
there must be conversion algorithms from any coordinate system to and
from the interchange standard coordinate reference system.  Answer: the
WGS-84 standard fits those requirements very nicely!

Thus, the data that is suitable for interchange is a position.  The
position representation is standardized to WGS-84, with angles in
decimal degrees and linear measures in meters.
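
Here is one way a recipient might read such a position, sketched in
Python with the standard library's ElementTree.  The element and
attribute names (position, crs, latitude, longitude, altitude) and the
sample values are purely my assumptions for illustration; I have not
proposed an actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical message: element names, attribute, and values are illustrative only.
SAMPLE = """
<position crs="WGS-84">
    <latitude>40.6413</latitude>
    <longitude>-73.7781</longitude>
    <altitude>10668</altitude>
</position>
"""


def parse_position(xml_text):
    """Return (latitude, longitude, altitude_m) from a hypothetical <position> element.

    Latitude and longitude are decimal degrees; altitude is in meters.
    """
    root = ET.fromstring(xml_text.strip())
    lat = float(root.findtext("latitude"))
    lon = float(root.findtext("longitude"))
    alt = float(root.findtext("altitude"))
    # Because there is exactly one representation, validation is simple.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon, alt


print(parse_position(SAMPLE))
```

Notice that the recipient never has to ask "which coordinate reference
system is this?", because there is only one.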

...

Characteristics of Good Interchange Data

1. You should never interchange data that can be calculated from other
data.  Instead, interchange the "fundamental data", from which the
calculations can be done.

2. The more different ways data can be used, the higher the value of the
data, the less application-specific it is, and the more suitable it is
for data interchange.

3. Once you have identified good interchange data you then need to
determine how to represent it.  There may be various ways to represent
it.  However, you should pick precisely one, unambiguous representation
for which there are algorithms to map to the other representations. 
This becomes the "interchange standard".  Defining an interchange
standard greatly reduces the complexity of all applications.

...

Observations

a. Reusable components start to emerge as the "fundamental data" is
uncovered.  For example, the position object is a highly reusable
component.  High value data makes for good reusable components.

b. Schemas simplify as you adopt an interchange standard.  For example,
there is no need to define various coordinate reference systems for the
position object.  There is only one coordinate reference system for
interchange.

c. Applications simplify as you adopt an interchange standard.
Applications no longer need to be "multilingual".  The N^2 conversion
problem is eliminated.

...

Okay.  That's a start.  I invite you to present your ideas on what
characterizes good interchange data.  I invite your input on how to
better express what I have presented above (e.g., do you agree with my
term "high value data" to describe the data that is suitable for being
interchanged?  Can you think of a better/more accurate term?)

Thanks!  /Roger

Received on Saturday, 4 January 2003 12:07:06 UTC