Component-Based Schema Design from Roger L. Costello on 2002-12-27 (xmlschema-dev@w3.org from December 2002)

From: Roger L. Costello <costello@mitre.org>
Date: Fri, 27 Dec 2002 10:56:52 -0500
To: xmlschema-dev@w3.org
CC: "Costello,Roger L." <costello@mitre.org>
Message-ID: <3E0C7844.CC6ADA1B@mitre.org>

Hi Folks,

INTEROPERABILITY VIA "SCHEMA CHUNKS"

I have become convinced that the key to interoperability is to promote
the use of broadly adopted "schema chunks".  I would like to hear your
thoughts on how to design interoperable schema chunks.

DEFINITION OF "SCHEMA CHUNK"

First, let me start by defining what I mean by a "schema chunk".  I will
provide a more detailed definition later, but for now: 

   A schema chunk is a schema with a narrow, well-defined purpose.

   Example. A "position" schema chunk has a very narrow scope - it
   defines the format of position data: latitude, longitude,
   measurement accuracy, and id.

PROPERTIES OF SCHEMA CHUNKS

A schema chunk has certain properties: 

(a) a unique identifier
(b) no dependencies (that is, the schema chunk is standalone)

Thus, a schema chunk represents a reusable component.

PARTIAL VALIDATION AND INTEROPERABILITY

A schema chunk should enable partial validation.  A colleague recently
helped me to realize the importance of partial validation of an instance
document, and the role of partial validation in interoperability. 

DEFINITION OF PARTIAL VALIDATION

Oftentimes you will receive an instance document and you need only a
portion of the data.  Thus, you would like to validate just that
portion, extract it, and process it.  

Here are a couple of examples where partial validation plays an
important role:

EXAMPLE - EXTRACT/PROCESS THE TARGET POSITION CHUNK

A pilot is handed a floppy containing a document that contains, among
other things, the position of a target to be bombed.  He inserts
the floppy into his on-board computer, which has a cached copy of the
position schema chunk.  The computer validates the position data,
extracts it, and loads the coordinates into the ordnance.  The other
information on the floppy is irrelevant, and couldn't be validated even
if desired since the pilot has no connection to a network.

EXAMPLE - PIPELINE PROCESSING OF DATA CHUNKS

Imagine a document that gets sent through a series of stages.  Each
stage acts like a filter, validating the data chunk that is pertinent to
that stage, extracting (removing) it, processing it, and then passing
the modified document downstream to the next stage.

HOW TO DO PARTIAL VALIDATION

You may ask: "How do I perform partial validation?"  Answer: In the
instance document don't specify schemaLocation.  Then, at validation
time you must supply namespace/schema-URL values.  To do partial
validation provide just the namespace/schema-URL pair of the component
that you are interested in validating.
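
As a rough illustration, an instance document prepared for partial
validation might look like the sketch below.  All names and namespace
URIs in it (mission, notes, position, the example.org URIs) are
hypothetical.  The document carries no xsi:schemaLocation attribute, so
the validator only knows about the schemas it is handed at validation
time.

```xml
<?xml version="1.0"?>
<!-- No xsi:schemaLocation anywhere in this document. -->
<!-- To validate only the position data, the validator would be given
     the single pair:
         http://www.example.org/position   position.xsd
     (both the namespace URI and the file name are assumptions). -->
<m:mission xmlns:m="http://www.example.org/mission"
           xmlns:pos="http://www.example.org/position">
    <m:notes>Data that the position chunk knows nothing about.</m:notes>
    <pos:position>
        <pos:latitude>38.9</pos:latitude>
        <pos:longitude>-77.2</pos:longitude>
        <pos:accuracy>10</pos:accuracy>
        <pos:id>T-001</pos:id>
    </pos:position>
</m:mission>
```

How a given validator treats the surrounding elements for which it has
no schema varies from processor to processor, so it may be necessary to
start validation at the position element itself.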

DESIGNING A SCHEMA CHUNK

Before I unveil my ideas on how to design schema chunks, let's consider
the implications of what I have stated above:

NARROW, WELL-DEFINED PURPOSE

This implies that a schema chunk be small, i.e., that it contain a small
number of elements.

UNIQUE IDENTIFIER

A schema chunk is identified by its targetNamespace.  To give each chunk
a unique identifier implies that each chunk go into a different schema,
and each schema have a different targetNamespace.  That is, one chunk,
one schema.

DECOUPLED

Each chunk must be standalone, self-contained, with no dependencies on
other schemas.  This means no importing/including of other schemas.

PROPOSED SCHEMA CHUNK DESIGN

Below is my proposal on how to design schema chunks to promote
reusability and interoperability.

First, here's my expanded definition of "schema chunk":

- a chunk is a globally declared element composed of 5-10 in-lined child
  elements.
- a chunk represents a well-defined chunk of information. 
- a chunk has a unique identifier - the targetNamespace.  
- a chunk is broadly usable.
- one schema, one chunk.  That is, a schema just defines one chunk.
- a chunk has no dependencies on other schemas 
   a. Use element declarations, simpleType definitions.
   b. Don't use derived types, substitutionGroups.
   c. Use the Russian Doll design.

So, here's my proposal of how a schema chunk should be designed:

<?xml version="1.0"?>
<!-- The default namespace matches targetNamespace so that the
     unprefixed type references below resolve to this schema. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="schema-chunk-id"
            xmlns="schema-chunk-id"
            elementFormDefault="qualified">
    <!-- The one globally declared element: the chunk itself -->
    <xsd:element name="chunk">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="child-element1" type="simpleType-1"/>
                <xsd:element name="child-element2" type="simpleType-2"/>
                <xsd:element name="child-element3" type="simpleType-3"/>
                <xsd:element name="child-element4" type="simpleType-4"/>
                <xsd:element name="child-element5" type="simpleType-5"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <!-- All types used by the chunk are defined in this same schema -->
    <xsd:simpleType name="simpleType-1">...</xsd:simpleType>
    <xsd:simpleType name="simpleType-2">...</xsd:simpleType>
    <xsd:simpleType name="simpleType-3">...</xsd:simpleType>
    <xsd:simpleType name="simpleType-4">...</xsd:simpleType>
    <xsd:simpleType name="simpleType-5">...</xsd:simpleType>

</xsd:schema>

Note that with this design:

a. It defines one chunk:

     <chunk>
         ...
     </chunk>

b. The chunk has a unique id - defined by the targetNamespace.

c. The data is strongly typed - a simpleType for each data item.

d. The chunk is small - just 5 data items.

e. The chunk is bounded - the child elements are in-lined, using the
Russian Doll design.

f. The chunk is standalone - all simpleTypes needed to define the chunk
are bundled in the schema.  Everything that is needed to use and
understand the schema is right there.  No need to look through a long
type hierarchy chain, no need to examine other schemas.
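
To make the template concrete, here is a sketch of how it might be
filled in for the "position" chunk mentioned earlier.  The namespace
URI, the element and type names, and the value ranges are all
assumptions made for illustration, not an agreed vocabulary.  It has
only four child elements, and it assumes that restricting a built-in
type inside a simpleType definition is acceptable under the "no derived
types" guideline.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/position"
            xmlns="http://www.example.org/position"
            elementFormDefault="qualified">

    <!-- The single chunk: one global element with in-lined children -->
    <xsd:element name="position">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="latitude" type="latitudeType"/>
                <xsd:element name="longitude" type="longitudeType"/>
                <xsd:element name="accuracy" type="accuracyType"/>
                <xsd:element name="id" type="idType"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <!-- Decimal degrees, -90 (south) to 90 (north); range is assumed -->
    <xsd:simpleType name="latitudeType">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-90"/>
            <xsd:maxInclusive value="90"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Decimal degrees, -180 (west) to 180 (east); range is assumed -->
    <xsd:simpleType name="longitudeType">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-180"/>
            <xsd:maxInclusive value="180"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Measurement accuracy as a non-negative number; unit is assumed -->
    <xsd:simpleType name="accuracyType">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="0"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- A non-empty identifier token -->
    <xsd:simpleType name="idType">
        <xsd:restriction base="xsd:token">
            <xsd:minLength value="1"/>
        </xsd:restriction>
    </xsd:simpleType>

</xsd:schema>
```

Nothing in this schema refers to any other schema, so a cached copy is
all that is needed to validate a position element.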

GLUE SCHEMAS

I have become a believer in schema design using schema chunks.  The
major emphasis in schema design should, I believe, be on creating and
reusing schema chunks.  The purpose of the "other" (non-chunk) schemas
is to simply glue together the schema chunks.
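
One possible reading of a glue schema is sketched below.  It assumes a
hypothetical position chunk in the namespace
http://www.example.org/position, stored in a file called position.xsd;
the mission and notes elements and the mission namespace are likewise
invented for illustration.  The glue schema contributes only a wrapper;
the real data is declared in the chunk.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:pos="http://www.example.org/position"
            targetNamespace="http://www.example.org/mission"
            elementFormDefault="qualified">

    <!-- The glue schema, unlike a chunk, depends on other schemas -->
    <xsd:import namespace="http://www.example.org/position"
                schemaLocation="position.xsd"/>

    <!-- A wrapper element that holds the chunk by reference -->
    <xsd:element name="mission">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="notes" type="xsd:string"/>
                <xsd:element ref="pos:position"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

</xsd:schema>
```

The import runs in one direction only: the glue depends on the chunk,
and the chunk stays standalone.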

As an aside, I am beginning to believe that there is too much emphasis
on glue schemas.  The glue elements give the "illusion" of importance,
when, in fact, they have no importance other than to act as a framework
to hold the "real" data.

I am starting to believe that the right approach may be to empower
instance document authors to decide what collection of schema chunks
they wish to use, and let them glue them together using whatever
elements they wish.

CONCLUSIONS

To enable interoperability, I think that creating broadly adopted,
reusable schema chunks is very important.  So, how do we design broadly
adopted, reusable schema chunks?  In this message I have attempted to
outline what I see as an approach to designing schema chunks.  It
fundamentally advocates a minimalist use of XML Schema functionality -
no type derivation, no element substitution, no import/include.

I welcome your comments and suggestions.  /Roger
Received on Friday, 27 December 2002 10:57:00 UTC