Re: Component-Based Schema Design from Paul Kiel on 2003-01-01 (xmlschema-dev@w3.org from January 2003)

From: Paul Kiel <paul@hr-xml.org>
Date: Wed, 1 Jan 2003 11:42:08 -0500
To: <xmlschema-dev@w3.org>
Message-ID: <003b01c2b1b4$b98c3b40$6401a8c0@pkiel2>

Hi Roger,

From my experience, there are realistically three levels of structure.  The
first is the low-level, mostly standalone stuff that could be equated to your
chunks.  The second is the top level, which brings together the message as a
whole (the "swim-lane" crossing schema piece).  The third is the in-between
no-man's-land, or as some have called it, "aggregates".

What we have found is that the fewest problems arise with the low-level
stuff.  These pieces tend to be small, discrete, easily defined "chunks".
They may not be totally standalone, but they include at most other low-level
items.  They work best because they are discrete pieces of data, easily
communicated.  This is relatively obvious: in trying to implement or consume
a new standard, size matters (sorry guys!).  The discrete pieces are the most
usable.

So in a sense, I agree with some of your assumptions regarding the low-level
stuff.

> As an aside, I am beginning to believe that there is too much emphasis
> on glue schemas.  The glue elements give the "illusion" of importance,
> when, in fact, they have no importance other than to act as a framework
> to hold the "real" data.

Well - I agree that the glue schemas get the most attention, but that is
necessarily so.  They are where the most problems occur.  They are the most
brittle.  And they aggregate change-risk because of their nature.  So while
I agree with your idea here, I think the practicality of assembling a
meaningful message makes this unavoidable.

Cheers,
Paul
(BTW - this is some nice intellectual stimulation after a week of screaming
kids and family dysfunction - ah the holidays...)



----- Original Message -----
From: "Roger L. Costello" <costello@mitre.org>
To: <xmlschema-dev@w3.org>
Cc: "Costello,Roger L." <costello@mitre.org>
Sent: Friday, December 27, 2002 10:56 AM
Subject: Component-Based Schema Design


>
> Hi Folks,
>
> INTEROPERABILITY VIA "SCHEMA CHUNKS"
>
> I have become convinced that the key to interoperability is to promote
> the use of broadly adopted "schema chunks".  I would like to hear your
> thoughts on how to design interoperable schema chunks.
>
> DEFINITION OF "SCHEMA CHUNK"
>
> First, let me start by defining what I mean by a "schema chunk".  I will
> provide a more detailed definition later, but for now:
>
>    A schema chunk is a schema with a narrow, well-defined purpose.
>
>    Example. A "position" schema chunk has a very narrow scope - it
>    defines the format of position data: lat, lon, msrmt accuracy,
>    and id.
>
> PROPERTIES OF SCHEMA CHUNKS
>
> A schema chunk has certain properties:
>
> (a) a unique identifier
> (b) no dependencies (that is, the schema chunk is standalone)
>
> Thus, a schema chunk represents a reusable component.
>
> PARTIAL VALIDATION AND INTEROPERABILITY
>
> A schema chunk should enable partial validation.  A colleague recently
> helped me to realize the importance of partial validation of an instance
> document, and the role of partial validation in interoperability.
>
> DEFINITION OF PARTIAL VALIDATION
>
> Oftentimes you will receive an instance document and you need only a
> portion of the data.  Thus, you would like to validate just that
> portion, extract it, and process it.
>
> Here are a couple of examples where partial validation plays an
> important role:
>
> EXAMPLE - EXTRACT/PROCESS THE TARGET POSITION CHUNK
>
> A pilot is handed a floppy containing a document that contains, among
> other things, the position of a target to be bombed.  He inserts
> the floppy into his on-board computer, which has a cached copy of the
> position schema chunk.  The computer validates the position data,
> extracts it, and loads the coordinates into the ordnance.  The other
> information on the floppy is irrelevant, and couldn't be validated even
> if desired since the pilot has no connection to a network.
>
> EXAMPLE - PIPELINE PROCESSING OF DATA CHUNKS
>
> Imagine a document that gets sent through a series of stages.  Each
> stage acts like a filter, validating the data chunk that is pertinent to
> that stage, extracting (removing) it, processing it, and then passing
> the modified document downstream to the next stage.
>
> HOW TO DO PARTIAL VALIDATION
>
> You may ask: "How do I perform partial validation?"  Answer: In the
> instance document don't specify schemaLocation.  Then, at validation
> time you must supply namespace/schema-URL values.  To do partial
> validation provide just the namespace/schema-URL pair of the component
> that you are interested in validating.
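>
> A minimal sketch of such an instance follows.  The namespace URIs and
> element names are assumptions made up for illustration.  There is no
> xsi:schemaLocation anywhere; the receiver would supply only the pair
> (http://example.org/chunks/position, its cached position schema).
> Whether a given processor skips the undeclared wrapper elements or
> reports them is processor-dependent, so the chunk element may need to
> be handed to the validator on its own.
>
> <?xml version="1.0"?>
> <!-- Hypothetical wrapper: the receiver has no schema for this namespace -->
> <mission xmlns="http://example.org/glue/mission">
>     <briefing>Other data the receiver does not need.</briefing>
>     <!-- The only part the receiver validates and extracts -->
>     <pos:position xmlns:pos="http://example.org/chunks/position">
>         <pos:lat>38.8895</pos:lat>
>         <pos:lon>-77.0353</pos:lon>
>         <pos:accuracy>5</pos:accuracy>
>         <pos:id>target-1</pos:id>
>     </pos:position>
> </mission>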
>
> DESIGNING A SCHEMA CHUNK
>
> Before I unveil my ideas on how to design schema chunks, let's consider
> the implications of what I have stated above:
>
> NARROW, WELL-DEFINED PURPOSE
>
> This implies that a schema chunk be small, i.e., contain a small
> number of elements.
>
> UNIQUE IDENTIFIER
>
> A schema chunk is identified by its targetNamespace.  To give each chunk
> a unique identifier implies that each chunk go into a different schema,
> and each schema have a different targetNamespace.  That is, one chunk,
> one schema.
>
> DECOUPLED
>
> Each chunk must be standalone, self-contained, with no dependencies on
> other schemas.  This means no importing/including of other schemas.
>
> PROPOSED SCHEMA CHUNK DESIGN
>
> Below is my proposal on how to design schema chunks to promote
> reusability and interoperability.
>
> First, here's my expanded definition of "schema chunk":
>
> - a globally declared element comprised of 5-10 in-lined child elements.
> - a chunk represents a well-defined chunk of information.
> - a chunk has a unique identifier - the targetNamespace.
> - a chunk is broadly useable.
> - one schema, one chunk.  That is, a schema just defines one chunk.
> - a chunk has no dependencies on other schemas
>    a. Use element declarations, simpleType definitions.
>    b. Don't use derived types, substitutionGroups.
>    c. Use the Russian Doll design.
>
> So, here's my proposal of how a schema chunk should be designed:
>
> <?xml version="1.0"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>             xmlns="schema-chunk-id"
>             targetNamespace="schema-chunk-id"
>             elementFormDefault="qualified">
>     <!-- The one globally declared element: the chunk itself -->
>     <xsd:element name="chunk">
>         <xsd:complexType>
>             <xsd:sequence>
>                 <xsd:element name="child-element1" type="simpleType-1"/>
>                 <xsd:element name="child-element2" type="simpleType-2"/>
>                 <xsd:element name="child-element3" type="simpleType-3"/>
>                 <xsd:element name="child-element4" type="simpleType-4"/>
>                 <xsd:element name="child-element5" type="simpleType-5"/>
>             </xsd:sequence>
>         </xsd:complexType>
>     </xsd:element>
>
>     <!-- Every simpleType the chunk needs is bundled in the same schema -->
>     <xsd:simpleType name="simpleType-1">...</xsd:simpleType>
>     <xsd:simpleType name="simpleType-2">...</xsd:simpleType>
>     <xsd:simpleType name="simpleType-3">...</xsd:simpleType>
>     <xsd:simpleType name="simpleType-4">...</xsd:simpleType>
>     <xsd:simpleType name="simpleType-5">...</xsd:simpleType>
>
> </xsd:schema>
>
> Note that with this design:
>
> a. It defines one chunk:
>
>      <chunk>
>          ...
>      </chunk>
>
> b. The chunk has a unique id - defined by the targetNamespace.
>
> c. The data is strongly typed - a simpleType for each data item.
>
> d. The chunk is small - just 5 data items.
>
> e. The chunk is bounded - the child elements are in-lined, using the
> Russian Doll design.
>
> f. The chunk is standalone - all simpleTypes needed to define the chunk
> are bundled in the schema.  Everything that is needed to use and
> understand the schema is right there.  No need to look through a long
> type hierarchy chain, no need to examine other schemas.
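>
> As a concrete illustration, the "position" chunk could be sketched
> along these lines.  The namespace URI, the element and type names, and
> the value ranges are all assumptions chosen for the example, not part
> of any published schema.  It has four children only because the
> position data listed earlier has four items.
>
> <?xml version="1.0"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>             xmlns="http://example.org/chunks/position"
>             targetNamespace="http://example.org/chunks/position"
>             elementFormDefault="qualified">
>     <!-- One global element, children in-lined (Russian Doll) -->
>     <xsd:element name="position">
>         <xsd:complexType>
>             <xsd:sequence>
>                 <xsd:element name="lat" type="latitude"/>
>                 <xsd:element name="lon" type="longitude"/>
>                 <xsd:element name="accuracy" type="accuracyMeters"/>
>                 <xsd:element name="id" type="positionId"/>
>             </xsd:sequence>
>         </xsd:complexType>
>     </xsd:element>
>
>     <!-- Decimal degrees, -90 to 90 -->
>     <xsd:simpleType name="latitude">
>         <xsd:restriction base="xsd:decimal">
>             <xsd:minInclusive value="-90"/>
>             <xsd:maxInclusive value="90"/>
>         </xsd:restriction>
>     </xsd:simpleType>
>     <!-- Decimal degrees, -180 to 180 -->
>     <xsd:simpleType name="longitude">
>         <xsd:restriction base="xsd:decimal">
>             <xsd:minInclusive value="-180"/>
>             <xsd:maxInclusive value="180"/>
>         </xsd:restriction>
>     </xsd:simpleType>
>     <!-- Measurement accuracy; the unit (meters) is an assumption -->
>     <xsd:simpleType name="accuracyMeters">
>         <xsd:restriction base="xsd:decimal">
>             <xsd:minInclusive value="0"/>
>         </xsd:restriction>
>     </xsd:simpleType>
>     <!-- Non-empty identifier -->
>     <xsd:simpleType name="positionId">
>         <xsd:restriction base="xsd:token">
>             <xsd:minLength value="1"/>
>         </xsd:restriction>
>     </xsd:simpleType>
> </xsd:schema>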
>
> GLUE SCHEMAS
>
> I have become a believer in schema design using schema chunks.  The
> major emphasis in schema design should, I believe, be on creating and
> reusing schema chunks.  The purpose of the "other" (non-chunk) schemas
> is to simply glue together the schema chunks.
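>
> A glue schema in this sense might look like the sketch below.  The
> "mission" and "briefing" names, both namespace URIs, and the
> position.xsd file name are hypothetical.  Unlike a chunk, the glue
> schema does depend on another schema: it imports the chunk's
> namespace and refers to the chunk's global element.
>
> <?xml version="1.0"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>             xmlns:pos="http://example.org/chunks/position"
>             targetNamespace="http://example.org/glue/mission"
>             elementFormDefault="qualified">
>     <!-- Pull in the chunk; the location is only a hint -->
>     <xsd:import namespace="http://example.org/chunks/position"
>                 schemaLocation="position.xsd"/>
>     <xsd:element name="mission">
>         <xsd:complexType>
>             <xsd:sequence>
>                 <xsd:element name="briefing" type="xsd:string"/>
>                 <!-- The "real" data: the chunk, reused as-is -->
>                 <xsd:element ref="pos:position"/>
>             </xsd:sequence>
>         </xsd:complexType>
>     </xsd:element>
> </xsd:schema>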
>
> As an aside, I am beginning to believe that there is too much emphasis
> on glue schemas.  The glue elements give the "illusion" of importance,
> when, in fact, they have no importance other than to act as a framework
> to hold the "real" data.
>
> I am starting to believe that the right approach may be to empower
> instance document authors to decide what collection of schema chunks
> they wish to use, and let them glue them together using whatever
> elements they wish.
>
> CONCLUSIONS
>
> To enable interoperability, I think that creating broadly adopted,
> reusable schema chunks is very important.  So, how do we design broadly
> adopted, reusable schema chunks?  In this message I have attempted to
> outline what I see as an approach to designing schema chunks.  It
> fundamentally advocates a minimalist use of XML Schema functionality -
> no type derivation, no element substitution, no import/include.
>
> I welcome your comments and suggestions.  /Roger
>
>
Received on Thursday, 2 January 2003 09:52:53 UTC