Re: Component-Based Schema Design from Roger L. Costello on 2002-12-30 (xmlschema-dev@w3.org from December 2002)

From: Roger L. Costello <costello@mitre.org>
Date: Mon, 30 Dec 2002 09:35:33 -0500
To: "Xmlschema-Dev (E-mail)" <xmlschema-dev@w3.org>
CC: "Costello,Roger L." <costello@mitre.org>
Message-ID: <3E1059B4.E6CCD500@mitre.org>

Hi Mark,

Mark Feblowitz wrote:
> 
> Of course, such an approach would require innovations in parsing
> technologies, since the loading and processing of what could be 
> hundreds of schemas for a reasonably sized xml document would  be 
> prohibitive. There are a few standards out there that essentially have 
> one schema file per chunk, and they are notoriously slow to be 
> validated. Extra machinery such as a schema repository or pre-assembly 
> of the full collection of chunk schemas would be required.

You make an excellent point, Mark.  If we were to use the schema chunk
idea - with one schema file per chunk - together with today's style of
creating large instance documents, then validation would be
prohibitively slow and expensive.

However, that assumes that creating large instance documents is a good
thing.  I will argue that it is not.

One design approach is to exchange (between sender and client) a few
documents, each document containing a lot of data.  That is, send large
instance documents.

Advantages:
- may make efficient use of bandwidth

Disadvantages:
- Oftentimes a client doesn't need all the data, just a portion of it.  

An alternative design approach is to exchange a lot of documents, each
document containing a little data.

Advantages:
- The client can be sent just the data he/she desires

Disadvantages:
- may make less efficient use of bandwidth

I will argue that it is typically better to lean towards the latter
design approach - exchange small instance documents.  Note that this is
also consistent with the XML Streaming approach.

So, not only do I advocate the creation and use of "schema chunks", I
also advocate small instance documents.
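
As a rough illustration of the small-document approach, here is a
minimal Python sketch that breaks one large instance document into
several small standalone documents, one per chunk.  The element names
(Order, Customer, Shipment, Invoice) and the function name are
hypothetical, chosen only for this example, and a real system would
also have to deal with namespaces and with data shared between chunks.

```python
import xml.etree.ElementTree as ET

# Hypothetical large instance document; the element names are made up
# for illustration and do not come from any real schema.
LARGE_DOC = """<Order>
  <Customer><Name>Example Corp</Name></Customer>
  <Shipment><Carrier>Example Freight</Carrier></Shipment>
  <Invoice><Total>100.00</Total></Invoice>
</Order>"""


def split_into_small_documents(large_xml):
    """Return a list of (chunk name, standalone XML string) pairs,
    one pair per child element of the large document's root."""
    root = ET.fromstring(large_xml)
    small_docs = []
    for chunk in root:
        # Drop the whitespace that followed the chunk inside its parent.
        chunk.tail = None
        small_docs.append((chunk.tag, ET.tostring(chunk, encoding="unicode")))
    return small_docs


small_docs = split_into_small_documents(LARGE_DOC)

# A client interested only in the shipment is sent just that document.
shipment_only = [xml for name, xml in small_docs if name == "Shipment"]
print(shipment_only[0])
```

Each resulting document would need to be validated against only the
one schema chunk for its root element, rather than against every chunk
used anywhere in the large document.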

> Another down side of this approach is the management of similar, 
> derived concepts. For concept A' to be derived from concept A, either 
> the schema for A' must be dependent on the schema for A, or the 
> information content from A must be replicated in A', and we all know 
> how difficult it is to maintain definitions that result from 
> replication (especially those who've struggled with derivation by 
> restriction on any reasonable scale). 

Yes, you are absolutely correct.  With the schema chunk approach you may
end up repeating things in multiple chunks.  It boils down to this
tradeoff: independent components versus reusable type hierarchies.  

My experience is that schema type hierarchies make schemas overly
complex and brittle.  I cannot tell you how many schemas I have seen
with type hierarchies 7 levels (or more) deep.  These schemas are
virtually impossible to understand by anyone other than the original
schema designers. 

Independent components, on the other hand, each have a specific use and
specific semantics.  They are easy to understand, and they can be
plugged in to a lot of different uses.

From my perspective, simplicity and "pluggability" are the most
important qualities.  I am willing to sacrifice the slight benefits of
type reuse to gain the benefits of using rock-solid components.

Thanks for your comments, Mark!  /Roger

Received on Monday, 30 December 2002 09:35:52 UTC