- From: Roger L. Costello <costello@mitre.org>
- Date: Fri, 27 Dec 2002 10:56:52 -0500
- To: xmlschema-dev@w3.org
- CC: "Costello,Roger L." <costello@mitre.org>
Hi Folks,

INTEROPERABILITY VIA "SCHEMA CHUNKS"

I have become convinced that the key to interoperability is to promote the use of broadly adopted "schema chunks". I would like to hear your thoughts on how to design interoperable schema chunks.

DEFINITION OF "SCHEMA CHUNK"

First, let me define what I mean by a "schema chunk". I will provide a more detailed definition later, but for now: a schema chunk is a schema with a narrow, well-defined purpose.

Example. A "position" schema chunk has a very narrow scope: it defines the format of position data, namely lat, lon, measurement accuracy, and id.

PROPERTIES OF SCHEMA CHUNKS

A schema chunk has certain properties:

(a) a unique identifier
(b) no dependencies (that is, the schema chunk is standalone)

Thus, a schema chunk represents a reusable component.

PARTIAL VALIDATION AND INTEROPERABILITY

A schema chunk should enable partial validation. A colleague recently helped me realize the importance of partial validation of an instance document, and the role partial validation plays in interoperability.

DEFINITION OF PARTIAL VALIDATION

Oftentimes you receive an instance document but need only a portion of its data. You would like to validate just that portion, extract it, and process it. Here are two examples where partial validation plays an important role:

EXAMPLE - EXTRACT/PROCESS THE TARGET POSITION CHUNK

A pilot is handed a floppy containing a document that includes, among other things, the position of a target to be bombed. He inserts the floppy into his on-board computer, which has a cached copy of the position schema chunk. The computer validates the position data, extracts it, and loads the coordinates into the ordnance. The other information on the floppy is irrelevant, and couldn't be validated even if desired, since the pilot has no network connection and therefore no access to the other schemas.

EXAMPLE - PIPELINE PROCESSING OF DATA CHUNKS

Imagine a document that is sent through a series of stages. Each stage acts like a filter: it validates the data chunk pertinent to that stage, extracts (removes) it, processes it, and then passes the modified document downstream to the next stage.

HOW TO DO PARTIAL VALIDATION

You may ask: "How do I perform partial validation?" Answer: don't specify xsi:schemaLocation in the instance document. Then, at validation time, you must supply the namespace/schema-URL pairs yourself. To do partial validation, provide just the namespace/schema-URL pair of the component you are interested in validating.

DESIGNING A SCHEMA CHUNK

Before I unveil my ideas on how to design schema chunks, let's consider the implications of what I have stated above:

NARROW, WELL-DEFINED PURPOSE

This implies that schema chunks should be small, i.e., contain a small number of elements.

UNIQUE IDENTIFIER

A schema chunk is identified by its targetNamespace. Giving each chunk a unique identifier implies that each chunk goes into a different schema, and each schema has a different targetNamespace. That is: one chunk, one schema.

DECOUPLED

Each chunk must be standalone and self-contained, with no dependencies on other schemas. This means no importing or including of other schemas.

PROPOSED SCHEMA CHUNK DESIGN

Below is my proposal for how to design schema chunks to promote reusability and interoperability. First, here is my expanded definition of "schema chunk":

- a globally declared element composed of 5-10 in-lined child elements.
- a chunk represents a well-defined unit of information.
- a chunk has a unique identifier: the targetNamespace.
- a chunk is broadly usable.
- one schema, one chunk. That is, a schema defines just one chunk.
- a chunk has no dependencies on other schemas:
  a. Use element declarations and simpleType definitions.
  b. Don't use derived types or substitutionGroups.
  c. Use the Russian Doll design.
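As an illustration, here is one way the position chunk from the opening example might look under these rules. This is a sketch, not a published schema: the namespace URI http://www.example.org/position, the element and type names, and the value ranges (decimal degrees for lat/lon, a non-negative number for accuracy) are all assumptions. The simpleTypes restrict built-in types, which the proposal appears to allow since it calls for simpleType definitions; what it avoids is complexType derivation, substitution groups, and import/include.

```xml
<?xml version="1.0"?>
<!-- Hypothetical position chunk: namespace, names, and ranges are illustrative only -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/position"
            xmlns="http://www.example.org/position"
            elementFormDefault="qualified">

    <!-- The single global element: the chunk itself -->
    <xsd:element name="position">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Children are declared in-line (Russian Doll design) -->
                <xsd:element name="lat" type="latitudeType"/>
                <xsd:element name="lon" type="longitudeType"/>
                <xsd:element name="accuracy" type="accuracyType"/>
                <xsd:element name="id" type="positionIdType"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <!-- Every simpleType the chunk needs is bundled here: no import/include -->
    <xsd:simpleType name="latitudeType">
        <!-- assumed: decimal degrees, -90 to +90 -->
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-90"/>
            <xsd:maxInclusive value="90"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="longitudeType">
        <!-- assumed: decimal degrees, -180 to +180 -->
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-180"/>
            <xsd:maxInclusive value="180"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="accuracyType">
        <!-- assumed: a non-negative measurement accuracy -->
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="0"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="positionIdType">
        <!-- assumed: any non-empty string -->
        <xsd:restriction base="xsd:string">
            <xsd:minLength value="1"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```

Because this schema imports nothing, a single cached file is enough for the pilot's on-board computer: given an instance document that omits xsi:schemaLocation, the validator would be handed just the pair (http://www.example.org/position, position.xsd), where position.xsd is a hypothetical file name.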
So, here is my proposal for how a schema chunk should be designed:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="schema-chunk-id"
                xmlns="schema-chunk-id"
                elementFormDefault="qualified">

        <!-- The one global element: the chunk itself -->
        <xsd:element name="chunk">
            <xsd:complexType>
                <xsd:sequence>
                    <!-- Child elements are declared in-line (Russian Doll) -->
                    <xsd:element name="child-element1" type="simpleType-1"/>
                    <xsd:element name="child-element2" type="simpleType-2"/>
                    <xsd:element name="child-element3" type="simpleType-3"/>
                    <xsd:element name="child-element4" type="simpleType-4"/>
                    <xsd:element name="child-element5" type="simpleType-5"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>

        <!-- All simpleTypes the chunk needs, bundled in this schema.
             The default namespace above equals the targetNamespace,
             so type="simpleType-N" resolves to these definitions. -->
        <xsd:simpleType name="simpleType-1">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-2">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-3">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-4">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-5">...</xsd:simpleType>
    </xsd:schema>

Note that with this design:

a. It defines one chunk: <chunk> ... </chunk>
b. The chunk has a unique id, defined by the targetNamespace.
c. The data is strongly typed: a simpleType for each data item.
d. The chunk is small: just 5 data items.
e. The chunk is bounded: the child elements are in-lined, using the Russian Doll design.
f. The chunk is standalone: all simpleTypes needed to define the chunk are bundled in the schema.

Everything needed to use and understand the schema is right there. There is no need to trace a long type hierarchy chain, and no need to examine other schemas.

GLUE SCHEMAS

I have become a believer in schema design using schema chunks. The major emphasis in schema design should, I believe, be on creating and reusing schema chunks. The purpose of the "other" (non-chunk) schemas is simply to glue the schema chunks together.

As an aside, I am beginning to believe that there is too much emphasis on glue schemas. The glue elements give the "illusion" of importance when, in fact, they have no importance other than to act as a framework to hold the "real" data.
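A minimal sketch of a glue schema may make the division of labor concrete. It assumes the position chunk described earlier lives in a hypothetical file position.xsd that declares a global element named position in the hypothetical namespace http://www.example.org/position; the mission element and its namespace are likewise invented for illustration. The point it shows is the direction of dependency: the glue schema imports the chunk, while the chunk imports nothing.

```xml
<?xml version="1.0"?>
<!-- Hypothetical glue schema: all names and URIs are illustrative only -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/mission"
            xmlns:pos="http://www.example.org/position"
            elementFormDefault="qualified">

    <!-- The glue depends on the chunk; the chunk never depends on the glue -->
    <xsd:import namespace="http://www.example.org/position"
                schemaLocation="position.xsd"/>

    <!-- The glue element: only a framework that holds the "real" data -->
    <xsd:element name="mission">
        <xsd:complexType>
            <xsd:sequence>
                <!-- Reuse the chunk by reference rather than redefining it -->
                <xsd:element ref="pos:position"/>
                <!-- Admit other chunks; with lax processing they are validated
                     only if the validator has a declaration for them -->
                <xsd:any namespace="##other" processContents="lax"
                         minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

The wildcard with processContents="lax" is one way to keep the glue thin: elements from chunks whose schemas are not available are accepted without being checked, which is in the spirit of the partial-validation scenarios above.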
I am starting to believe that the right approach may be to empower instance document authors to decide what collection of schema chunks they wish to use, and let them glue the chunks together using whatever elements they wish.

CONCLUSIONS

To enable interoperability, I think that creating broadly adopted, reusable schema chunks is very important. So, how do we design broadly adopted, reusable schema chunks? In this message I have attempted to outline what I see as an approach to designing schema chunks. It fundamentally advocates a minimalist use of XML Schema functionality: no type derivation, no element substitution, no import/include.

I welcome your comments and suggestions.

/Roger
Received on Friday, 27 December 2002 10:57:00 UTC