- From: Mark Feblowitz <mfeblowitz@frictionless.com>
- Date: Fri, 27 Dec 2002 11:23:27 -0500
- To: "'Roger L. Costello'" <costello@mitre.org>
- Cc: "Xmlschema-Dev (E-mail)" <xmlschema-dev@w3.org>
Roger -

This is a very compelling idea. It is similar to OAGIS "components" (although more limited in size and scope). I'm certain that there are other similar attempts to chunk schemas appropriately for manageability, usability, etc. Another that comes to mind is the ebXML Core Components work. Again, similar aims, but the chunks are larger in size and scope. There's currently a project to "harmonize" OAGIS Components and ebXML Core Components.

I understand the motivation for the chunks, having struggled multiple times with large, inflexible, overly structured schemas. I'm just not sure that the size and independence characteristics will support realistic schemas, except for some subset of extremely simple objects.

Of course, such an approach would require innovations in parsing technologies, since loading and processing what could be hundreds of schemas for a reasonably sized XML document would be prohibitively expensive. There are a few standards out there that essentially have one schema file per chunk, and they are notoriously slow to validate. Extra machinery, such as a schema repository or pre-assembly of the full collection of chunk schemas, would be required.

Another downside of this approach is the management of similar, derived concepts. For concept A' to be derived from concept A, either the schema for A' must be dependent on the schema for A, or the information content from A must be replicated in A', and we all know how difficult it is to maintain definitions that result from replication (especially those of us who've struggled with derivation by restriction on any reasonable scale). This is another area where tool support might help - for example, dependent schemas could easily be used to generate independent schemas.

Mark

Mark Feblowitz
mfeblowitz@frictionless.com
MarkFeblowitz@attbi.com
w: 617-715-7231
h: 781-721-2729
m: 781-789-5478

-----Original Message-----
From: Roger L. Costello [mailto:costello@mitre.org]
Sent: Friday, December 27, 2002 10:57 AM
To: xmlschema-dev@w3.org
Cc: Costello,Roger L.
Subject: Component-Based Schema Design

Hi Folks,

INTEROPERABILITY VIA "SCHEMA CHUNKS"

I have become convinced that the key to interoperability is to promote the use of broadly adopted "schema chunks". I would like to hear your thoughts on how to design interoperable schema chunks.

DEFINITION OF "SCHEMA CHUNK"

First, let me start by defining what I mean by a "schema chunk". I will provide a more detailed definition later, but for now:

A schema chunk is a schema with a narrow, well-defined purpose.

Example. A "position" schema chunk has a very narrow scope - it defines the format of position data: lat, lon, msrmt accuracy, and id.

PROPERTIES OF SCHEMA CHUNKS

A schema chunk has certain properties:

(a) a unique identifier
(b) no dependencies (that is, the schema chunk is standalone)

Thus, a schema chunk represents a reusable component.

PARTIAL VALIDATION AND INTEROPERABILITY

A schema chunk should enable partial validation. A colleague recently helped me to realize the importance of partial validation of an instance document, and the role of partial validation in interoperability.

DEFINITION OF PARTIAL VALIDATION

Oftentimes you will receive an instance document and you need only a portion of the data. Thus, you would like to validate just that portion, extract it, and process it. Here are a couple of examples where partial validation plays an important role:

EXAMPLE - EXTRACT/PROCESS THE TARGET POSITION CHUNK

A pilot is handed a floppy containing a document that contains, among other things, the position of a target to be bombed. He inserts the floppy into his on-board computer, which has a cached copy of the position schema chunk. The computer validates the position data, extracts it, and loads the coordinates into the ordnance.
The other information on the floppy is irrelevant, and couldn't be validated even if desired since the pilot has no connection to a network.

EXAMPLE - PIPELINE PROCESSING OF DATA CHUNKS

Imagine a document that gets sent through a series of stages. Each stage acts like a filter, validating the data chunk that is pertinent to that stage, extracting (removing) it, processing it, and then passing the modified document downstream to the next stage.

HOW TO DO PARTIAL VALIDATION

You may ask: "How do I perform partial validation?"

Answer: In the instance document don't specify schemaLocation. Then, at validation time you must supply namespace/schema-URL values. To do partial validation, provide just the namespace/schema-URL pair of the component that you are interested in validating.

DESIGNING A SCHEMA CHUNK

Before I unveil my ideas on how to design schema chunks, let's consider the implications of what I have stated above:

NARROW, WELL-DEFINED PURPOSE

This implies that a schema chunk be small, i.e., contain a small number of elements.

UNIQUE IDENTIFIER

A schema chunk is identified by its targetNamespace. To give each chunk a unique identifier implies that each chunk go into a different schema, and each schema have a different targetNamespace. That is, one chunk, one schema.

DECOUPLED

Each chunk must be standalone, self-contained, with no dependencies on other schemas. This means no importing/including of other schemas.

PROPOSED SCHEMA CHUNK DESIGN

Below is my proposal on how to design schema chunks to promote reusability and interoperability. First, here's my expanded definition of "schema chunk":

- a globally declared element comprised of 5-10 in-lined child elements.
- a chunk represents a well-defined chunk of information.
- a chunk has a unique identifier - the targetNamespace.
- a chunk is broadly useable.
- one schema, one chunk. That is, a schema just defines one chunk.
- a chunk has no dependencies on other schemas

a. Use element declarations, simpleType definitions.
b. Don't use derived types, substitutionGroups.
c. Use the Russian Doll design.

So, here's my proposal of how a schema chunk should be designed:

    <?xml version="1.0"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns="schema-chunk-id"
                targetNamespace="schema-chunk-id"
                elementFormDefault="qualified">
        <xsd:element name="chunk">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="child-element1" type="simpleType-1"/>
                    <xsd:element name="child-element2" type="simpleType-2"/>
                    <xsd:element name="child-element3" type="simpleType-3"/>
                    <xsd:element name="child-element4" type="simpleType-4"/>
                    <xsd:element name="child-element5" type="simpleType-5"/>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
        <xsd:simpleType name="simpleType-1">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-2">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-3">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-4">...</xsd:simpleType>
        <xsd:simpleType name="simpleType-5">...</xsd:simpleType>
    </xsd:schema>

Note that with this design:

a. It defines one chunk: <chunk> ... </chunk>
b. The chunk has a unique id - defined by the targetNamespace.
c. The data is strongly typed - a simpleType for each data item.
d. The chunk is small - just 5 data items.
e. The chunk is bounded - the child elements are in-lined, using the Russian Doll design.
f. The chunk is standalone - all simpleTypes needed to define the chunk are bundled in the schema. Everything that is needed to use and understand the schema is right there. No need to look through a long type hierarchy chain, no need to examine other schemas.

GLUE SCHEMAS

I have become a believer in schema design using schema chunks. The major emphasis in schema design should, I believe, be on creating and reusing schema chunks. The purpose of the "other" (non-chunk) schemas is simply to glue together the schema chunks.

As an aside, I am beginning to believe that there is too much emphasis on glue schemas. The glue elements give the "illusion" of importance, when, in fact, they have no importance other than to act as a framework to hold the "real" data. I am starting to believe that the right approach may be to empower instance document authors to decide what collection of schema chunks they wish to use, and let them glue them together using whatever elements they wish.

CONCLUSIONS

To enable interoperability, I think that creating broadly adopted, reusable schema chunks is very important. So, how do we design broadly adopted, reusable schema chunks? In this message I have attempted to outline what I see as an approach to designing schema chunks. It fundamentally advocates a minimalist use of XML Schema functionality - no type derivation, no element substitution, no import/include.

I welcome your comments and suggestions.

/Roger
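
For illustration only, here is a minimal sketch of what the "position" chunk mentioned in the quoted message might look like if it followed the proposed design. Nothing below comes from the original message except the four data items (lat, lon, measurement accuracy, id). The namespace URI, the element and type names, the value ranges, and the choice of meters for accuracy are all assumptions made up for this example. Note that it has four child elements, slightly fewer than the 5-10 suggested in the proposal.

```xml
<?xml version="1.0"?>
<!-- Hypothetical "position" schema chunk: one schema, one chunk.
     The namespace URI is an assumed placeholder, not a real identifier. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.example.org/position"
            targetNamespace="http://www.example.org/position"
            elementFormDefault="qualified">

    <!-- The single globally declared element; children are in-lined
         (Russian Doll design), so nothing else is reusable or exposed. -->
    <xsd:element name="position">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="lat" type="latitude"/>
                <xsd:element name="lon" type="longitude"/>
                <xsd:element name="accuracy" type="accuracyInMeters"/>
                <xsd:element name="id" type="positionId"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>

    <!-- All simpleTypes are bundled here: no import, no include. -->
    <xsd:simpleType name="latitude">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-90"/>
            <xsd:maxInclusive value="90"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="longitude">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="-180"/>
            <xsd:maxInclusive value="180"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Assumed unit: meters, non-negative. -->
    <xsd:simpleType name="accuracyInMeters">
        <xsd:restriction base="xsd:decimal">
            <xsd:minInclusive value="0"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Assumed format: any non-empty token. -->
    <xsd:simpleType name="positionId">
        <xsd:restriction base="xsd:token">
            <xsd:minLength value="1"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>
```

Under these assumptions, a processor holding only this one schema could be handed the pair "http://www.example.org/position" plus the schema's location at validation time, which is the namespace/schema-URL pairing the quoted message describes for partial validation.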
Received on Friday, 27 December 2002 11:32:43 UTC