- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 11 Apr 2003 22:31:26 -0400
- To: www-tag@w3.org
- Cc: cmsmcq@w3.org, jmarsh@microsoft.com
As you may or may not be aware, the Schema recommendation draws an important architectural distinction between what it calls schema components and so-called schema documents. In private communication, Tim Bray suggested that the TAG might welcome a bit of explanation of this distinction, so here's an attempt. I'm not advocating anything here, merely explaining how the recommendation works, in the hopes of informing your deliberations on the "ComponentRefs" issue. Also, I'm writing for myself, not for the Schema WG.

Schema Components and Schema Documents
--------------------------------------

Schema is fundamentally defined at what's called the "component" level[1]. Components are abstractions, much like Infoset information items. They tell you the information you need to know, not the form in which it's represented. So, for example, in order to have a declaration for a derived simple type you need to know the name of the new type, the base type from which it's derived, which facets are changed (e.g. maxInclusive), whether the new type is "final", and so on. The core of the schema recommendation doesn't require you to put that information in any particular form. For example, it might live in memory behind some "createSimpleType" API, perhaps having been dynamically created by some database. In any case, the schema recommendation describes the result of a validation, regardless of the form in which the type information is stored.

What most people think of as a schema is what the recommendation calls a "schema document"[2]. A schema document sets out a normative XML representation for schema information. Note, however, that a component may draw on information from several documents. For example, you might create a type in one namespace, and I might derive a type in another namespace using yours as a base. Since a schema document describes at most one target namespace, the derived type and the base are necessarily set out in different schema documents.
Note that the derived component includes information from the base: once derived it stands on its own, and has copies of some information from the base. Thus you cannot in general put together a schema component by reading a single schema document.

Furthermore, it is quite possible for a component declared in a schema document to derive from, be the base for, or otherwise use another that is defined through some non-document means. For example, an HTML editor could build into a special validator the definitions for the HTML namespace, but could allow schema documents to build content models that use those HTML components.

In summary, schema documents are the most common way of setting out a schema, but not the only one, and you tend to need multiple documents to pull together a component. There is in general not a single schema document that can represent a component involving multiple namespaces. As with synthetic infosets, you can create perfectly usable schemas in memory with some API, or in a database, without using the <schema> XML document form. All that's required is that your processor understand the form in which you have stored the various definitions and declarations. Of course, the most commonly available processors read schema documents in the usual XML form, and we call out a level of conformance for those processors that do[3].

Identifiers for Declarations and Definitions
--------------------------------------------

Regarding the TAG issue on references to schema definitions and declarations: it is coherent to consider identifiers either for the markup in a schema document or for the components in a schema, but the abstractions identified are surely very different. The actual "type" that you've derived is not unambiguously determinable from the document in which the derivation is set out. For example, if someone fixed a bug in the base type, the derived type would change.
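
As a rough illustration of that last point, here is a minimal Python sketch. Nothing in it comes from the recommendation: SimpleTypeDecl, assemble, and the example.org namespaces are invented names, a base of None stands in for a built-in type, and facet inheritance is reduced to a simple override. It only shows that the same derivation text yields a different component once the base is corrected.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A qualified name as (namespace, local name). Hypothetical modelling choice.
QName = Tuple[str, str]


@dataclass(frozen=True)
class SimpleTypeDecl:
    """What one source (a schema document, an API call, ...) says about a type."""
    name: QName
    base: Optional[QName]  # None stands in for a built-in type (assumption)
    facets: Dict[str, int] = field(default_factory=dict)  # facets changed here only


def assemble(name: QName, declarations: Dict[QName, SimpleTypeDecl]) -> Dict[str, int]:
    """Compute the effective facets of `name` by walking its base chain.

    Simplification: a facet stated on the derived type overrides the base.
    Real schema processing also checks that each restriction is legal.
    """
    decl = declarations[name]
    inherited = assemble(decl.base, declarations) if decl.base is not None else {}
    return {**inherited, **decl.facets}


YOURS = "http://example.org/yours"  # namespace of the base type (hypothetical)
MINE = "http://example.org/mine"    # namespace of the derived type (hypothetical)

# Your base type, first with a bug (minInclusive should have been 0), then fixed.
base_buggy = SimpleTypeDecl((YOURS, "Percent"), None,
                            {"minInclusive": -5, "maxInclusive": 100})
base_fixed = SimpleTypeDecl((YOURS, "Percent"), None,
                            {"minInclusive": 0, "maxInclusive": 100})

# My derivation, set out separately. It only changes maxInclusive and is
# identical in both runs below.
derived = SimpleTypeDecl((MINE, "SmallPercent"), (YOURS, "Percent"),
                         {"maxInclusive": 50})

before = assemble(derived.name, {base_buggy.name: base_buggy, derived.name: derived})
after = assemble(derived.name, {base_fixed.name: base_fixed, derived.name: derived})

print("derived component with buggy base:", before)
print("derived component with fixed base:", after)
```

The declaration of SmallPercent is the same object in both calls, yet the assembled result differs, which is why an identifier for the markup and an identifier for the component name different things.
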

I think it's fair to say that the schema workgroup has informally concluded that it's schema components for which identities are most urgently needed. On the other hand, for the reasons discussed above, component definitions do not in general follow from individual schema documents; you typically need to assemble (as a validator would) a self-consistent set of schema documents for the namespace(s) being validated in order to know what components you have. Approaches that attempt to use namespaces as the basis for a component name tend not to handle the cases where information is drawn from multiple namespaces, or to deal with the possibility that multiple schema documents are out there describing the same namespace (perhaps due to bug fixes or whatever). Accordingly, the Schema WG is working hard to find the right ways of handling component identity.

Interestingly, I personally think the answer might well be to use some RDDL or similar document to identify collections of schema documents, and to serve as the basis for creating identifiers for the components (such as type declarations) represented by those documents in combination. If you are interested, the Schema WG has a subgroup wrestling with this...but I am not on it. Check with Michael Sperberg-McQueen, our chair.

Is WSDL Similar to Schema?
--------------------------

Jonathan Marsh tells me that WSDL has indeed copied our component/document distinction, but my impression is that their workgroup has as yet put less energy into focusing on the subtleties that result. It's also possible that WSDL declarations are somewhat more orthogonal than Schema's, so there may be a more direct mapping between lexical and conceptual forms (i.e., they may be closer to having each component fully set out in a single document; I'm not familiar with the sorts of importation and derivation across namespaces that they allow).

Anyway, I hope this is helpful to the TAG in its consideration of the abstractComponentRefs issue.

Noah

[1] http://www.w3.org/TR/xmlschema-1/#key-component
[2] http://www.w3.org/TR/xmlschema-1/#key-schemaDoc
[3] http://www.w3.org/TR/xmlschema-1/#key-interchange

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Friday, 11 April 2003 22:38:32 UTC