Working Groups in the Interaction domain have over the past ten years produced a healthy ecosystem of XML vocabularies to deal with many aspects of the Web's user interface. All of those languages were designed with the goal that they could integrate with one another to rely on each other's power. However, while this has generally been a successful endeavour, problems remain in various situations when the rubber hits the road, a number of which are currently being addressed by the CDF WG.
One area where integration amongst these specifications is still lacking is that of validation or other uses of schema technology applied to compound documents. The intent of this document is to contribute to the discussion on schema best practices for interaction languages, chiefly with two goals:
Our intent here is not to enter a discussion of XML Schema's issues, alleged or real — XML fora and articles in XML publications across the Web can provide ample details about these for whoever wishes to spend some time exploring this topic.
XML Schema is a W3C technology, and is fine for its own uses, but this does not mean that we should mindlessly apply it across the entire spectrum of W3C technologies without considering alternatives. In much the same way, PNG is a W3C technology, yet using JPEG images is still appropriate (and in fact, for a large class of images it would likely be foolish to use PNG).
In fact, I don't think that a discussion of which schema language to use would prove very useful. Most WGs have already spent considerable time authoring schemata in their language(s) of choice (which they furthermore likely chose for good reasons) and are unlikely to want to spend yet more time aligning their existing solution to someone else's one-size-fits-all idea. Rather, in accordance with the two goals stated in the incipit, it is my opinion that it would be more productive to attempt to define two different sets of best practices:
Mark Birbeck, Shane McCarron, and Steven Pemberton have put together a very useful document called the Compound Documents Schema Best Practices (note to Safari users: you will get two login prompts, cancel them both and it'll work). It lists several useful tricks to apply when producing compound schemata based on XML Schema. Mark also presented this document to the CDF WG at its London meeting in August 2005. This document constitutes a valiant and laudable attempt at making independent schemata work together better, as well as at working around some limitations or undesirable side effects of using XML Schema. I have several comments to make, having read it in light of the goals of this document.
The first recommendation, "Only have top-level elements if they are really global" is commonly known as the "Russian doll design" since it involves defining elements directly inside other elements. While there are advantages to this approach, I would be hesitant to make it a generic best practice.
When one is designing a highly structured vocabulary meant to encode many independent data structures, this is indeed a useful approach, since one may have many elements sharing the same name but with different content models. However, for languages that are intended to be authored by humans, I find it better to have a one-to-one mapping between an element name and its content model, as it is much more natural for authors to think about it that way.
While the generic issue here is common to both RelaxNG and XML Schema, the problem of authoring tools proposing all the elements in the schema any time that this vocabulary may occur inside an instance is specific to the latter, since RelaxNG has a <start> element that differentiates between a globally available element and one that may be at the root of a valid fragment.
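To make the distinction concrete, here is a minimal sketch in Python. It parses a small RelaxNG grammar and separates the elements referenced directly from <start> from all the elements the grammar defines. The grammar, its namespace URI, and the helper names (`root_candidates`, `all_elements`) are assumptions invented for this illustration; a real authoring tool would need a fuller traversal of the pattern tree.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

# Hypothetical grammar: only <doc> may be the root of a valid fragment,
# while <para> and <note> are defined but never offered as roots.
GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://example.org/ns/demo">
  <start>
    <ref name="doc"/>
  </start>
  <define name="doc">
    <element name="doc">
      <oneOrMore>
        <choice>
          <ref name="para"/>
          <ref name="note"/>
        </choice>
      </oneOrMore>
    </element>
  </define>
  <define name="para">
    <element name="para"><text/></element>
  </define>
  <define name="note">
    <element name="note"><text/></element>
  </define>
</grammar>
"""


def q(local):
    """Qualify a local name with the RelaxNG namespace (ElementTree syntax)."""
    return f"{{{RNG}}}{local}"


def root_candidates(grammar):
    """Names of elements defined by patterns referenced directly from <start>."""
    defines = {d.get("name"): d for d in grammar.findall(q("define"))}
    names = []
    for ref in grammar.find(q("start")).iter(q("ref")):
        for element in defines[ref.get("name")].findall(q("element")):
            names.append(element.get("name"))
    return names


def all_elements(grammar):
    """Names of every element pattern in the grammar, sorted."""
    return sorted(e.get("name") for e in grammar.iter(q("element")))


grammar = ET.fromstring(GRAMMAR)
roots = root_candidates(grammar)
everything = all_elements(grammar)
print("may start a fragment:", roots)
print("defined anywhere:    ", everything)
```

An editor inserting a fragment of this vocabulary could offer only the first list, whereas a schema with every element declared globally gives it no such hint.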
I think that a document on best practices in schema usage should document the pros and cons of both approaches, but let the reader decide which is best for her own problem space.
The ability to have a grammar be bound as late as possible to a namespace is useful in two cases that I can think of:
The only situation in which an interaction language could sensibly change its namespace is if it goes through a very radical change that effectively makes it a different language with a different grammar — in which case the schema cannot be reused anyway.
So the only use case that I believe to concern us is the second. I would think however that while it would indeed apply well to such vocabularies as XML Events, XForms, or sXBL, it seems of limited usefulness for XHTML or SVG. That is, unless we make a drastic decision and decree that all W3C compound documents can be used in a single namespace — but such a discussion is outside the bounds of the problem that concerns us here.
This is an XML Schema-specific workaround. I think it could be improved by noting that in RelaxNG the same problem does not occur because datatypeLibrary and ns inheritance are separate.
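As a rough illustration of that separation, the following Python sketch resolves the two attributes independently by walking up the ancestors of a pattern, and shows that rebinding ns leaves datatypeLibrary untouched. The grammar, the namespace URIs under example.org, and the `inherited` helper are assumptions made up for this example; the lookup is simplified and only models inheritance within a single schema document.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

# Hypothetical grammar: ns and datatypeLibrary are both set once on the root
# and inherited by the patterns below it.
GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://example.org/ns/demo"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="price">
      <data type="decimal"/>
    </element>
  </start>
</grammar>
"""


def inherited(element, attribute, parents):
    """Value of `attribute` on the element or its nearest ancestor, else None."""
    while element is not None:
        if attribute in element.attrib:
            return element.get(attribute)
        element = parents.get(element)
    return None


grammar = ET.fromstring(GRAMMAR)
# ElementTree keeps no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in grammar.iter() for child in parent}
element = grammar.find(f".//{{{RNG}}}element")
data = grammar.find(f".//{{{RNG}}}data")

before = (inherited(element, "ns", parents),
          inherited(data, "datatypeLibrary", parents))

# Rebind the grammar to another namespace; the datatype library is unaffected.
grammar.set("ns", "http://example.org/ns/other")
after = (inherited(element, "ns", parents),
         inherited(data, "datatypeLibrary", parents))

print(before)
print(after)
```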
This is a very useful pattern to apply to XML Schema design. I had intended to produce the same using NVDL but unfortunately I've barely had the time to write this document. I'll gladly take an action item to show how the same would work in NVDL as I believe the comparison would be interesting.
As Mark explains in his introduction, the CDSBP document is not an attempt to "find the 'best' solution to the problem" but simply an attempt to "solve the problem using XML Schema, since that was what lay to hand". I think we need to take the next step, and attempt to define if not the 'best' at least a good, workable solution to the problem of compounding schemata, taking into account the specific needs and constraints of WGs in the Interaction domain (and later perhaps others).
There are two major downsides in relying on XML Schema alone to perform schema compounding:
Instead, I recommend that where compounding schemata is required, we rely on NVDL. NVDL makes it possible for one to write a schema for one's language without having to pause and consider the implications that it may have when mixed with someone else's. When provided a compound document instance, an NVDL processor will divide it into sections, each of which is to be validated by a separate schema. And those separate schemata can be authored in different schema languages.
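The sectioning idea can be sketched in a few lines of Python. This is not an NVDL processor, only an illustration of the principle: it walks a compound document, starts a new section whenever the element namespace changes, and looks up which schema each section would be handed to. The sample document, the schema file names (`xhtml.rng`, `svg.xsd`), and the function names are all assumptions for the example, and attribute sections are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical compound document mixing XHTML and SVG.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg">
      <circle r="10"/>
    </svg>
  </body>
</html>
"""

# Hypothetical schema files; note that they use different schema languages.
SCHEMAS = {
    "http://www.w3.org/1999/xhtml": "xhtml.rng",
    "http://www.w3.org/2000/svg": "svg.xsd",
}


def split_tag(element):
    """Return (namespace, local name) for an ElementTree element."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def sections(root):
    """Group elements into sections; a namespace change opens a new section."""
    found = []

    def walk(element, current):
        namespace, local = split_tag(element)
        if current is None or namespace != current["ns"]:
            current = {"ns": namespace, "root": local, "count": 0}
            found.append(current)
        current["count"] += 1
        for child in element:
            walk(child, current)

    walk(root, None)
    return found


found = sections(ET.fromstring(DOC))
# Sections in an unknown namespace are allowed without being validated.
plan = [(s["root"], SCHEMAS.get(s["ns"], "(allowed, not validated)"))
        for s in found]
counts = [s["count"] for s in found]
for root_name, schema in plan:
    print(f"section rooted at <{root_name}> -> {schema}")
```

Because each section is dispatched on its namespace alone, neither schema needs to know that the other exists, which is the property that makes independent schema authoring possible.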
The SVG Tiny 1.2 specification already makes use of NVDL to describe its conformance criteria concerning extensibility to other namespaces. It is a very simplistic schema that only states that one should validate elements in the SVG namespace and attributes in a variety of other namespaces, with a given schema, and allows foreign namespaces without the need to validate them (thereby fulfilling XML's extensibility promise, something which has proven overly difficult in most schema languages to date). This could easily be extended to include other schemata describing other languages. The NVDL early access page contains more advanced and interesting examples (which can also be downloaded here).
Relying on NVDL would allow WGs to proceed to schema authoring independently from one another, and would greatly simplify the task of the CDF WG, of validator writers, and of implementers who require schema validation when handling compound documents. I would greatly like to see such practices codified in a centralized document, ideally something along the lines of a CG Note.