Re: XHTML modularization and substitution groups (tag issue XMLVersioning-41, TagSoupIntegration-54, RDFinXHTML-35)

On 13 Feb 2007, at 10:13 , Dan Connolly wrote:

 > Mimasa, Shane,

 > I'm interested in a form of extensibility where a markup language
 > designer can make a new my:box element and say "it's an HTML block
 > element"; then, when a document containing a my:box element is
 > checked for syntactic happiness, the checking tool uses normal HTML
 > schemas until it gets to my:box; then it looks up my:box in the web,
 > finds that it's declared to be an HTML block, and finds that an HTML
 > block is allowed here, and carries on happily.

If we assume that my:box is in namespace http://example.com/mine, and
(I have not checked) that HTML has an element (perhaps an abstract
element) named 'block', then one way to do this is with this schema
document, retrievable from the namespace URI (either directly or via
RDDL).

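   <schema
       xmlns="http://www.w3.org/2001/XMLSchema"
       xmlns:my="http://example.com/mine"
       xmlns:html="http://www.w3.org/1999/xhtml"
       targetNamespace="http://example.com/mine" >
     <!-- components from another namespace must be imported
          before they can be referenced -->
     <import namespace="http://www.w3.org/1999/xhtml"/>
     <!-- my:box may appear wherever html:block may appear -->
     <element name="box" substitutionGroup="html:block"/>
   </schema>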
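
If the schema document is reached via RDDL rather than served
directly, the namespace document at http://example.com/mine might
carry a resource entry along these lines.  This is only a sketch: the
id, the title, the descriptive text, and the file name mine.xsd are
all made up for illustration.

   <rddl:resource
       xmlns:rddl="http://www.rddl.org/"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       id="mine-xsd"
       xlink:title="XSD schema for the 'mine' namespace"
       xlink:role="http://www.w3.org/2001/XMLSchema"
       xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
       xlink:href="mine.xsd">
     <!-- human-readable description of the resource -->
     <p xmlns="http://www.w3.org/1999/xhtml">Schema document
     declaring my:box.</p>
   </rddl:resource>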

A schema processor processing your enclosing document will see
something like:

   <html  xmlns:my="http://example.com/mine" ...>
    ...
   <div><h3>More details</h3>
   <p>What is really neat about this idea is:</p>
   <my:box>
     IT WORKS
   </my:box>
   <p>And what's more, it was my idea.</p>
   ...
   </html>

and will know that in order to validate 'box' correctly, it's going to
need to find a declaration.  Unless you have instructed it otherwise,
a typical processor will then look for schema components for the
namespace http://example.com/mine.  They might be hard-coded into the
processor -- unlikely in this case.  The user might have told the
processor in advance to load components for that namespace from a
particular URI -- probably more likely, but not what you are
interested in, so we assume that didn't happen.  Or they might be
dereferenceable from the namespace URI.  Since I'm assuming this is
your namespace, and you are keen on making sure things can work using
the follow-your-nose principle, let's assume the schema document
above, or the equivalent, is at the namespace URI or pointed to from a
RDDL document there.  The schema processor reads it, and knows about
my:box.  It knows

   - There's a top-level element in namespace http://example.com/mine
     whose local name is 'box'.
   - That element wants to be allowed to appear wherever html:block
     can appear.
   - Its type is whatever the type of html:block is (you could have
     declared it with a restriction or an extension of that type, but
     in formulating the example you said "it's an HTML block element"
     and nothing more; I take you at your word).

Your schema processor can now validate the element.
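
Had you wanted my:box to carry something extra, the declaration could
name a type derived from the type of html:block instead.  A minimal
sketch, assuming (hypothetically) that the HTML schema gives
html:block a named complex type html:block.type; the type name
boxType and the attribute 'shade' are likewise invented:

   <schema
       xmlns="http://www.w3.org/2001/XMLSchema"
       xmlns:my="http://example.com/mine"
       xmlns:html="http://www.w3.org/1999/xhtml"
       targetNamespace="http://example.com/mine">
     <import namespace="http://www.w3.org/1999/xhtml"/>
     <!-- html:block.type is a hypothetical name for the head's type -->
     <complexType name="boxType">
       <complexContent>
         <!-- same content as the base type, plus one new attribute -->
         <extension base="html:block.type">
           <attribute name="shade" type="string"/>
         </extension>
       </complexContent>
     </complexType>
     <element name="box" type="my:boxType"
              substitutionGroup="html:block"/>
   </schema>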

Without looking at the (X)HTML schema documents, I can't tell you
whether your instance is now valid or not.  It depends on how they are
defined.

Unless the schema author has taken active steps to get in your way,
your document should be valid.  Your schema document, together with
the schema document(s) defining the other namespaces found in your
document, creates a schema in which my:box acts like html:block, in
having the same type and being legal in the same locations.

If the author of the original schema wished to block this kind of
extension, however, there are several ways it could be blocked.  If
the HTML schema you're validating with took the trouble to forbid the
substitution of other elements for html:block, then your instance is
invalid.  If you restrict the type of block, or extend it, and the
schema took the trouble to forbid restriction of the type, or
extension of the type, or to allow the restriction or extension of the
type itself but forbid the substitution of restrictions or extensions
for instances of html:block, then your document is again invalid.
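
For concreteness, these blocking steps correspond to the 'final' and
'block' attributes of XSD.  The fragment below sketches how a schema
author might use them; it is not taken from any actual HTML schema,
html:block.type is again a hypothetical type name, the content model
is omitted, and the default namespace is assumed to be that of XML
Schema.

   <!-- final on the type: no one may derive new types from it,
        whether by restriction or by extension -->
   <complexType name="block.type" final="restriction extension">
     <!-- content model omitted -->
   </complexType>

   <!-- block="substitution": members of the substitution group may
        not appear in place of html:block in instances.
        block="restriction extension" would instead leave
        substitution alone but reject substitutes whose type is a
        restriction or extension of html:block.type. -->
   <element name="block" type="html:block.type"
            block="substitution"/>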

Similarly, if an agent running a validator wishes to block this kind
of extension, in order (for example) to tell whether your instance is
legal against the HTML schema WITHOUT EXTENSIONS, there are some ways
it can be blocked at validation time, too.  In particular, the
validator can be invoked with run-time options specifying "read these
schema documents AND NO OTHERS, and use the schema built from them,
without extensions."  At least, that's possible if the validator
provides user control over how schema documents are looked for.  Some
do, some don't; if you pay money for a validator, make sure it gives
you the kinds of control knobs you want to have.

 > XML Schema substitution groups are designed for this use case.

Yep.

 > Legend has it you tried to use them in XHTML modularization but it
 > didn't work out or something. We're interested to know the whole
 > story.

+1.

But it's probably worth pointing out that you, Dan, can as a user
extend the HTML schema as shown above, whether the HTML schema
documents use substitution groups or not.  (Unless, that is, the
schema author went out of the way to get in your way and close the
schema to this kind of extension.)

Full disclosure: sometimes extensibility as shown above is not quite
what you want.  Several things can go wrong; here are some of them.

(1) You want the my:box element to work just like an html:block, and
also like a MathML blort, and also like an SVG whammo element.

Sorry, in XML Schema 1.0, elements can only point to a single
substitution group head.  Your box element can be substitutable for
html:block, but not also for mathml:blort and svg:whammo.

Of course, if they are described as being substitutable for
html:block, things are slightly better.  If svg:whammo is
substitutable for html:block, then you can write

     <element name="box" substitutionGroup="svg:whammo"/>

and since substitution group membership is transitive (subject to
complex blocking rules which I can't explain and which you will never
encounter outside a markup pathology classroom), my:box is also
substitutable for whatever svg:whammo is substitutable for.  If you know
that you'll always use my:box with SVG, then this transitive
membership is fine; otherwise, it is likely to strike you as not
solving your real problem, which is that you want multiple
substitution group affiliations for my:box.
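
Spelled out, the transitive chain looks like this.  The first
declaration is a guess at what a schema for the SVG namespace might
say (whammo being as fictitious as ever); the second is yours.

   <!-- in a schema document for the SVG namespace (hypothetical) -->
   <element name="whammo" substitutionGroup="html:block"/>

   <!-- in your schema document for http://example.com/mine -->
   <element name="box" substitutionGroup="svg:whammo"/>

   <!-- net effect: my:box may appear wherever svg:whammo or
        html:block may appear -->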

Some people have urged that XML Schema 1.1 allow elements to have
multiple substitution-group heads.  It might happen.  Actually
(speaking for myself, not the WG) I think there's a very good chance
that it WILL happen.  But it hasn't happened yet; if you want it to
happen, tell the XML Schema WG.

(2) You want the my:box element to be substitutable for several
different elements, you use XSD 1.1 to get that functionality, and
when you specify

     <element name="box"
              substitutionGroup="html:block svg:whammo mathml:blort"/>

the processor rejects your schema because there is some point at which
EITHER an html:block element OR an svg:whammo element OR a
mathml:blort element may occur, and the processor can't decide which
part of the content model your my:box element belongs to.
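
The kind of content model that triggers this might look like the
following sketch.  The type name container.type is invented, and the
html, svg and mathml prefixes are assumed to be bound and their
namespaces imported in the usual way.

   <complexType name="container.type">
     <choice minOccurs="0" maxOccurs="unbounded">
       <!-- my:box is in the substitution group of all three heads,
            so a my:box child could be attributed to any of these
            particles -->
       <element ref="html:block"/>
       <element ref="svg:whammo"/>
       <element ref="mathml:blort"/>
     </choice>
   </complexType>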

Unfortunately, the user community of XSD 1.0 has not risen up and
demanded that XSD 1.1 eliminate the 'unique particle attribution
constraint' (aka 'UPA', aka the 'deterministic content model' rule,
which XSD took over from XML DTDs, and which XML DTDs inherited from
SGML, and for which no one has ever formulated a persuasive
rationale).  Pretty much the entire community of people interested in
document-oriented XML has asked for the rule to be dropped, but to OO
and Web Services people, the use of XML for documents appears to
represent an edge case that's not worth worrying about.  So complaints
about UPA have routinely been
dismissed as unimportant.  (This is perhaps one salient reason that
many schema authors prefer to work in Relax NG, which ditched the
determinism rule years ago.)

So while I expect the next draft of XSD 1.1 to have multiple
substitution-group heads, I don't expect it to have gotten rid of the
UPA constraint.  And some number of people who attempt to exploit
multiple substitution-group heads will find that UPA makes it
impossible to do so.  All I can say is: file bug reports.  Maybe
eventually the responsible WG will be responsive.

(3) You might want my:box to have a fresh, brand-new type of your own
devising, independent of and unrelated to the type assigned by the
schema to html:block.  If you do, you may be out of luck.  Some schema
authors will have chosen to design in extensibility points by
defining elements like

   <element name="block" abstract="true" type="anyType"/>

Since any type you can define is substitutable for xsd:anyType, this
kind of declaration gives you maximum freedom.
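
Against a head declared that way, a type of your own invention is
acceptable.  A sketch, with the type name boxType and its content
made up for illustration:

   <!-- in your schema document for http://example.com/mine;
        assumes html:block is declared with type anyType as above -->
   <complexType name="boxType">
     <sequence>
       <element name="label" type="string"/>
     </sequence>
     <attribute name="depth" type="positiveInteger"/>
   </complexType>
   <element name="box" type="my:boxType"
            substitutionGroup="html:block"/>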

But other schema authors will have written

   <element name="block"
   type="my:block-type-so-specialized-no-one-else-can-use-it"/>

Since some members of the XSD 1.0 WG were very insistent on it, the
1.0 rule says that any element substitutable for 'block' must have
either the same type as 'block' or a type which is substitutable for
that of 'block'.  This makes element substitution groups feel a little
more like object inheritance classes, and it makes some sense in its
own way.  (If the OO people had been happier with this restriction,
I'd think it made sense.  But as a way of making XSD 1.0 work better
in OO terms, it seems to have had no effect at all.)

These problems do make the substitution groups of XSD 1.0 a little
less beautiful than I wish they were.  But there are a lot of cases
where substitution groups can be used without running into any of
these problems.  And where they work, I think substitution groups work
very nicely and could usefully be a lot more widely exploited.


--C. M. Sperberg-McQueen

Received on Thursday, 15 February 2007 22:59:04 UTC