- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Thu, 15 Feb 2007 15:58:53 -0700
- To: Dan Connolly <connolly@w3.org>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, mimasa@w3.org, Mark Birbeck <mark.birbeck@x-port.net>, public-xml-versioning@w3.org
On 13 Feb 2007, at 10:13 , Dan Connolly wrote:

> Mimasa, Shane,
> I'm interested in a form of extensibility where a markup language
> designer can make a new my:box element and say "it's an HTML block
> element"; then, when a document containing a my:block element is
> checked for syntactic happiness, the checking tool uses normal HTML
> schemas until it gets to my:box; then it looks up my:box in the web,
> finds that it's declared to be an HTML block, and find than an HTML
> block is allowed here, and carries on happily.

If we assume that my:box is in namespace http://example.com/mine, and (I have not checked) that HTML has an element (perhaps an abstract element) named 'block', then one way to do this is with this schema document, retrievable from the namespace URI (either directly or via RDDL).

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:my="http://example.com/mine"
            xmlns:html="http://www.w3.org/1999/xhtml"
            targetNamespace="http://example.com/mine">
      <!-- An import is needed before html:block can be referenced. -->
      <import namespace="http://www.w3.org/1999/xhtml"/>
      <element name="box" substitutionGroup="html:block"/>
    </schema>

A schema processor processing your enclosing document will see something like:

    <html xmlns:my="http://example.com/mine" ...>
      ...
      <div><h3>More details</h3>
        <p>What is really neat about this idea is:</p>
        <my:box> IT WORKS </my:box>
        <p>And what's more, it was my idea.</p>
      ...
    </html>

and will know that in order to validate 'box' correctly, it's going to need to find a declaration. Unless you have instructed it otherwise, a typical processor will then look for schema components for the namespace http://example.com/mine. There are three places they might come from:

  - They might be hard-coded into the processor -- unlikely in this case.
  - The user might have told the processor in advance to load components for that namespace from a particular URI -- probably more likely, but not what you are interested in, so we assume that didn't happen.
  - Or they might be dereferenceable from the namespace URI.
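
For the RDDL route, the document served at the namespace URI could look roughly like the sketch below. This is an illustration under stated assumptions, not a tested setup: the schema location http://example.com/mine.xsd and the title text are hypothetical, and the role and arcrole URIs are the ones RDDL 1.0 is understood to use for "W3C XML Schema" and "schema validation", so they should be checked against the RDDL specification before anyone relies on them.

```xml
<!-- Hypothetical RDDL document served at http://example.com/mine -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>The http://example.com/mine namespace</title>
  </head>
  <body>
    <!-- xlink:role names the nature of the resource (an XML Schema
         document); xlink:arcrole names its purpose (schema validation).
         xlink:href is an assumed location for the schema document. -->
    <rddl:resource xlink:type="simple"
                   xlink:title="XML Schema for the my:box element"
                   xlink:role="http://www.w3.org/2001/XMLSchema"
                   xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
                   xlink:href="http://example.com/mine.xsd">
      <p>Schema document declaring the top-level 'box' element.</p>
    </rddl:resource>
  </body>
</html>
```

A processor that understands RDDL would fetch this page, pick out the resource whose nature and purpose match what it needs, and then load the schema document from the href.
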

Since I'm assuming this is your namespace, and you are keen on making sure things can work using the follow-your-nose principle, let's assume the schema document above, or the equivalent, is at the namespace URI or pointed to from a RDDL document there. The schema processor reads it, and knows about my:box. It knows:

  - There's a top-level element in namespace http://example.com/mine whose local name is 'box'.
  - That element wants to be allowed to appear wherever html:block can appear.
  - Its type is whatever the type of html:block is (you could have declared it with a restriction or an extension of that type, but in formulating the example you said "it's an HTML block" and nothing more; I take you at your word).

Your schema processor can now validate the element.

Without looking at the (X)HTML schema documents, I can't tell you whether your instance is now valid or not. It depends on how they are defined. Unless the schema author has taken active steps to get in your way, your document should be valid. Your schema document, together with the schema document(s) defining the other namespaces found in your document, creates a schema in which my:box acts like html:block, in having the same type and being legal in the same locations.

If the author of the original schema wished to block this kind of extension, however, there are several ways it could be blocked. If the HTML schema you're validating with took the trouble to forbid the substitution of other elements for html:block, then your instance is invalid. If you restrict the type of block, or extend it, and the schema took the trouble to forbid restriction of the type, or extension of the type, or to allow the restriction or extension of the type itself but forbid the substitution of restrictions or extensions for instances of html:block, then your document is again invalid.
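
To make the blocking mechanisms concrete, here is a sketch of how a schema author could write each of them in XSD 1.0. It is a hypothetical fragment, not taken from any actual (X)HTML schema: the type names and the three element names (block1, block2, block3) are invented so that the three variants can sit side by side in one well-formed schema document. A real schema would have a single 'block' declaration using whichever of these, if any, its author chose.

```xml
<!-- Hypothetical fragment; all component names here are made up. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- A type that anyone may restrict or extend. -->
  <complexType name="open.block.type" mixed="true"/>

  <!-- final="#all" forbids deriving new types from this one,
       whether by restriction or by extension. -->
  <complexType name="closed.block.type" mixed="true" final="#all"/>

  <!-- Variant 1: block="substitution" forbids any other element from
       appearing in place of this one, even a member of its
       substitution group that has exactly the same type. -->
  <element name="block1" type="html:open.block.type"
           block="substitution"/>

  <!-- Variant 2: same-type substitution is still allowed, but nobody
       can supply a restricted or extended type, because the type
       itself is final. -->
  <element name="block2" type="html:closed.block.type"/>

  <!-- Variant 3: the type may be restricted or extended elsewhere,
       but elements (or xsi:type values) using such derived types may
       not stand in for this element in an instance. -->
  <element name="block3" type="html:open.block.type"
           block="restriction extension"/>
</schema>
```

With variant 1, the my:box declaration shown earlier would still be a legal schema component, but an instance using my:box where the head element is expected would be invalid. With variants 2 and 3, my:box as declared (same type as its head) would be unaffected; only a my:box with a derived type would run into trouble.
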

Similarly, if an agent running a validator wishes to block this kind of extension, in order (for example) to tell whether your instance is legal against the HTML schema WITHOUT EXTENSIONS, there are some ways it can be blocked at validation time, too. In particular, the validator can be invoked with run-time options specifying "read these schema documents AND NO OTHERS, and use the schema built from them, without extensions." At least, that's possible if the validator provides user control over how schema documents are looked for. Some do, some don't; if you pay money for a validator, make sure it gives you the kinds of control knobs you want to have.

> XML Schema substitution groups are designed for this use case.

Yep.

> Legend has it you tried to use them in XHTML modularization but it
> didn't work out or something. We're interested to know the whole
> story.

+1. But it's probably worth pointing out that you, Dan, can as a user extend the HTML schema as shown above, whether the HTML schema documents use substitution groups or not. (Unless, that is, the schema author went out of the way to get in your way and close the schema to this kind of extension.)

Full disclosure: sometimes extensibility as shown above is not quite what you want. Several things can go wrong; here are some of them.

(1) You want the my:box element to work just like an html:block, and also like a MathML blort, and also like an SVG whammo element.

Sorry, in XML Schema 1.0, elements can only point to a single substitution group head. Your box element can be substitutable for html:block, but not also for mathml:blort and svg:whammo.

Of course, if they are described as being substitutable for html:block, things are slightly better.
If svg:whammo is substitutable for html:block, then you can write

    <element name="box" substitutionGroup="svg:whammo"/>

and since substitution group membership is transitive (subject to complex blocking rules which I can't explain and which you will never encounter outside a markup pathology classroom), my:box is also substitutable for whatever svg:whammo is substitutable for.

If you know that you'll always use my:box with SVG, then this transitive membership is fine; otherwise, it is likely to strike you as not solving your real problem, which is that you want multiple substitution group affiliations for my:box.

Some people have urged that XML Schema 1.1 allow elements to have multiple substitution-group heads. It might happen. Actually (speaking for myself, not the WG) I think there's a very good chance that it WILL happen. But it hasn't happened yet; if you want it to happen, tell the XML Schema WG.

(2) You want the my:box element to be substitutable for several different elements, you use XSD 1.1 to get that functionality, and when you specify

    <element name="box"
             substitutionGroup="html:block svg:whammo mathml:blort"/>

the processor rejects your schema because there is some point at which EITHER an html:block element OR an svg:whammo element OR a mathml:blort element may occur, and the processor can't decide which part of the content model your my:box element belongs to.

Unfortunately, the user community of XSD 1.0 has not risen up and demanded that XSD 1.1 eliminate the 'unique particle attribution constraint' (aka 'UPA', aka the 'deterministic content model' rule, which XSD took over from XML DTDs, and which XML DTDs inherited from SGML, and for which no one has ever formulated a persuasive rationale). Pretty much the entire community of people interested in document-oriented XML has asked for exactly that, but to OO and Web Services people, the use of XML for documents appears to represent an edge case that's not worth worrying about.
So complaints about UPA have routinely been dismissed as unimportant. (This is perhaps one salient reason that many schema authors prefer to work in Relax NG, which ditched the determinism rule years ago.)

So while I expect the next draft of XSD 1.1 to have multiple substitution-group heads, I don't expect it to have gotten rid of the UPA constraint. And some number of people who attempt to exploit multiple substitution-group heads will find that UPA makes it impossible to do so. All I can say is: file bug reports. Maybe eventually the responsible WG will be responsive.

(3) You might want my:box to have a fresh, brand-new type of your own devising, independent of and unrelated to the type assigned by the schema to html:block.

If you do, you may be out of luck. Some schema authors will have chosen to design in extensibility points by defining elements like

    <element name="block" abstract="true" type="anyType"/>

Since any type you can define is substitutable for xsd:anyType, this kind of declaration gives you maximum freedom. But other schema authors will have written

    <element name="block"
             type="my:block-type-so-specialized-no-one-else-can-use-it"/>

Since some members of the XSD 1.0 WG were very insistent on it, the 1.0 rule says that any element substitutable for 'block' must have either the same type as 'block' or a type which is substitutable for that of 'block'. This makes element substitution groups feel a little more like object inheritance classes, and it makes some sense in its own way. (If the OO people had been happier with this restriction, I'd think it made sense. But as a way of making XSD 1.0 work better in OO terms, it seems to have had no effect at all.)

These problems do make the substitution groups of XSD 1.0 a little less beautiful than I wish they were. But there are a lot of cases where substitution groups can be used without running into any of these problems.
And where they work, I think substitution groups work very nicely and could usefully be a lot more widely exploited.

--C. M. Sperberg-McQueen
Received on Thursday, 15 February 2007 22:59:04 UTC