- From: Thomas Broyer <t.broyer@ltgt.net>
- Date: Wed, 7 Jan 2009 17:45:50 +0100
- To: public-html <public-html@w3.org>
On Wed, Jan 7, 2009 at 1:43 PM, Julian Reschke wrote:
>
> Ian Hickson wrote:
>>
>> (There are a number of things that XML can't do because of its limitations
>> in extensibility. For example, authors can't extend it to represent non-tree
>> structures, they can't extend it to have error recovery, they can't extend
>> it to have true multivalued-attributes, they can't extend it to allow
>> them to correctly define validity in the face of namespaces, and they can't
>> extend it to allow them to define validity for non-enumerated attribute
>> values. This isn't a criticism of XML, it's just a description of the design
>> choices made by the XML working group. It's normal for a language to have a
>> constrained extensibility model.)
>
> All true.
>
> But in XML based languages you can extend the vocabulary,

Only when the vocabulary has been defined to be extensible; otherwise your document won't validate. DTDs do not allow plugging in attributes other than the defined ones, and only allow "foreign" child elements when the content model is ANY. It's almost the same with XML Schema, except that you can opt in to foreign attributes (possibly constrained by namespace) and allow foreign child elements while still validating/constraining other child elements. It's again almost the same with RelaxNG, with added expressiveness regarding deterministic vs. ambiguous content models.

As an example, Atom explicitly allows (i.e. does not flag as an error) any attribute and/or element not defined in the spec, and further defines specific extensibility points (so that a "generic" Atom processor can map those to internal models different from the Infoset).
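To make the Atom case concrete, here is a minimal sketch of a processor that tolerates foreign markup instead of rejecting it. It is only an illustration: the `KNOWN_ENTRY_CHILDREN` subset, the `split_entry` helper, and the `http://example.org/ext` extension namespace are assumptions made up for this example, not anything defined by the Atom spec.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical subset of Atom entry children that this toy processor understands.
KNOWN_ENTRY_CHILDREN = {"id", "title", "updated"}

ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:ex="http://example.org/ext">
  <id>urn:example:1</id>
  <title>Hello</title>
  <updated>2009-01-07T16:45:50Z</updated>
  <ex:rating>4</ex:rating>
</entry>
"""


def split_entry(xml_text):
    """Return (known, foreign) for the children of an Atom entry.

    known maps local names of recognised Atom children to their text;
    foreign lists the qualified tags of everything else, which is kept
    aside rather than treated as an error.
    """
    root = ET.fromstring(xml_text)
    known, foreign = {}, []
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        if child.tag.startswith("{"):
            ns, _, local = child.tag[1:].partition("}")
        else:
            ns, local = "", child.tag
        if ns == ATOM_NS and local in KNOWN_ENTRY_CHILDREN:
            known[local] = child.text
        else:
            foreign.append(child.tag)
    return known, foreign


known, foreign = split_entry(ENTRY)
```

The point is that the tolerance lives in the vocabulary's processing rules (here, the `else` branch), not in XML itself.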
XML in itself does not make vocabularies extensible in any way. Even in the absence of a DTD, the processing of an "unknown" attribute or element, of an unknown attribute value or element content, or of CDATA/PCDATA found where it's not expected, is left totally unspecified. These are the responsibility of vocabulary definitions, and most of them do not allow "foreign content/metadata"; this includes XHTML 1.x and XHTML 2.0. What XML does allow, however (but only when you add Namespaces for XML), is reusing pieces of already defined vocabularies to build new ones (Open Document, XHTML 2 reusing XForms, etc.). (Well, it all depends on what you call a "vocabulary".)

> and this you can't in HTML.

At least not the way it's currently defined: XML syntax is "self-expressive", but that's not the case for HTML right now (parsing depends on the vocabulary: void elements, special/scoping/formatting/phrasing elements).

I don't know SGML much, but this seems no different from optional tags being defined in the DTD: if HTML5 were still SGML-based, whenever you added a new element (particularly a new "void element": EMPTY content model with optional end tag), you'd have to update the DTD, and because no one would download DTDs but would use their local catalogs instead, you'd have the same deployment problem.

I agree that it is not ideal that any future HTML version (including HTML5) introducing a new void element would introduce a discrepancy in document processing (HTML6 documents using those new void elements cannot be used with an HTML5 parser/processor; or at least they may be parsed to different DOMs).
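The "different DOMs" problem can be shown with a toy tree builder. This is a rough sketch built on Python's standard `html.parser`, not the HTML5 parsing algorithm: `ToyTreeBuilder`, `parse`, and the two void-element sets are names invented for this illustration, and text nodes are simply ignored.

```python
from html.parser import HTMLParser


class ToyTreeBuilder(HTMLParser):
    """Builds a nested (tag, children) tree.

    void_elements stands in for the vocabulary knowledge hard-wired
    into a given parser generation.
    """

    def __init__(self, void_elements):
        super().__init__()
        self.void_elements = set(void_elements)
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in self.void_elements:
            # Not known to be void: following content becomes its child.
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name;
        # stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def parse(markup, void_elements):
    builder = ToyTreeBuilder(void_elements)
    builder.feed(markup)
    builder.close()
    return builder.root


MARKUP = "<p><newvoidelt><b>x</b></p>"

# Hypothetical vocabularies: the "old" parser predates <newvoidelt>.
OLD_VOID = {"br", "img"}
NEW_VOID = OLD_VOID | {"newvoidelt"}

old_tree = parse(MARKUP, OLD_VOID)
new_tree = parse(MARKUP, NEW_VOID)
```

With the old vocabulary, `<b>` ends up nested inside `<newvoidelt>`; with the new one, the two are siblings under `<p>`. The same bytes yield two different trees.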
As already mentioned, one thing we could do is prohibit the introduction of void elements, but a) as Ian said, it would make things harder to read (<command></command>), and b) it would not address all use cases, as you would also need to prohibit the introduction of new scoping/formatting/phrasing elements, and that's probably not desired (we want <section> to auto-close any open <p>, but for compatibility with non-HTML5 UAs, authors still have to explicitly close their <p> before opening a new <section>).

So, as a "compatibility measure", authors would probably have to use <newvoidelt></newvoidelt>. The HTML5 parsing algorithm should eventually not flag this as a parse error (I guess it currently is a parse error); or at least validators should flag it as a "warning" or "info" rather than an "error". But 20 years from now, all UAs (at least browsers) will probably be HTML5-compliant, and documents produced at that date will be able to use <newvoidelt> or <newvoidelt/> without fearing incompatibilities. This is much the same as the <script><!-- ... //--></script> and <style><!-- ... --></style> syntax that worked around old browsers that would otherwise have shown the script and stylesheet as plain text, but which is now totally useless (and harmful when users try to switch to XHTML, where it will effectively hide things in comments).

Maybe instead of this debate on "theories", we should first investigate what authors need to do to preserve compatibility when such new elements are introduced. And we can do this right now, without "predictions" of future needs, as HTML5 introduces void elements (<command> and <source> come to mind) and "special" elements (<section> for instance, see above). Moreover, that work is *needed* for anyone who wants to use those new elements (and omit some optional tags, in the case of <section>).
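The <section>/<p> case mentioned above can be sketched the same way. The following is a deliberately simplified model, not the HTML5 tree builder (which checks whether a <p> is in scope rather than just looking at the current node): `build_tree`, the token lists, and the two `closes_p` sets are assumptions made up for this illustration.

```python
def build_tree(tokens, closes_p):
    """Build a nested (tag, children) tree from (kind, name) tokens.

    closes_p is the set of element names whose start tag implicitly
    closes a currently open <p>, as known to a given parser.
    """
    root = ("#root", [])
    stack = [root]
    for kind, name in tokens:
        if kind == "start":
            if name in closes_p and stack[-1][0] == "p":
                stack.pop()  # implied </p>
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # End tag: pop back to the nearest open element with this name.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
    return root


# <p><section></section>       (relies on the implied </p>)
IMPLICIT = [("start", "p"), ("start", "section"), ("end", "section")]
# <p></p><section></section>   (the "compatible" authoring style)
EXPLICIT = [("start", "p"), ("end", "p"), ("start", "section"), ("end", "section")]

HTML5_CLOSES_P = {"section"}  # parser that treats <section> as special
OLD_CLOSES_P = set()          # parser that has never heard of <section>

implicit_html5 = build_tree(IMPLICIT, HTML5_CLOSES_P)
implicit_old = build_tree(IMPLICIT, OLD_CLOSES_P)
explicit_html5 = build_tree(EXPLICIT, HTML5_CLOSES_P)
explicit_old = build_tree(EXPLICIT, OLD_CLOSES_P)
```

Only the markup with the explicit `</p>` produces the same tree in both parsers, which is why authors targeting non-HTML5 UAs cannot yet rely on the optional end tag.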
It should be quite easy to modify html5lib for those tests to disable special processing for these elements (falling back to the "any other start tag" and "any other end tag" cases in the tree builder algorithm instead).

-- 
Thomas Broyer
Received on Wednesday, 7 January 2009 16:46:31 UTC