- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 11 May 2008 20:09:06 +0300
- To: HTML Issue Tracking WG <public-html@w3.org>
On May 8, 2008, at 18:20 , HTML Issue Tracking Issue Tracker wrote:

> ISSUE-41 (Dave Orchard): Decentralized extensibility
>
> http://www.w3.org/html/wg/tracker/issues/
>
> Raised by: David Orchard
> On product:
>
> The HTML5 specification does not have a mechanism to allow
> decentralized parties to create their own languages, typically XML
> languages, and exchange them in HTML5 text/html serializations.
> This would allow languages such as SVG, MathML, FBML and a host of
> others to be included. At one point, an editors version of the
> HTML5 specification contained a subset and reformulation of SVG and
> MathML. Tim Berners-Lee described this incorporation of SVG and
> MathML without namespaces as horrific and the issue raiser
> completely concurs with the him.
>
> This issue limits the ability of non-HTML5 working groups to define
> languages as the languages must be "brought into" the HTML5
> language. This dramatically increases the scope of HTML5 and
> decreases the ability to modularize development of orthogonal
> languages.
>
> In the end, the problem could result in the text/html serialization
> rules becoming the standard serialization rules for XML languages,
> replacing XML itself. This could occur if every decentralized
> language has a choice between the XML serialization, the text/html
> serialization or both. In many cases, the language may choose the
> text/html serialization.

I discussed this with Dave in Dublin this past week. Here's a write-up of my points (with elaborations) for WG members who weren't attending the same dinner:

I think there are three kinds of HTML extensions:

1) Extensions to the feature set of browser engines. Example: <canvas>
2) Extensions that are meant to be used in documents consumed by browsers but the extensions have value even when not acted on by browsers: Example: hCal.
3) Extensions that are not meant to be used in documents consumed by browsers. Example: FBML.

I think cases #1 and #3 are easier to tackle than #2, so I discuss #1 and #3 first.

Case #1 -- Extensions to the feature set of browser engines

Extensions to the feature set of browser engines quite obviously require modifications to the engines. Since modifications to the engines are needed anyway, modifications to the parser could be made as well. Moreover, to implement <canvas>, SVG, MathML, Web Forms 2.0 or ARIA, the amount of work needed on code above the parser layer by far exceeds the amount of work that would go into tweaking the parser if the extension mechanism for this class of extensions weren't generic enough to avoid parser changes. Therefore, I think trying to establish a framework that guaranteed that we'd never have to tweak parsing would be the wrong optimization.

However, when an extension can be done without tweaking the parser, not tweaking the parser is preferable (hence, aria-foo).

Even if parsing is tweaked, it is crucial not to tweak parsing in a way that interferes with existing Web content too much. This dramatically limits what kind of syntax extensions can use.

Extensions to the feature set of browser engines, if successful, stop being "extensions" when considered at a future date. From a future point of view, the features have become part of the core feature set. Also, the time from being considered an "extension" to being considered "in the core" is likely shorter than the time from then on until the sunset of that part of Web technology.

Once a feature has become part of the core, Web authors using the feature shouldn't have to care where it came from. Moreover, they shouldn't have to deal with cruft like namespace URIs or prefixes. I think having <canvas> is much better than having <apple:canvas>.

As a case study, <canvas> isn't flawless. So what went wrong? At first, Apple made it a void element but then others wanted to make it a container.
Apple then had to make an incompatible change to their implementation to align with others. This could have been avoided by allowing the design to be reviewed by interested parties before shipping a product with the feature.

It should also be noted that the number of parties who effectively have the power to extend the browser engine feature set is fairly small. (I'd say 4 at the moment.) A small number of parties can take names from a single pool on a first-come-first-served basis. This seems repulsive to some, as it raises the question of name squatting. However, in practice, to claim a name on the Web is to successfully implement uses for a name. URI-based extensibility isn't immune to this phenomenon: If browsers implement certain behavior for a given namespace, but a WG that claims the namespace in a de jure way writes a contradictory spec, the spec is pretty toothless in practice.

Conclusion: This extension case doesn't need a technical mechanism. It needs a social mechanism that allows would-be extenders to bring their intent to extend forward for community review early in the process. (The mechanism can be posting to public-html or to the WHATWG mailing list.)

(I don't find adding MathML and SVG parsing to text/html as special cases horrific.)

Case #3 -- Extensions that are not meant to be used in documents consumed by browsers

When documents aren't meant for browsers, we can get rid of a lot of backwards-compatibility baggage. On the other hand, extensions of this kind are likely to be more abundant than extensions to browser engines, because product-specific templating systems outnumber browsers. Furthermore, templating systems may want to be able to put individual HTML elements in contexts that are incompatible with text/html parsing. Consider for example templating syntax between <table> and its <tr>s.
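
To make the <table>/<tr> case concrete, here is a minimal sketch of what it could look like when written as XML and read with an off-the-shelf XML parser. The namespace URI http://example.com/ns/template, the t:for-each element and its items attribute are hypothetical, invented only for this illustration; they do not belong to FBML or any real templating product. The sketch uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical product-specific namespace, made up for this example.
TPL_NS = "http://example.com/ns/template"

# A templating element sits between <table> and its <tr>, which is one of
# the contexts the message describes as incompatible with text/html parsing.
DOC = """\
<table xmlns="http://www.w3.org/1999/xhtml"
       xmlns:t="http://example.com/ns/template">
  <t:for-each items="rows">
    <tr><td>cell</td></tr>
  </t:for-each>
</table>
"""


def child_tags(xml_text):
    """Return the namespace-expanded names of the root's direct children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def rows_inside_template(xml_text):
    """Return the XHTML <tr> elements nested in the templating element."""
    root = ET.fromstring(xml_text)
    loop = root.find(f"{{{TPL_NS}}}for-each")
    return loop.findall(f"{{{XHTML_NS}}}tr")


if __name__ == "__main__":
    print(child_tags(DOC))
    print(len(rows_inside_template(DOC)))
```

The unmodified XML parser keeps the templating element exactly where it was written, as the only child of <table>, and the namespace URI is enough to tell the product-specific element apart from the XHTML ones.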

Since these cases are likely to outnumber browsers and to be more frequent than extensions to browser engines, being able to use off-the-shelf software without modification becomes a more attractive optimization point while at the same time the parser doesn't need to be Web-compatible.

Conclusion: It seems to me that mixing XHTML5 with product-specific elements and attributes in XML is a good fit for this extension case. No new extension mechanism is needed, since Namespaces in XML are available.

Objection: XML is too hard because it's Draconian.
Answer: Then we need non-Draconian XML5.

Follow-up objection: Want one syntax instead of two (HTML5 and XML5).
Answer: Too bad but text/html backwards compatibility and being able to represent all XML 1.0 infosets (needed for replacing XML) are conflicting goals.

Proof of conflict:
1) text/html compatibility requires the parser to infer an <html> root.
2) There are XML vocabularies whose root is not <html>, such as SVG.
3) Q.E.D.

Case #2 -- Extensions that are meant to be used in documents consumed by browsers but the extensions have value even when not acted on by browsers

This is the hard case. The would-be extenders are more numerous than sources of browser engines. The would-be extenders can't change the way HTML is parsed.

I don't feel as sure about presenting a solution for this case as I do for the other two cases. I will, however, observe that regardless of whether URI-based extensibility is used, a pattern of using attributes is emerging in order not to interfere with element parsing: microformats do this without URI-based extensibility, and RDFa does it with URI-based extensibility.

No conclusion, but this does suggest to me that the answer should be: Put extensions of this kind in attributes. (New attributes, magic values or both.)

I'm not convinced that URI-based extensibility or similar is needed.
To me it seems that it's sufficient for each extension to have a gestalt that is recognizable with adequate probability. It seems unlikely that a valid hCal entry would occur by random chance. (rel='me' occurring by random chance seems a bit more likely, but I don't really worry about it.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 11 May 2008 17:09:49 UTC