Re: ISSUE-41: Decentralized extensibility

On Thu, 8 May 2008, David Orchard wrote:
> The HTML5 specification does not have a mechanism to allow decentralized 
> parties to create their own languages, typically XML languages, and 
> exchange them in HTML5 text/html serializations.

Indeed. This is by design.

> This would allow languages such as SVG, MathML, FBML and a host of 
> others to be included.  At one point, an editors version of the HTML5 
> specification contained a subset and reformulation of SVG and MathML.  
> Tim Berners-Lee described this incorporation of SVG and MathML without 
> namespaces as horrific and the issue raiser completely concurs with the 
> him.

The assumption that making HTML into a generic syntax is desirable, or 
that having generic syntaxes available for web authors to arbitrarily 
extend the Web platform with custom vocabularies is desirable, is not one 
that I agree with.

The Web platform is, frankly, too important to let people extend it 
without inviting the entire Web community to take part in the extension 
process. Allowing any vendor to extend the platform is how we end up with 
<blink>, <marquee>, or <layer>.

In practice there are very few vocabularies introduced to the platform 
over time, and the cost of adding new syntax each time has been utterly 
eclipsed by the cost of adding the functionality. For example, the total 
time spent on adding MathML to text/html was a month at most, compared to 
many years for designing MathML itself.

> This issue limits the ability of non-HTML5 working groups to define 
> languages as the languages must be "brought into" the HTML5 language.

Right, that's the idea.

> In the end, the problem could result in the text/html serialization 
> rules becoming the standard serialization rules for XML languages, 
> replacing XML itself. This could occur if every decentralized language 
> has a choice between the XML serialization, the text/html serialization 
> or both.  In many cases, the language may choose the text/html 
> serialization.

The HTML serialisation is not a generic syntax. It's a very 
vocabulary-specific syntax that has evolved organically through the 
involvement of multiple vendors and a lot of seemingly random chance. It's 
not a generic syntax like XML, and suggesting that XML could somehow be 
replaced by HTML is like saying that JSON could somehow be replaced by 
Python.

In addition to all the above, there is also a technical problem with the 
idea of adding a generic syntax to HTML. I'm honestly not sure it's 
possible. The Web is a unique ecosystem with adoption characteristics that 
tend to make this kind of thing hard to deploy. For example, scenarios 
like the following are common:

  We start our story with author A, browser B, and feature F.

  The spec introduces new feature F, which relies on there not being
  any content using the syntax of feature F already on the Web. (That
  already is hard to arrange, but let's assume for the purposes of this
  discussion that we could find some syntax that nobody had yet used.)

  Browser B implements F.

  Author A uses feature F on his site, a demo site for cutting edge
  features, testing with browser B. He also uses some other Cool
  things. Call the Cool things C.

  Now in our story we introduce another Web browser W and another
  developer D. Developer D looks at author A's site using browser W,
  and likes the cool things C that author A did. Cool things C work
  fine in Web browser W, although Feature F doesn't, and Web browser W
  ignores Feature F altogether. Developer D has no idea that Feature F
  exists, nor what it does, nor does he care. He does, however, like
  the Cool things C. He copies the code of Author A's site into his
  site. Developer D happens to run a big site, but he's not very
  good. He copies Feature F along with Cool things C, and mangles them
  a bit in the copying process as he adjusts Cool things C to work for
  his site. Developer D tests with Web browser W and all is great.

  Unbeknownst to Developer D, Browser B renders his site terribly,
  because Feature F interferes with how Developer D intends his site to
  be processed.

  The implementors of Browser B end up forced to change their handling
  of Feature F, possibly removing it altogether. The spec has failed.

If you think this is farfetched, consider the random, incomplete, and 
ill-formed SVG and MathML fragments that already exist in text/html markup 
today, before Author A even has any reason to deploy SVG and MathML (aka 
Feature F) on his site.

With the MathML stuff in HTML5, the spec has been very carefully designed 
to have simple and effective "bail out" behaviour in case the scenario 
above happens. We can do that because MathML is a specific vocabulary that 
we can plan for. We don't need to be especially generic. We don't have to 
handle any random markup, only MathML and HTML mixed in specific ways.
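
To make the "bail out" idea concrete, here is a toy sketch in Python. It is 
not the algorithm from the spec: the function name classify_tags, the 
regular expression, and the short HTML_BREAKOUT_TAGS list are all 
assumptions made up for illustration. It only shows the general shape of 
the behaviour: once inside <math>, a start tag that is obviously HTML drops 
the parser back into HTML handling rather than being swallowed as MathML.

```python
import re

# Hypothetical, deliberately short list of HTML tags that end MathML
# handling in this sketch. This is an assumption, not the spec's list.
HTML_BREAKOUT_TAGS = {"p", "div", "span", "b", "i", "table", "ul", "img"}

# Matches start and end tags only; attributes are skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def classify_tags(markup):
    """Return (tag_name, vocabulary) pairs for each start tag in markup."""
    in_math = False
    result = []
    for match in TAG_RE.finditer(markup):
        is_end_tag = bool(match.group(1))
        name = match.group(2).lower()
        if is_end_tag:
            # A proper </math> ends MathML handling; other end tags
            # are ignored in this sketch.
            if name == "math":
                in_math = False
            continue
        if name == "math":
            in_math = True
            result.append((name, "mathml"))
        elif in_math and name in HTML_BREAKOUT_TAGS:
            # Bail out: an HTML tag inside an unclosed <math> is treated
            # as HTML, and so is everything that follows it.
            in_math = False
            result.append((name, "html"))
        else:
            result.append((name, "mathml" if in_math else "html"))
    return result
```

With a mangled fragment such as "<math><mi>x</mi><p>text<b>bold</b>", 
where the <math> element is never closed, the sketch classifies math and mi 
as MathML and then treats p and b as HTML, so the rest of the page is not 
lost. A generic syntax would have no such list of known tags to fall back 
on, which is the difficulty described above.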

I'm not convinced that it is possible to design a generic syntax that is 
resilient in the face of the above developer behaviour. Even if I thought 
that such generic syntax was desirable, we would need a very concrete 
proposal before even considering this.

As this is the editors' response to Issue 41, I have marked the issue 
closed, as recommended by the chairs. I presume this isn't going to 
satisfy you, but that you don't have anything further to say that hasn't 
already been said (after all, this discussion has been had to death over 
the past few years). I believe your next recourse if you want to override 
my proposal (rejecting the issue) is to ask the chairs to consider whether 
to bring this to a working group vote, but I could be wrong; I'm not sure.

(If you _do_ have new information that hasn't previously been brought 
forward on this issue, feel free to reopen the issue and mail this further 
information to the list.)

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 23 May 2008 10:05:41 UTC