Re: ISSUE-4 - versioning/DOCTYPEs

"Boris Zbarsky" <bzbarsky@MIT.EDU> wrote:

> On 5/16/10 10:22 AM, Daniel Glazman wrote:
> >> Hold on. We were just talking about wysiwyg HTML/XHTML editors, no?
> >> Those are very much NOT text editors.
> >
> > Guys, since you mentioned BlueGriffon, Nvu and Kompozer and since I am
> > the original guilty one for these three editors, let me say a word.
> > Leif, what precisely do you miss? A dialog for polyglot documents
> > allowing to select the editing mode when a document is loaded? A
> > way to save a document in a given mimetype?
> I think what Leif would like is some way to indicate in-document that
> the document should be edited in polyglot mode so that all editors
> would automatically do that.

It's unclear to me what the use case is.

I'm aware of three use cases for polyglot documents:

 1) Serving XHTML+SVG or XHTML+MathML or XHTML+SVG+MathML content as application/xhtml+xml to Gecko, WebKit, Presto and Trident+MathPlayer but serving the same bytes as text/html to Trident (sans MathPlayer) in order to be able to use SVG and/or MathML inline where supported while still allowing the users of unextended IE to read the (X)HTML content of the document.

 2) Serving application/xhtml+xml that doesn't use any non-HTML features to Gecko, WebKit and Presto as a matter of pro-XML principle but serving the same bytes to Trident as text/html because the author's pro-XML principle doesn't go far enough to exclude IE users from his/her audience.

 3) Serving content as text/html but using an XML parser to process the content in a non-browser scenario where the party operating the XML parser has the power to make the publisher supply the content in a form that is safe for XML parsers.

Leif, are there additional use cases that I'm missing?

Use case #3 is already obsolete. HTML parsers that expose XML-parser-compatible APIs are already available, so the content consumer should use an HTML parser instead of an XML parser. Since use case #3 is already obsolete, it's not useful to cater to the use case.
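
To illustrate the point about HTML parsers that expose XML-style APIs, here is a minimal sketch in Python. It feeds events from the standard library's html.parser into an ElementTree builder, so the consumer gets the same tree API an XML parser would give. The class name TreeBuildingHTMLParser and the function parse_html are hypothetical, and html.parser is only a stand-in: it does not implement the HTML5 parsing algorithm, so the error recovery here is much cruder than what a real HTML5 parser does.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeBuildingHTMLParser(HTMLParser):
    """Hypothetical helper: turns HTML parse events into an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # names of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with a value of None; store them as "".
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it instead of failing
        # Implicitly close anything opened after the matching start tag.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # drop text outside the root element
            self.builder.data(data)

    def close_tree(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def parse_html(text):
    parser = TreeBuildingHTMLParser()
    parser.feed(text)
    return parser.close_tree()


# Not well-formed XML: <br> has no end tag and nothing is closed.
doc = "<html><body><p>Fish &amp; chips<br>no closing tags"
root = parse_html(doc)
print(root.find("body/p").text)
```

With an arrangement like this, the consumer's downstream code keeps working against the tree API it already uses, and the publisher does not have to supply XML-safe markup.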

Use case #2 is harmful. When the document is well-formed, serving content as application/xhtml+xml to browsers deprives the users of optimizations that browsers have only for text/html. The well-known Gecko example used to be that the XML code path didn't support incremental rendering. That has been fixed, but currently in Gecko the XML code path doesn't benefit from speculative resource fetching. At least at one point in WebKit, the XML code path involved an additional UTF-16 to UTF-8 conversion and an additional UTF-8 back to UTF-16 conversion of the content compared to the HTML code path. (I'm not sure if this is still the case in WebKit.) Worse, when the document isn't well-formed, users of application/xhtml+xml-capable browsers get an error message while IE users get to read the content. Since use case #2 is harmful, I think it is not useful to cater to the use case.
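
The well-formedness failure mode can be modelled in a few lines of Python. This is only a sketch: the function reader_sees is hypothetical, and the standard library's XML parser stands in for a browser's XML code path. It assumes the text/html path always recovers and shows something to the reader.

```python
import xml.etree.ElementTree as ET


def reader_sees(markup: str, served_as: str) -> str:
    """Hypothetical model of what the reader ends up with for a MIME type."""
    if served_as == "application/xhtml+xml":
        try:
            ET.fromstring(markup)  # XML parsing stops at the first error
        except ET.ParseError as err:
            return f"parse error: {err}"
    # Assumption: the text/html path recovers from errors and renders.
    return "content"


well_formed = ('<html xmlns="http://www.w3.org/1999/xhtml">'
               "<body><p>AT&amp;T</p></body></html>")
# One unescaped ampersand is enough to break well-formedness.
broken = well_formed.replace("&amp;", "&")

for mime in ("text/html", "application/xhtml+xml"):
    print(mime, "->", reader_sees(broken, mime))
```

The same bytes give the text/html audience the content and the application/xhtml+xml audience an error.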

Use case #1 is on the way to obsolescence and while it's not obsolete yet, it is a specialist use case that affects very few people. This use case becomes obsolete either when IE versions prior to IE9 sink to low enough market share that authors no longer care (enabling the use of application/xhtml+xml unconditionally) or when versions of Gecko, WebKit and Presto that don't support SVG and MathML in text/html sink to low enough market share that authors no longer care (enabling the use of text/html unconditionally). My expectation is that WebKit and Presto will implement the HTML5 parsing algorithm and that the relatively fast upgrade cycle of Firefox, Opera, Chrome and Safari will make the old versions irrelevant sooner than IE6 through IE8 become irrelevant to authors. As for the use case not being obsolete quite yet, currently this use case mainly applies to very few specialist blogs and the Venus aggregator. Even if (X)HTML editors had support for this use case, the required server configuration tweaks would keep deployment limited to specialists. While I think this use case is legitimate e.g. in the context of Jacques Distler's blog, I think the WG shouldn't treat this use case as something that J. Random Web Author is going to need support for or as something that J. Random Web Author should attempt.
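
For a sense of what the server-side tweak for use case #1 involves, here is a rough Python sketch of Accept-based MIME type selection. The function names choose_content_type and response_headers are hypothetical. It assumes that browsers able to handle application/xhtml+xml list it in their Accept header and that unextended IE does not. Real deployments may key on other signals.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick the MIME type for the same polyglot bytes (hypothetical helper)."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a media range without a q parameter has quality 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"
    # Assumption: anything that does not ask for XHTML gets text/html.
    return "text/html"


def response_headers(accept_header: str) -> dict:
    return {
        "Content-Type": choose_content_type(accept_header),
        # The response depends on the request's Accept header, so caches
        # need to be told about it.
        "Vary": "Accept",
    }


print(response_headers("text/html,application/xhtml+xml,*/*;q=0.8"))
print(response_headers("image/gif, image/jpeg, */*"))
```

Even in this reduced form, the author needs control over response headers and caching behaviour, which is consistent with deployment staying limited to specialists.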

In general, I get a feeling that polyglot documents have more intellectual appeal as a spec lawyering puzzle than they have practical usefulness. I think the WG shouldn't fall into the trap of chasing puzzle appeal instead of Solving Real Problems.

P.S. What does all this have to do with "versioning"? And "DOCTYPEs" in this context looks to me like a (bad) solution in search of a problem...

Henri Sivonen

Received on Monday, 17 May 2010 08:58:17 UTC