- From: Eric J. Bowman <eric@bisonsystems.net>
- Date: Wed, 13 Feb 2013 17:09:40 -0700
- To: Noah Mendelsohn <nrm@arcanedomain.com>
- Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Jirka Kosek <jirka@kosek.cz>, "Michael[tm] Smith" <mike@w3.org>, Sam Ruby <rubys@intertwingly.net>, Maciej Stachowiak <mjs@apple.com>, Paul Cotton <Paul.Cotton@microsoft.com>, Henri Sivonen <hsivonen@iki.fi>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Noah Mendelsohn wrote:
>
> So, Leif appears to be saying "polyglot is the best serialization for
> all/most Web content"; the TAG is saying "there are important
> communities who will be well served by publishing the polyglot
> document as a recommendation". The TAG has not suggested that
> polyglot is preferable to more free form HTML5 in general.
>

Perhaps the TAG should be. Architecturally speaking, user-perceived performance benefits when stream-processable media types are used.

I understand why modern browsers need error-correcting HTML parsers, and why younger developers feel they need such a device to shave bytes by omitting end tags and such. I, on the other hand, have been around long enough to observe that I get better deflate ratios from well-formed markup (something about the redundancy of every opening tag having a closing tag, which the algorithm just eats up); combine this with caching, and I don't see the need to shave bytes in such fashion at all. In fact, doing so would have a detrimental impact on my applications' performance, since I'm depending on the stream-processability of the application/xhtml+xml media type. Error-correcting parsers can't begin streaming output (using, say, SAX) to the rest of the toolchain until they've finished parsing, unless they don't encounter any errors that need correcting, like unclosed tags -- and even then, only maybe.

So, in my view, PG is indeed the "best" serialization for experienced developers who care about user-perceived performance, as it provides us a roadmap for generating HTML5 such that it may be processed as a stream, by avoiding those situations where processing must be deferred until parsing completes. PG is therefore a benefit to the community, while the "don't use polyglot, it's hard" advice would be a disservice.

http://charger.bisonsystems.net/conneg/

That would again be a link to my ageing demo, which pre-dates the PG document and illustrates not only an obscure problem or two, but that those problems are solvable.

What the demo is meant to show is how I "shave bytes" (and reduce CPU requirements) by caching all the HTML templating for an XML document collection on the client, using XSLT. Of course this won't work for older browsers, but the current generation makes it a no-brainer to offload CPU resources from the server to the client. Since it won't do to ignore older clients, the transformation is done server-side for them, using the *same* XSLT URL other clients have cached.

This real-world requirement is why polyglot is good -- it allows a single XSLT file to be maintained, if that output may be served with multiple media types (which my demo clearly shows it can). I will never be convinced to double the maintenance requirement for this setup ("don't use polyglot") so long as there is a definable point of convergence between HTML and XHTML.

On the client, the reason you don't notice any latency from the XSLT transformation on subsequent requests is that the XSLT is cached after parsing and, in some cases, compiling. From there, it's because of stream processing that any modern browser will begin displaying subsequent requests' content sooner than the same browser requesting the text/html variant (using the handy menu and some cookie magic). So I'm not just shaving bytes, I'm improving the user experience, as anyone can see for themselves by navigating my demo's non-broken links both ways, using any browser.
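[Editor's sketch, not part of the original mail or the demo: a minimal illustration of the stream-processing point above. With a well-formed (polyglot) document, a SAX-style parser can hand each element to the rest of the toolchain as it arrives, rather than waiting for parsing to finish. The handler, the elements it picks out, and the file argument are all illustrative assumptions.]

    # Minimal sketch: incremental SAX handling of a well-formed
    # XHTML/polyglot document. Downstream work can begin before
    # parsing ends -- the property being argued for above.
    import sys
    import xml.sax


    class TitleAndLinkHandler(xml.sax.ContentHandler):
        """Act on each element as soon as it is seen."""

        def __init__(self):
            super().__init__()
            self.in_title = False

        def startElement(self, name, attrs):
            if name == "title":
                self.in_title = True
            elif name == "a" and "href" in attrs:
                # Downstream processing can use this link immediately.
                print("link:", attrs["href"])

        def characters(self, content):
            if self.in_title:
                print("title:", content)

        def endElement(self, name):
            if name == "title":
                self.in_title = False


    if __name__ == "__main__":
        parser = xml.sax.make_parser()
        parser.setContentHandler(TitleAndLinkHandler())
        # Argument must be well-formed, e.g. a polyglot .xhtml file;
        # tag soup simply raises a parse error -- the trade-off in question.
        parser.parse(sys.argv[1])

[Run against well-formed markup, titles and links print as they are encountered; run against markup needing error correction, it stops -- which is the trade-off described above.]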
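[Editor's sketch of the negotiation arrangement described above, again not the demo's actual code: it assumes lxml and hypothetical file names (data.xml, site.xsl). Clients advertising application/xhtml+xml support get the raw XML plus a stylesheet reference and transform it themselves; everyone else gets the same stylesheet applied server-side and served as text/html. The demo's real negotiation (the menu and cookie magic) is more involved than this Accept-header check.]

    # Illustrative WSGI sketch only -- not the code behind
    # charger.bisonsystems.net. Assumes lxml and hypothetical file names.
    from lxml import etree

    XSL_URL = "/site.xsl"                            # one stylesheet, cached by capable clients
    transform = etree.XSLT(etree.parse("site.xsl"))  # ...and reused on the server


    def app(environ, start_response):
        accept = environ.get("HTTP_ACCEPT", "")
        doc = etree.parse("data.xml")                # the static XML behind the page

        if "application/xhtml+xml" in accept:
            # Capable client (rough heuristic): send the XML untransformed
            # with a stylesheet PI; the browser applies the cached XSLT itself.
            pi = etree.ProcessingInstruction(
                "xml-stylesheet", 'type="text/xsl" href="%s"' % XSL_URL)
            doc.getroot().addprevious(pi)
            body = etree.tostring(doc, xml_declaration=True, encoding="utf-8")
            ctype = "application/xml; charset=utf-8"
        else:
            # Legacy client or bot: run the same stylesheet server-side and
            # serve the (polyglot) result as text/html.
            body = etree.tostring(transform(doc), encoding="utf-8")
            ctype = "text/html; charset=utf-8"

        start_response("200 OK", [("Content-Type", ctype),
                                  ("Vary", "Accept")])
        return [body]

[Either way, the response is produced from the single stylesheet at one URL, which is the maintenance point the mail argues polyglot makes possible.]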
The output from the demo's XSLT is polyglot, as I expect it to render into the same DOM regardless of whether the browser context is HTML or XML, which I set with the appropriate media type. If the official position of the TAG becomes "don't do it this way," I'll disregard that advice as coming from its Technical Adventure Group phase, and continue reaping the server-resource-utilization benefits brought about simply by the *existence* of HTML/XHTML convergence, PG document or no.

My demo may be ageing, but if lurking on xsl-list is any indication, this design pattern is only now catching on (Michael Kay took some prodding from me, but now serves his XML-based documentation this way), now that we have both application/xhtml+xml and XSLT support in browsers, even if it's only v1. The need to support both bots and IE 6, and thus the need for polyglot for this use case, doesn't threaten to go away any time soon. Neither do collections of XML that people want transformed into HTML *at the browser*, particularly if HTML is a "living standard", making the template subject to change while the underlying data remains static. Unless these living standards "legislate out" all support for these methods, as we're seeing with DOM 4; next, I guess, they'll be removing support for XSLT from browsers instead of updating it. Until such time, I'm happy to continue doing things "wrong" since it works so well for me, my customers, and my users -- even Aunt Sally.

Better user-perceived performance is a feature of any architecture which allows stream processing as an option. Doing away with polyglot sends the message that stream processing is not supported on the Web, which is clearly not the case. "Just use the slower, less architecturally sound, error-correcting parser or you'll hurt the HTML5 parser's feelings" doesn't sound like technically solid advice in the face of the use case I keep presenting -- serving XML collections as HTML, without requiring that the transformation be done at the server, appeals to anyone lacking Google-like resources to just throw more servers at the problem.

-Eric