Re: Polyglot markup and authors from Eric J. Bowman on 2013-02-14 (www-tag@w3.org from February 2013)

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Wed, 13 Feb 2013 17:09:40 -0700
To: Noah Mendelsohn <nrm@arcanedomain.com>
Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Jirka Kosek <jirka@kosek.cz>, "Michael[tm] Smith" <mike@w3.org>, Sam Ruby <rubys@intertwingly.net>, Maciej Stachowiak <mjs@apple.com>, Paul Cotton <Paul.Cotton@microsoft.com>, Henri Sivonen <hsivonen@iki.fi>, public-html WG <public-html@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <20130213170940.9a8696c064a031a768c36545@bisonsystems.net>

Noah Mendelsohn wrote:
> 
> So, Leif appears to be saying "polyglot is the best serialization for 
> all/most Web content"; the TAG is saying "there are important
> communities who will be well served by publishing the polyglot
> document as a recommendation". The TAG has not suggested that
> polyglot is preferable to more free form HTML5 in general.
> 

Perhaps the TAG should be.  Architecturally speaking, user-perceived
performance benefits when stream-processable media types are used.  I
understand why modern browsers need error-correcting HTML parsers, and
why younger developers feel they need such a device to shave bytes by
omitting end tags and such.

I, otoh, have been around long enough to understand through observation
that I get better deflate ratios from well-formed markup (something
about the redundancy of every opening tag having a closing tag, that
the algorithm just eats up); combine this with caching, and I don't see
the need to shave bytes in such fashion at all.
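
A minimal sketch of how one might test the deflate observation above,
assuming Python's zlib module as the deflate implementation.  The sample
markup and the helper name deflate_ratio are made up for illustration;
results will vary with real content, and nothing here comes from the
demo itself.

```python
import zlib


def deflate_ratio(markup: str) -> float:
    """Return compressed size divided by original size (lower is better)."""
    raw = markup.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


rows = range(200)

# Same list twice: once with every end tag present, once with the
# optional </p> and </li> end tags omitted.
closed = "<ul>" + "".join(f"<li><p>item {i}</p></li>" for i in rows) + "</ul>"
omitted = "<ul>" + "".join(f"<li><p>item {i}" for i in rows) + "</ul>"

for label, doc in (("closed", closed), ("omitted", omitted)):
    raw_size = len(doc.encode("utf-8"))
    compressed_size = len(zlib.compress(doc.encode("utf-8"), 9))
    print(label, raw_size, compressed_size, round(deflate_ratio(doc), 3))
```

Comparing the compressed sizes, not just the ratios, shows how much of
the saving from omitted end tags survives compression for a given
document.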

In fact, to do so would have a detrimental impact on my applications'
performance, since I'm depending on the stream-processability of the
application/xhtml+xml media type.  Error-correcting parsers can't begin
streaming output (using, say, SAX) to the rest of the toolchain until
they've finished parsing, unless they don't encounter any errors that
need correcting, like unclosed tags -- and even then, only maybe.
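
To make the streaming point concrete, here is a rough sketch using
Python's xml.sax incremental interface (make_parser, feed, close).  It
assumes the default expat-based reader; the chunk boundaries, the
StartRecorder class and the stream helper are hypothetical.  It only
shows that SAX events reach the handler before the document has been
fully received, which is what lets the rest of a toolchain start work
early.

```python
import xml.sax


class StartRecorder(xml.sax.ContentHandler):
    """Collects element names as their start tags are reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def stream(chunks):
    """Feed chunks one at a time; snapshot what the handler saw after each."""
    handler = StartRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    progress = []
    for chunk in chunks:
        parser.feed(chunk)          # events fire as soon as tokens are complete
        progress.append(list(handler.seen))
    parser.close()                  # raises SAXParseException if not well-formed
    return progress


# Two chunks standing in for a response arriving over the network.
CHUNKS = [
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>first</p>',
    '<p>second</p></body></html>',
]

progress = stream(CHUNKS)
for i, names in enumerate(progress, 1):
    print("after chunk", i, names)
```

If the input were not well-formed (an unclosed tag, say), the parser
would raise an error instead of guessing, which is the trade-off that
makes this kind of streaming possible.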

So, in my view, PG is indeed the "best" serialization for experienced
developers who care about user-perceived performance, as it provides us
a roadmap for generating HTML5 such that it may be processed as a
stream by avoiding those situations where processing must be deferred
until parsing completes.  PG is therefore a benefit to the community,
while the "don't use polyglot, it's hard" advice would be a disservice.

http://charger.bisonsystems.net/conneg/

That would again be a link to my ageing demo, which pre-dates the PG
document and illustrates not only an obscure problem or two, but also
that those problems are solvable.  What the demo is meant to show is how
I "shave bytes" (and reduce CPU requirements) by caching all the HTML
templating for an XML document collection on the client, using XSLT.

Of course this won't work for older browsers, but the current generation
makes it a no-brainer to offload CPU resources from the server to the
client.  Since it won't do to ignore older clients, the transformation
is done server-side for them, using the *same* XSLT URL other clients
have cached.  This real-world requirement is why polyglot is good -- it
allows a single XSLT file to be maintained, provided that its output may
be served with multiple media types (which, as my demo clearly shows, it
can).
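
One way the choice of media type could be made on the server is
sketched below.  This is an assumption-laden illustration, not the
demo's actual logic (which also involves a menu and cookies): it simply
looks for application/xhtml+xml in the Accept header and falls back to
text/html.  The function names prefers_xhtml and choose_media_type are
hypothetical.

```python
def prefers_xhtml(accept: str) -> bool:
    """True if the Accept header lists application/xhtml+xml with q > 0."""
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0                      # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0          # treat a malformed q-value as a refusal
        return q > 0
    return False


def choose_media_type(accept: str) -> str:
    """Pick the media type for the same polyglot output."""
    if prefers_xhtml(accept):
        return "application/xhtml+xml"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_media_type("text/html, */*"))
```

Because the body is polyglot, only the Content-Type header differs
between the two branches; the markup itself is the same either way.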

I will never be convinced to double the maintenance requirement for
this setup ("don't use polyglot") so long as there is a definable point
of convergence between HTML and XHTML.  On the client, the reason you
don't notice any latency from the XSLT transformation on subsequent
requests is that the XSLT is cached after parsing and, in some cases,
compiling.

From there, it's because of stream processing that any modern browser
will begin displaying subsequent requests' content sooner than the same
browser requesting the text/html variant (using the handy menu and some
cookie magic).  So I'm not just shaving bytes, I'm improving the user
experience, as anyone can see for themselves by navigating my demo's
non-broken links both ways, using any browser.

The output from this XSLT is polyglot, as I expect it to render into
the same DOM regardless of whether the browser context is HTML or XML,
which I set with the appropriate media type.  If the official position
of the TAG becomes "don't do it this way," I'll disregard that advice
as coming from its Technical Adventure Group phase, and continue reaping
the server-resource-utilization benefits brought about simply by the
*existence* of HTML/XHTML convergence, PG document or no.
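
The "same DOM either way" expectation can be spot-checked roughly as
follows.  This is only a sketch: Python's html.parser is a lenient
tokenizer, not an HTML5-conformant tree builder, so it stands in for a
browser's HTML parser only loosely, and the sample document and helper
names are invented.  It compares the sequence of element names seen by
an XML parser and by the HTML tokenizer for one small polyglot string.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

POLYGLOT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>Hello<br/>world</p></body></html>"
)


class TagCollector(HTMLParser):
    """Records start tags in document order (including <br/>)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tags(markup: str) -> list:
    root = ET.fromstring(markup)
    # ElementTree reports names as "{namespace}local"; keep the local part.
    return [el.tag.split("}")[-1] for el in root.iter()]


print(html_tags(POLYGLOT))
print(xml_tags(POLYGLOT))
print(html_tags(POLYGLOT) == xml_tags(POLYGLOT))
```

A real check would compare full DOM trees in actual browsers; this only
confirms that both parsers see the same elements in the same order.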

My demo may be ageing, but if lurking on xsl-list is any indication,
this design pattern is only now catching on (Michael Kay took some
prodding from me, but now serves his XML-based documentation this way),
since we have both application/xhtml+xml and XSLT support in browsers,
even if it's only v1.  The need to support both bots and IE 6, and thus
the need for polyglot for this use case, doesn't threaten to go away any
time soon.

Neither are collections of XML that people want transformed into HTML
*at the browser*, particularly if HTML is a "living standard", which
makes the template subject to change while the underlying data remains
static.  That is, unless these living standards "legislate out" all
support for these methods, as we're seeing with DOM 4; next, I guess,
they'll be removing support for XSLT from browsers instead of updating
it.  Until such time, I'm happy to continue doing things "wrong" since
it works so well for me, my customers, and my users -- even Aunt Sally.

Better user-perceived performance is a feature of any architecture which
allows stream processing as an option.  Doing away with polyglot sends
the message that stream processing is not supported on the Web, which is
clearly not the case.  "Just use the slower, less architecturally sound,
error-correcting parser or you'll hurt the HTML5 parser's feelings"
doesn't sound like technically solid advice in the face of the use case
I keep presenting.  Serving XML collections as HTML without requiring
that the transformation be done at the server appeals to anyone lacking
Google-like resources to just throw more servers at the problem.

-Eric

Received on Thursday, 14 February 2013 00:10:16 UTC