Re: Revised HTML/XML Task Force Report from Eric J. Bowman on 2011-07-14 (www-tag@w3.org from July 2011)

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Thu, 14 Jul 2011 10:20:55 -0600
To: Robin Berjon <robin@berjon.com>
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <20110714102055.d1a3ee45.eric@bisonsystems.net>
Robin Berjon wrote:
> 
> Likewise polyglot can be useful in some cases. But it's not a general
> solution today, and if we're going to commit to the cost of making it
> possible for it to be a general solution then it better bring value
> that makes that work worth it compared to the existing solution which
> is to just grab a parser off GitHub.
> 

Disagree.  HTML wrapped in Atom and serialized as XML, so that XPath can
reach into the wrapped content, is quite a common design pattern, and it
requires the markup to be polyglot.
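To make the pattern concrete, here is a minimal Python sketch, assuming a
hypothetical Atom entry whose content element (type="xhtml") carries
well-formed XHTML.  Because the wrapped markup is part of the same XML
tree, one XML parse is enough to query it; the standard library's
xml.etree.ElementTree supports a limited XPath subset.  The entry, the NS
prefixes and the wrapped_paragraphs helper are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom entry: per RFC 4287, type="xhtml" content is a single
# XHTML div, so the wrapped markup must be well-formed XML.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <h1>Hello</h1>
      <p>First <em>paragraph</em>.</p>
      <p>Second paragraph.</p>
    </div>
  </content>
</entry>"""

# Prefixes used only in our path expressions (illustrative choices).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "h": "http://www.w3.org/1999/xhtml",
}

def wrapped_paragraphs(entry_xml):
    """Return the text of every XHTML p inside the Atom entry's content."""
    root = ET.fromstring(entry_xml)
    # The path walks from the Atom elements straight into the wrapped XHTML,
    # because both live in one XML tree; no second (HTML) parse is needed.
    paras = root.findall("atom:content/h:div/h:p", NS)
    return ["".join(p.itertext()) for p in paras]

print(wrapped_paragraphs(ENTRY))  # ['First paragraph.', 'Second paragraph.']
```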

I've been advocating the polyglot approach for a long time now (just
not calling it that).  My advice to everyday Web developers is to use
XHTML 1.1 and application/xhtml+xml to develop sites, and a few lines
of XSLT to convert to HTML 4.01 or XHTML 1.0 for publication as
text/html.  If it validates as XHTML 1.1 (sans Ruby), it'll be valid in
those output formats too, while forcing conformance with certain best
practices like avoiding the use of @style.  It can also be parsed by
many XML parsers that accept text/html, provided it's well-formed (with
the exception of self-closing tags); libxml comes to mind, as does
Resin's built-in XML parser.
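A rough illustration of "polyglot" in the sense used here: the same
hypothetical fragment is accepted both by an XML parser and by Python's
html.parser tokenizer, and both report the same elements in document
order.  Note that html.parser is a tokenizer, not a full HTML5 tree
builder, so this is a sketch of the idea rather than a conformance test.
The POLYGLOT string and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical polyglot document: well-formed XML that is also valid HTML.
# The void element uses the "<br />" form, which both parsers accept.
POLYGLOT = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body><p>One line<br />another line</p></body>
</html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"

def xml_tags(markup):
    """Element names in document order, as seen by an XML parser."""
    root = ET.fromstring(markup)
    return [el.tag.replace(XHTML, "") for el in root.iter()]

class TagCollector(HTMLParser):
    """Records start-tag names in document order, as seen by html.parser."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # "<br />" arrives via handle_startendtag, whose default
        # implementation calls handle_starttag, so it is recorded too.
        self.tags.append(tag)

def html_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags

print(xml_tags(POLYGLOT))   # ['html', 'head', 'title', 'body', 'p', 'br']
print(html_tags(POLYGLOT))  # ['html', 'head', 'title', 'body', 'p', 'br']
```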

This approach doesn't work when using elements introduced by HTML 5.
But since I'm using oXygen, it's a simple matter to remove the DOCTYPE
and extend whatever schema I'm using (I usually include XForms) to
account for the new elements.  So I can still generate HTML 5 using a
pure-XML toolchain.  In much of my work, I'm generating (X)HTML from
Atom using browser-resident XSLT, so updating existing websites to use
HTML 5 when the time comes should be a simple matter of editing that
XSLT.  I'll still be serving XHTML stub files as application/xhtml+xml
to trigger the XSLT via xml-stylesheet processing instructions (PIs),
regardless of the output serialization.
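For illustration, a sketch of how a tool (rather than the browser) might
locate the xml-stylesheet PI in such a stub file, using xml.dom.minidom,
which keeps processing instructions in the prolog as document nodes.  The
stub contents, the /xslt/atom2html.xsl path and the stylesheet_hrefs
helper are hypothetical, and the pseudo-attribute parsing is a simple
regex, not a full implementation of the xml-stylesheet rules.

```python
import re
from xml.dom import minidom

# Hypothetical XHTML stub: its only job is to point the browser at an XSLT.
STUB = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/xslt/atom2html.xsl"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>stub</title></head><body></body></html>"""

def stylesheet_hrefs(xml_text):
    """Return hrefs of xml-stylesheet PIs declaring type="text/xsl"."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # Simplified pseudo-attribute parsing: name="value" pairs only.
            pseudo = dict(re.findall(r'([\w-]+)="([^"]*)"', node.data))
            if pseudo.get("type") == "text/xsl":
                hrefs.append(pseudo.get("href"))
    return hrefs

print(stylesheet_hrefs(STUB))  # ['/xslt/atom2html.xsl']
```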

I don't see where an HTML parser needs to enter into it, except of
course in the browser, but I do see a considerable cost if oXygen and
every other XML-based toolchain used to maintain the installed base of
XML must take on the complexity of a second parser for the same
markup.  Polyglot makes sense; I'm hardly alone in using Atom as a
wrapper for HTML content, serialized as XHTML so I don't lose XPath
access into that content.  The complexity and latency of applications
I've built this way go up considerably if I need to call an HTML parser
before I can access the Atom-wrapped markup with XPath.  And how, in an
XSLT file, would I instruct the browser to parse that wrapped markup
with an HTML 5 parser before reading it into the XSLT processor?
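A minimal sketch of the failure mode, assuming a hypothetical minimal
Atom entry: an XML parser accepts the polyglot form of a fragment but
rejects the same fragment written with an unclosed br, which is fine in
HTML but not well-formed XML.  In the second case, XPath access is only
possible after a separate HTML parse.  The entry_with and xpath_ready
helpers are illustrative names.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

def entry_with(body):
    """Wrap a markup fragment in a minimal (hypothetical) Atom entry."""
    return (f'<entry xmlns="{ATOM}"><content type="xhtml">'
            f'<div xmlns="{XHTML}">{body}</div></content></entry>')

def xpath_ready(body):
    """True if the wrapped fragment can be queried after one XML parse."""
    try:
        root = ET.fromstring(entry_with(body))
    except ET.ParseError:
        # Not well-formed: an HTML parser would have to run first.
        return False
    return root.find(f"{{{ATOM}}}content/{{{XHTML}}}div") is not None

print(xpath_ready("<p>one<br />two</p>"))  # True: polyglot
print(xpath_ready("<p>one<br>two</p>"))    # False: unclosed br
```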

Or am I a bad Web developer because I didn't just use JavaScript?  ;-)
Moving forward, I'd prefer that the Web not exclude perfectly legitimate
application architectures (requiring two parses of every file I read
would make mine untenable) as punishment for those of us who insist on
using XML toolchains in defiance of browser vendors' opinions on the
matter.

> 
> It might hypothetically be possible to craft a GIF such that it would
> decode in a PNG processor. There may even be cases in which that's
> useful. But trying to marry them doesn't seem useful.
> 

Apples and oranges.  Was I using the wrong tool, all those years I was
publishing HTML created with markup-unaware text editors?  One website
I did for a Fortune 500 outfit required me to train a couple of their
folks to maintain certain pages.  Unexpectedly, when I mentioned that
HTML was an application of SGML, they knew what I was talking about,
and it so happened that they already knew how to do bold/italic/list/h
markup.  Would an SGML editor be the wrong tool for creating HTML?  The
polyglot concept is as old as the Web, and I fail to see any technical
requirement for abandoning the notion that HTML may be parsed and
written using a variety of tools.

-Eric
Received on Thursday, 14 July 2011 16:21:47 UTC