- From: Eric J. Bowman <eric@bisonsystems.net>
- Date: Thu, 14 Jul 2011 10:20:55 -0600
- To: Robin Berjon <robin@berjon.com>
- Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org List" <www-tag@w3.org>
Robin Berjon wrote:
> Likewise polyglot can be useful in some cases. But it's not a general
> solution today, and if we're going to commit to the cost of making it
> possible for it to be a general solution then it better bring value
> that makes that work worth it compared to the existing solution which
> is to just grab a parser off GitHub.

Disagree. HTML wrapped in Atom, serialized as XML to allow XPath access into the wrapped content, is quite a common design pattern, and it requires the markup to be polyglot.

I've been advocating the polyglot approach for a long time now (just not calling it that). My advice to everyday Web developers is to use XHTML 1.1 and application/xhtml+xml to develop sites, and a few lines of XSLT to convert to HTML 4.01 or XHTML 1.0 for publication as text/html. If it validates as XHTML 1.1 (sans Ruby), it'll be valid otherwise, while forcing conformance with certain best practices, like avoiding the use of @style. It can also be parsed by many XML parsers which accept text/html provided it's well-formed (with the exception of self-closing tags); libxml comes to mind, as does Resin's built-in XML parser.

This approach doesn't work when using elements introduced by HTML 5. But since I'm using oXygen, it's a simple matter to remove the DOCTYPE and extend whatever schema I'm using (I usually include XForms) to account for the new elements, so I can still generate HTML 5 using a pure-XML toolchain. In much of my work, I'm generating (X)HTML from Atom using browser-resident XSLT, so updating existing websites to HTML 5 when the time comes should be a simple matter of editing that XSLT. I'll still be serving XHTML stub files as application/xhtml+xml to trigger the XSLT from XML PIs, regardless of the output serialization.
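To illustrate the stub-file arrangement I'm describing -- this is a sketch, and the stylesheet name (site.xsl) is just a placeholder -- the stub is an ordinary well-formed XHTML document whose xml-stylesheet PI tells the browser to run the XSLT before rendering:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="site.xsl"?>
<!-- Served as application/xhtml+xml; the PI above triggers the
     browser-resident XSLT, whatever the output serialization. -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Stub</title></head>
  <body><p>Transformed client-side by site.xsl.</p></body>
</html>
```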
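The payoff of the Atom-wrapping pattern is that a single XPath query reaches straight through the feed into the wrapped XHTML. A minimal sketch of that, using only a stock XML parser (Python's standard library here, with an invented feed for illustration):

```python
# Sketch: XHTML wrapped in an Atom entry, queried with XPath-style
# paths. One XML parse suffices; no HTML parser enters into it.
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

feed = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Polyglot entry</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p>Well-formed markup keeps XPath access.</p>
      </div>
    </content>
  </entry>
</feed>"""

root = ET.fromstring(feed)
# Reach through the Atom wrapper into the XHTML content in one query.
lede = root.find(".//atom:entry/atom:content/xhtml:div/xhtml:p", NS)
print(lede.text)
```

If the wrapped content were tag-soup HTML instead of well-formed XHTML, the single parse above would fail, which is exactly the extra-parser cost argued against below.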
I don't see where an HTML parser needs to enter into it, except of course for the browser. But I do see a considerable cost if oXygen and every other XML-based toolchain used to maintain the installed base of XML is required to increase its complexity with another parser for the same markup. Polyglot makes sense: I'm hardly alone in using Atom as a wrapper for HTML content, serialized as XHTML so I don't lose XPath access into that content. The complexity and latency of applications I've built this way go up considerably if I need to call an HTML parser before I can access the Atom-wrapped markup with XPath -- how, in an XSLT file, do I instruct the browser to parse that wrapped markup with an HTML 5 parser before reading it into the XSLT processor? Or am I a bad Web developer because I didn't just use JavaScript? ;-)

I'd prefer if the Web, moving forward, didn't exclude perfectly legitimate application architectures (requiring two parses of every file I read would make my architecture untenable) as punishment to those of us who insist on using XML toolchains in defiance of browser vendors' opinions on the matter.

> It might hypothetically be possible to craft a GIF such that it would
> decode in a PNG processor. There may even be cases in which that's
> useful. But trying to marry them doesn't seem useful.

Apples and oranges. Was I using the wrong tool all those years I was publishing HTML created with markup-unaware text editors? One website I did for a Fortune 500 outfit required me to train a couple of their folks to maintain certain pages. Unexpectedly, when I mentioned that HTML was an application of SGML, they knew what I was talking about, and it so happened that they already knew how to do bold/italic/list/heading markup. Would it be the wrong tool to use an SGML editor to create HTML? The polyglot concept is as old as the Web, and I fail to see the technical requirement for abandoning the notion that HTML may be parsed and written using a variety of tools.

-Eric
Received on Thursday, 14 July 2011 16:21:47 UTC