- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 5 Nov 2012 16:46:36 +0900
- To: Jirka Kosek <jirka@kosek.cz>
- Cc: Lachlan Hunt <lachlan.hunt@lachy.id.au>, "public-html@w3.org" <public-html@w3.org>
Jirka Kosek <jirka@kosek.cz>, 2012-11-04 20:59 +0100:

> And although I can be considered as very XML-biased I don't consider
> Polyglot as a preferred syntax for general case. I would recommend it
> only for very special scenarios. If you need to process HTML5 as XML it
> is easier not to put additional burden on content producers and use
> library like http://about.validator.nu/htmlparser/ to turn HTML into XML.

Agreed. But note that technically it's not turning HTML into XML. What I
mean is, e.g., if you use its SAX interface it's just exposing parsing
events -- startElement, endElement, etc. And the handler for those doesn't
need to know or care if they came from an HTML document or an XML document.

Anyway, one problem currently is that a lot of people don't seem to know
that the validator.nu HTML parser exists. Some seem to assume it's not even
possible for such a parser to be used with an XML toolchain, and so some
then end up advocating for polyglot and whatever else as the best-practice
way to do things, when in fact the ideal best practice really ought to be
that you're free to mark up your HTML document in whatever syntax you
prefer -- the text/html one or the XML one -- and all your tools should be
smart enough to consume it and handle it the way the validator.nu parser
does.

Another problem is that we don't yet have similar parser libraries for most
other programming languages. That's a solvable problem, but I guess the
solution needs to start with the people who develop and use XML toolchains
in those languages. They need to realize it's possible to put a non-XML
HTML parser in front of those, and understand the value of doing it.

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike
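
The point about SAX events in the message above can be sketched in code. What follows is a minimal illustration, not the validator.nu parser and not anything from the original message: it uses only Python's standard library, and the names ElementRecorder, HtmlToSax, events_from_xml and events_from_html are hypothetical. Python's html.parser is a lenient tokenizer rather than a full HTML5 parser (it does no tree construction, so it will not, for example, supply implied end tags), so treat this only as a sketch of the "non-XML HTML parser in front of an XML toolchain" idea.

```python
from html.parser import HTMLParser
from xml.sax import parseString
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class ElementRecorder(ContentHandler):
    """An ordinary SAX handler. It only sees events and never learns
    whether they came from an XML parser or an HTML parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


class HtmlToSax(HTMLParser):
    """Hypothetical adapter: forwards html.parser callbacks to a SAX
    ContentHandler as startElement/endElement/characters events."""

    def __init__(self, handler):
        super().__init__()
        self.handler = handler

    def handle_starttag(self, tag, attrs):
        # html.parser reports valueless attributes as None; SAX expects strings.
        sax_attrs = AttributesImpl({name: value or "" for name, value in attrs})
        self.handler.startElement(tag, sax_attrs)

    def handle_endtag(self, tag):
        self.handler.endElement(tag)

    def handle_data(self, data):
        self.handler.characters(data)


def events_from_xml(source):
    """Parse XML syntax with the standard SAX parser."""
    handler = ElementRecorder()
    parseString(source.encode("utf-8"), handler)
    return handler.events


def events_from_html(source):
    """Parse text/html syntax and feed the same kind of handler."""
    handler = ElementRecorder()
    parser = HtmlToSax(handler)
    parser.feed(source)
    parser.close()
    return handler.events
```

Calling events_from_xml on a well-formed fragment and events_from_html on the same content written with uppercase tags and an unquoted attribute hands the handler the same sequence of element events, which is the sense in which the handler "doesn't need to know or care" about the source syntax.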
Received on Monday, 5 November 2012 07:46:50 UTC