Re: Polyglot Markup Formal Objection Rationale

Jirka Kosek <jirka@kosek.cz>, 2012-11-04 20:59 +0100:

> And although I can be considered as very XML-biased I don't consider
> Polyglot as a preferred syntax for general case. I would recommend it
> only for very special scenarios. If you need to process HTML5 as XML it
> is easier not to put additional burden on content producers and use
> library like http://about.validator.nu/htmlparser/ to turn HTML into XML.

Agreed. But note that technically it's not turning HTML into XML. For
example, if you use its SAX interface, it just exposes parsing events --
startElement, endElement, etc. -- and the handler for those doesn't need
to know or care whether they came from an HTML document or an XML document.
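
To make that concrete, here's a rough sketch in Python, using only the
standard library. It is not the validator.nu parser: html.parser is only
a tokenizer and doesn't implement the HTML5 parsing algorithm, and the
class names (ElementLogger, HtmlToSax) are made up for illustration. The
point is just that one SAX handler can be fed from either kind of parser.

```python
import xml.sax
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class ElementLogger(ContentHandler):
    """Records SAX events; it has no idea which parser produced them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Ignore whitespace-only text so the two inputs compare cleanly.
        if content.strip():
            self.events.append(("text", content.strip()))


class HtmlToSax(HTMLParser):
    """Hypothetical adapter: forwards html.parser callbacks as SAX events."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler

    def handle_starttag(self, tag, attrs):
        # Valueless HTML attributes arrive as None; SAX expects strings.
        sax_attrs = AttributesImpl({k: v or "" for k, v in attrs})
        self.handler.startElement(tag, sax_attrs)

    def handle_endtag(self, tag):
        self.handler.endElement(tag)

    def handle_data(self, data):
        self.handler.characters(data)


def events_from_xml(text):
    """Drive the handler from a real XML parser."""
    handler = ElementLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def events_from_html(text):
    """Drive the same handler class from an HTML tokenizer."""
    handler = ElementLogger()
    adapter = HtmlToSax(handler)
    adapter.feed(text)
    adapter.close()
    return handler.events
```

For markup that happens to be valid in both syntaxes, such as
"<p>Hello <b>world</b></p>", both functions hand the handler the same
sequence of events.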

Anyway, one problem currently is that a lot of people don't seem to know
that the validator.nu HTML parser exists. Some seem to assume it's not even
possible for such a parser to be used with an XML toolchain, and so they
end up advocating for polyglot and whatever else as the best-practice way
to do things. In fact, the ideal best practice really ought to be that
you're free to mark up your HTML document in whatever syntax you prefer
-- the text/html one or the XML one -- and all your tools should be smart
enough to consume it and handle it the way the validator.nu parser does.

Another problem is that we don't yet have similar parser libraries for most
other programming languages. That's a solvable problem, but I guess the
solution needs to start with the people who develop and use XML toolchains
in those languages. They need to realize it's possible to put a non-XML
HTML parser in front of those toolchains, and understand the value of
doing it.
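
As a sketch of what "in front of" could look like in one such language,
here is Python again, stdlib only. The HtmlFrontEnd class, parse_html
function, and VOID set are hypothetical names of mine, and html.parser
does no HTML5 error recovery, so this assumes reasonably well-balanced
input. It feeds text/html markup into an ElementTree builder, after
which the usual XML-side tools (find, iter, serialization) work on the
result unchanged.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements never have an end tag, so close them immediately.
VOID = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class HtmlFrontEnd(HTMLParser):
    """Hypothetical front end: builds an ElementTree from text/html input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; ElementTree wants strings.
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_html(text):
    """Parse HTML-syntax markup and return an ElementTree root element."""
    front = HtmlFrontEnd()
    front.feed(text)
    front.close()
    return front.builder.close()
```

Given something like '<html><body><p class=intro>Hi<br>there</p></body></html>',
which is not well-formed XML, parse_html returns an ordinary Element, and
root.find("body/p") works on it as it would for a tree parsed from XML.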

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike

Received on Monday, 5 November 2012 07:46:50 UTC