Re: The non-polyglot elephant in the room

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 21 Jan 2013 09:56:37 -0500
Message-ID: <50FD5725.4090408@intertwingly.net>
To: public-html@w3.org

On 01/21/2013 09:24 AM, Michael[tm] Smith wrote:
> And I think some of the people who
> advocated for requiring XHTML didn't understand that existing XML-based
> toolchains could be made to handle text/html content just by putting an
> HTML parser in front of them.

My experience is that HTML parsers vary wildly in quality and 
performance, and that high-performance, high-quality, HTML5-compliant 
parsers are far from ubiquitous.

I've been told that that will be solved over time.  I've been told that 
over a long period of time.  So far, that has not proven to be true.

I fully acknowledge Anne's position[1] that if you are in a position 
where you have a need to robustly parse data which purports to be 
HTML or even XHTML, you are best served by employing an HTML5-compliant 
parser.

I continue to maintain that the complementary position is also true: 
namely that if you have a need to produce data which may subsequently be 
parsed by parsers that purport to be HTML parsers, what is currently 
captured by the polyglot specification is your best bet.
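
To make the point concrete, here is a minimal sketch, assuming Python's 
standard library as a stand-in for two unrelated consumers: html.parser 
(a simple tokenizer, not an HTML5-compliant parser) and 
xml.etree.ElementTree. The fragment and the names POLYGLOT, TagCollector, 
html_tags and xml_tags are made up for illustration and are not taken 
from the polyglot specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical polyglot-style fragment: every element is explicitly
# closed, attributes are quoted, and the void element is written <br/>.
POLYGLOT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>one<br/>two</p><p>three</p></body></html>'
)


class TagCollector(HTMLParser):
    """Records start tags as seen by a simple, non-HTML5 tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Element names in document order, as reported by html.parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_tags(markup):
    """Element names in document order, as reported by an XML parser."""
    root = ET.fromstring(markup)
    # ElementTree prefixes names with "{namespace}"; strip that prefix.
    return [el.tag.split('}')[-1] for el in root.iter()]


if __name__ == "__main__":
    print(html_tags(POLYGLOT) == xml_tags(POLYGLOT))
```

Both consumers report the same sequence of elements for this input. 
Markup that leans on implied tags, such as "<p>one<p>two", is rejected 
outright by the XML parser, which is the kind of divergence the polyglot 
rules are meant to avoid.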

It appears that even Henri is in favor of adding a validator option for 
flagging implied tags.[2]

People talk at length about the "wasted" developer time that is spent on 
polyglot.  From my perspective, if but a fraction of the energy spent on 
trying to stop this effort were instead spent on either improving 
parsing tools like libxml2 or on determining what a simplified and more 
robust subset of HTML5 would look like, then we could make better 
progress on this issue.

- Sam Ruby

[1] http://lists.w3.org/Archives/Public/public-html/2013Jan/0062.html
[2] http://lists.w3.org/Archives/Public/public-html/2012Dec/0071.html
Received on Monday, 21 January 2013 14:57:13 GMT
