Re: TAG Decision on Rescinding the request to the HTML WG to develop a polyglot guide from Sam Ruby on 2013-02-18 (public-html@w3.org from February 2013)

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 18 Feb 2013 14:34:35 -0500
To: David Carlisle <davidc@nag.co.uk>
CC: public-html@w3.org
Message-ID: <5122824B.6040207@intertwingly.net>

On 02/18/2013 01:53 PM, David Carlisle wrote:
> On 18/02/2013 17:53, Sam Ruby wrote:
>> Boy I wish all "HTML parsers" supported unrestricted HTML syntax.
>> Henri's parser is better than tagsoup which is much better than libxml2.
>
> Yes I've seen that argument (and don't actually disagree with that
> either) but still the fact that html parsers are in fact buggy and you
> can get more consistent results using XML isn't really backing up the
> statement to which I was replying which was that the existence of the
> html parser in libxml2 was evidence of support for the need for
> polyglot. You are rather arguing that the existence of polyglot gives a
> mechanism of avoiding that parser in favour of the xml one.

Let me be clear on what I was disagreeing with:

"The HTML parser in
libxml2 (or tagsoup or Henri's validator.nu parsers in java) are exactly
the reason that some people (not entirely unreasonably) say that it
isn't needed. If you can put an HTML parser in front of an XML
tool-chain then you can pull in unrestricted HTML syntax and you have no
need to produce HTML documents following the polyglot guidelines"

I claim that the HTML parser in libxml2 is not sufficient to allow 
interop on "unrestricted HTML syntax", and that this insufficiency is 
*exactly* the reason there is value in a more "restricted HTML syntax".

Whether all of the existing restrictions in the Polyglot document are 
necessary, or indeed if they are sufficient, is the discussion we should 
be having.

> David

- Sam Ruby

Received on Monday, 18 February 2013 19:35:07 UTC