Re: Proposal: @parsing="loose | strict" from Lachlan Hunt on 2009-07-14 (public-html@w3.org from July 2009)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Tue, 14 Jul 2009 16:03:15 +0200
To: Leif Halvard Silli <lhs@malform.no>
Cc: Doug Schepers <schepers@w3.org>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <4A5C9023.5040802@lachy.id.au>

Leif Halvard Silli wrote:
> Lachlan Hunt On 09-07-14 14.32:
>
>> Doug Schepers wrote:
>>> To meet this need, I propose a new attribute, 'parsing', which, when
>>> placed on the document root, defines the type of parsing which a UA must
>>> use when parsing the document...
>>
>> and nor is it clear which parsing rules would need to be followed to
>> achieve this. There are 2 possibilities I can think of...
>
> Indeed, this is not clear. But it seems most fruitful to say that
> xhtml+xml rules should apply.

Actually, I disagree with that.  If this proposal is about introducing 
XML parsing for text/html, then it really doesn't gain anything over 
using real XML, or at least using content negotiation to send 
application/xhtml+xml to browsers that support it and text/html 
otherwise.  It seems to me that this proposal would only be useful if it
was some form of stricter HTML parsing, though I'm still not convinced
that this couldn't be addressed by a user-controllable feature in browsers.
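
For illustration only, the content negotiation mentioned above could be
sketched roughly as follows.  This is a minimal sketch, not a complete
Accept header implementation: the function names are made up for this
example, the q-value handling is simplified, and the rule "prefer
application/xhtml+xml only when the client lists it explicitly with a q
at least as high as text/html" is an assumption, not something any spec
mandates.

```python
def parse_accept(header):
    """Return {media_type: q} from an Accept header value (simplified)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_media_type(accept_header):
    """Hypothetical helper: pick the media type to serve for one request."""
    prefs = parse_accept(accept_header or "")
    # Only an explicit listing counts; */* alone does not imply XML support.
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

A server using something like this would also need to send
"Vary: Accept" so that caches keep the two variants apart.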

>> What happens if the parser encounters an error prior to parsing the
>> root element, and continues normally, but then later reaches the root
>> element and sees parsing=strict. e.g. Given the following erroneous
>> input:
>>
>> <!DOCTYPE html x>
>> <html parsing=strict>
>> ...
>>
>> Should the browser remember that it previously encountered the error
>> and retroactively abort?
>
> If the feature was linked to the media type, namely to a new
> authoring media type, then the UA would be able to catch it without any
> reparsing.

The current suggestion is that this would be an attribute on the root
element in the document, which is not directly linked to the media type.
Are you suggesting that we instead do this with, for example, a new
media type parameter for text/html?

e.g. Content-Type: text/html;charset=UTF-8;parsing=strict
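
To make the idea concrete, here is a rough sketch of how a consumer
might read such a parameter from the header before it looks at a single
byte of the document.  The "parsing" parameter is purely hypothetical
(it exists only in this thread), the function name is invented, and
falling back to "loose" for anything unrecognised is an assumption.

```python
from email.message import Message


def parsing_mode(content_type):
    """Return "strict" or "loose" for a Content-Type header value.

    The "parsing" parameter is hypothetical; it is not a registered
    parameter of text/html.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Only text/html would honour the parameter in this sketch.
    if msg.get_content_type() != "text/html":
        return "loose"
    mode = str(msg.get_param("parsing", failobj="loose")).strip().lower()
    # Unknown or empty values fall back to the lenient default (assumption).
    return mode if mode in ("loose", "strict") else "loose"
```

Because the value is available from the header alone, the mode would be
fixed before parsing starts, which is the property being claimed for the
media type approach.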

And if we do that, and also apply XML parsing, then we really haven't
gained anything over real XHTML.  (If we do that, but just apply
strict HTML parsing, then it could conceivably work, but it still suffers
from the practical problems of deployment that I mentioned.)

>> Then, due to a bug in their CMS, some pages become non-well-formed due
>> to some user input that wasn't properly sanitised. The affected pages
>> would then break in the browsers that do support this new parsing
>> mode, but continue to work fine in those that don't. So I share
>> Maciej's concern about this triggering "a race to the bottom and
>> neuter the feature".
>
> This, again, is yet another reason to place this option in CSS, and, by
> default, link it to a new media type for authoring tools.

Declaring this in CSS wouldn't work, since the parser would have to 
parse the HTML, find the <link> or <style> elements, stop and wait for 
the CSS parser to finish parsing the CSS and see if it found a parsing 
declaration in there, and if it did, start reparsing the document again 
with strict error handling enabled.
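
As a toy model of that cost (not a description of how any real browser
works), the following sketch counts how many tokens end up being
processed when the switch can only be discovered inside a stylesheet.
The token list, the "link:NAME" convention and the function name are
all invented for this example.

```python
def tokens_processed(tokens, stylesheets):
    """Return (mode, number_of_tokens_processed) for a toy document.

    tokens:      list of strings; "link:NAME" stands for a stylesheet reference.
    stylesheets: dict mapping NAME to True if that sheet declares strict parsing.
    """
    processed = 0
    for tok in tokens:
        processed += 1
        if tok.startswith("link:"):
            name = tok[len("link:"):]
            # The HTML parser has to wait here until the CSS is parsed.
            if stylesheets.get(name, False):
                # Strict mode found: throw the work away and reparse
                # the whole document from the start.
                processed += len(tokens)
                return "strict", processed
    return "loose", processed


doc = ["doctype", "html", "head", "link:site.css", "body", "p"]
print(tokens_processed(doc, {"site.css": True}))
print(tokens_processed(doc, {"site.css": False}))
```

In this model, everything up to and including the stylesheet reference
is parsed twice whenever the stylesheet asks for strict mode, on top of
the time spent blocked on the CSS.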

>> Personally, I think a better solution could be for browsers to allow
>> developers to turn on this parsing mode manually for the sites they
>> test, without needing to specify any attribute, or simply report the
>> parse errors in their error console.
>
> Allowing authors/users to switch the media type identity of the UA would
> solve the problem.

If you want XML parsing, authors already have the ability to set the 
media type sent by their server, or at least their testing server, or by 
simply changing the file extension.  There is also at least one Firefox 
extension that allows users to override media types.

https://addons.mozilla.org/en-US/firefox/addon/3207

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Received on Tuesday, 14 July 2009 14:04:21 UTC