Re: Proposal: @parsing="loose | strict" from Leif Halvard Silli on 2009-07-14 (public-html@w3.org from July 2009)

From: Leif Halvard Silli <lhs@malform.no>
Date: Tue, 14 Jul 2009 15:13:03 +0200
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
CC: Doug Schepers <schepers@w3.org>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <4A5C845F.30004@malform.no>
Lachlan Hunt On 09-07-14 14.32:

> Doug Schepers wrote:
>> To meet this need, I propose a new attribute, 'parsing', which, when
>> placed on the document root, defines the type of parsing which a UA must
>> use when parsing the document. The values would be "loose" and "strict",
>> with loose parsing as the default (an omitted @parsing attribute would
>> result in loose parsing).
>>
>> When the parsing is loose, the error-correction algorithms defined in
>> HTML5 must be applied; when the parsing is strict, there must be no
>> error-correction (as is commonly the case for XHTML in most browsers).
> 
> I have a number of concerns with this proposal.
> 
> It's not clear what you mean by "no error-correction" as it applies to 
> HTML, 


Agreed. This must be defined. E.g., what about a <p> without a closing </p>?

> and nor is it clear which parsing rules would need to be followed 
> to achieve this.  There are 2 possibilities I can think of.
> 
> Does it mean that, upon detection of the attribute, the browser must 
> switch to an XML parser and reparse the document?  If so, how is this 
> different from simply serving the document as application/xhtml+xml?

Difference: Today there is no way to do this without changing the 
MIME type. When using file URLs, this means changing the suffix of 
the file from .html to .xhtml. A parsing attribute or a CSS method 
(in combo with @media authoring{}) could let you switch more or 
less on the fly.
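
To illustrate the point about file URLs, here is a minimal Python sketch, 
assuming a UA picks its parser for a local file purely from the file 
suffix. The two tables and the name parser_for_local_file are 
hypothetical and not taken from any browser.

```python
from pathlib import PurePosixPath

# Assumed suffix-to-MIME-type table for local files (illustrative only).
SUFFIX_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}

# Assumed MIME-type-to-parser table.
MIME_TO_PARSER = {
    "text/html": "html",
    "application/xhtml+xml": "xml",
}


def parser_for_local_file(path):
    """Return "html", "xml" or "none" for a local file path."""
    suffix = PurePosixPath(path).suffix.lower()
    mime = SUFFIX_TO_MIME.get(suffix, "application/octet-stream")
    return MIME_TO_PARSER.get(mime, "none")


# Renaming the file is the only way to change the parser in this model.
print(parser_for_local_file("/home/me/page.html"))
print(parser_for_local_file("/home/me/page.xhtml"))
```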

> Or does it mean that the document must continue to be parsed by an HTML 
> parser, except that the parser must abort at the first step defined as a 
> parse error in either the tokenisation or tree construction phases, 
> instead of following the prescribed error correction?


Indeed, this is not clear. But it seems most fruitful to say that 
the application/xhtml+xml rules, i.e. XML parsing, should apply.
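
As a rough illustration of what applying XML rules would mean for the 
<p> without </p> case, here is a minimal Python sketch. It uses the 
standard library's xml.etree.ElementTree as the strict parser and 
html.parser as a stand-in for a lenient one. Note that html.parser does 
not implement the HTML5 error-correction algorithm, and the helper names 
are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class TagCollector(HTMLParser):
    """Lenient stand-in: records start tags and never aborts."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


unclosed = "<html><body><p>one<p>two</body></html>"
closed = "<html><body><p>one</p><p>two</p></body></html>"

print(parses_as_xml(unclosed))       # strict parser rejects the document
print(parses_as_xml(closed))         # strict parser accepts the document
print(lenient_start_tags(unclosed))  # lenient parser keeps going
```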

 
> Or does it mean something else?
> 
> What happens if the parser encounters an error prior to parsing the root 
> element, and continues normally, but then later reaches the root element 
> and sees parsing=strict.  e.g. Given the following erroneous input:
> 
> <!DOCTYPE html x>
> <html parsing=strict>
> ...
> 
> Should the browser remember that it previously encountered the error and 
> retroactively abort?


If the feature were linked to the media type, namely to a new 
authoring media type, then the UA would be able to catch it 
without any reparsing.
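
The difference can be sketched in Python with a toy event stream. This 
assumes, purely for illustration, that the answer to the question above 
is "yes, abort retroactively" when the mode is only discovered at the 
root element. The event names and the function first_abort_index are 
hypothetical.

```python
from typing import List, Optional


def first_abort_index(events: List[str], strict_from_start: bool) -> Optional[int]:
    """Return the index of the event at which a strict parser aborts, or None.

    events is a toy token stream made of the strings "error",
    "root parsing=strict" and anything else (treated as harmless).
    """
    strict = strict_from_start
    pending_error = None  # first error seen before the mode was known
    for index, event in enumerate(events):
        if event == "error":
            if strict:
                return index  # mode already known: abort immediately
            if pending_error is None:
                pending_error = index
        elif event == "root parsing=strict":
            strict = True
            if pending_error is not None:
                return index  # assumed retroactive abort at the root element
    return None


# Models the example above: a bad DOCTYPE, then the root element.
stream = ["error", "root parsing=strict", "text"]

print(first_abort_index(stream, strict_from_start=True))   # mode from media type
print(first_abort_index(stream, strict_from_start=False))  # mode from attribute
```

With the mode known up front, the parser stops at the error itself. With 
the attribute, it has to remember the earlier error and act on it only 
once it reaches the root element.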

 
> Then there's the problem of getting this deployed in browsers in 
> practice.  Given that each browser implements and ships features 
> according to their own schedules, and user upgrade cycles can take even 
> longer, there would be a long transition period during which some 
> browsers do and others don't support this draconian parsing for HTML.
> 
> This could lead to a situation where, for example, authors build and 
> test their site locally and don't find any errors, and they leave the 
> parsing=strict attribute present.


The advantage of placing the strict parsing option in CSS would be 
that it then becomes an optional feature, from beginning to end.

 
> Then, due to a bug in their CMS, some pages become non-well-formed due 
> to some user input that wasn't properly sanitised.  The affected pages 
> would then break in the browsers that do support this new parsing mode, 
> but continue to work fine in those that don't.  So I share Maciej's 
> concern about this triggering "a race to the bottom and neuter the 
> feature".


This is yet another reason to place this option in CSS and, 
by default, to link it to a new media type for authoring tools.

 
> Personally, I think a better solution could be for browsers to allow 
> developers to turn on this parsing mode manually for the sites they 
> test, without needing to specify any attribute, or simply report the 
> parse errors in their error console.


Allowing authors/users to switch the media type identity of the UA 
would solve the problem.

--  

leif halvard silli
Received on Tuesday, 14 July 2009 13:14:00 UTC