Re: Proposal: @parsing="loose | strict"

Doug Schepers wrote:
> To meet this need, I propose a new attribute, 'parsing', which, when
> placed on the document root, defines the type of parsing which a UA must
> use when parsing the document. The values would be "loose" and "strict",
> with loose parsing as the default (an omitted @parsing attribute would
> result in loose parsing).
>
> When the parsing is loose, the error-correction algorithms defined in
> HTML5 must be applied; when the parsing is strict, there must be no
> error-correction (as is commonly the case for XHTML in most browsers).

I have a number of concerns with this proposal.

It's not clear what you mean by "no error-correction" as it applies to 
HTML, nor is it clear which parsing rules would need to be followed 
to achieve this.  I can think of two possibilities.

Does it mean that, upon detection of the attribute, the browser must 
switch to an XML parser and reparse the document?  If so, how is this 
different from simply serving the document as application/xhtml+xml?

Or does it mean that the document must continue to be parsed by an HTML 
parser, except that the parser must abort at the first step defined as a 
parse error in either the tokenisation or tree construction phases, 
instead of following the prescribed error correction?

Or does it mean something else?

What happens if the parser encounters an error prior to parsing the root 
element, and continues normally, but then later reaches the root element 
and sees parsing=strict?  For example, given the following erroneous input:

<!DOCTYPE html x>
<html parsing=strict>
...

Should the browser remember that it previously encountered the error and 
retroactively abort?
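
To make the question concrete, here is a toy sketch of the "remember and 
retroactively abort" behaviour.  It is not any browser's actual parser: 
the event tuples, the parse() function and the StrictParseAbort exception 
are all hypothetical names invented for illustration, and the input is 
assumed to be already tokenised.

```python
class StrictParseAbort(Exception):
    """Raised when the toy parser gives up under parsing=strict."""


def parse(events):
    """Consume a pre-tokenised stream of hypothetical events.

    Each event is ("error", message) or ("start", tag_name, attrs).
    Returns the list of tag names seen if parsing runs to completion.
    """
    pending_errors = []  # errors seen so far, kept in case strict mode is requested
    strict = False       # loose parsing is the default
    seen_root = False
    tags = []

    for event in events:
        if event[0] == "error":
            if strict:
                # Strict mode already known: abort at the first error.
                raise StrictParseAbort(event[1])
            pending_errors.append(event[1])
        elif event[0] == "start":
            _, name, attrs = event
            tags.append(name)
            if not seen_root:
                seen_root = True
                strict = attrs.get("parsing") == "strict"
                if strict and pending_errors:
                    # Retroactive abort: the error came before the root element.
                    raise StrictParseAbort(pending_errors[0])
    return tags


# Roughly the erroneous input above: an error in the DOCTYPE, then the root.
EXAMPLE = [
    ("error", "error in DOCTYPE"),
    ("start", "html", {"parsing": "strict"}),
    ("start", "body", {}),
]
```

With this design, parse(EXAMPLE) raises StrictParseAbort, whereas the same 
stream without the parsing attribute is parsed to completion.  The cost is 
that the parser has to keep track of every earlier error just in case the 
root element later asks for strict mode.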

Then there's the problem of getting this deployed in browsers in 
practice.  Given that each browser implements and ships features 
according to its own schedule, and user upgrade cycles can take even 
longer, there would be a long transition period during which some 
browsers support this draconian parsing for HTML and others don't.

This could lead to a situation where, for example, authors build and 
test their site locally and don't find any errors, and they leave the 
parsing=strict attribute present.

Then, due to a bug in their CMS, some pages become non-well-formed due 
to some user input that wasn't properly sanitised.  The affected pages 
would then break in the browsers that do support this new parsing mode, 
but continue to work fine in those that don't.  So I share Maciej's 
concern about this triggering "a race to the bottom and neuter the feature".
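
The divergence described above can be imitated with two Python standard 
library parsers.  This is only a rough stand-in, assuming that 
xml.etree.ElementTree approximates a browser that honours strict parsing 
and html.parser approximates one that ignores the attribute; neither 
implements the HTML5 parsing algorithm, and the page content is made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextCollector(HTMLParser):
    """Lenient parser that just gathers the text it encounters."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def render_lenient(markup):
    """Stand-in for a browser without strict support: never aborts."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def render_strict(markup):
    """Stand-in for a browser with strict support: None means the page broke."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return "".join(root.itertext())


# Hypothetical page where unsanitised user input left a <b> element unclosed.
PAGE = '<html parsing="strict"><body><p>Comment: <b>nice post</p></body></html>'

lenient_result = render_lenient(PAGE)  # text is still recovered
strict_result = render_strict(PAGE)    # None: the whole page is rejected
```

The same markup yields readable text from the lenient stand-in and nothing 
at all from the strict one, which is the situation users of the newer 
browsers would be in.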

Personally, I think a better solution could be for browsers to allow 
developers to turn on this parsing mode manually for the sites they 
test, without needing to specify any attribute, or simply to report the 
parse errors in their error console.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

Received on Tuesday, 14 July 2009 12:32:57 UTC