- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Tue, 14 Jul 2009 14:32:12 +0200
- To: Doug Schepers <schepers@w3.org>
- Cc: "public-html@w3.org" <public-html@w3.org>
Doug Schepers wrote:
> To meet this need, I propose a new attribute, 'parsing', which, when
> placed on the document root, defines the type of parsing which a UA must
> use when parsing the document. The values would be "loose" and "strict",
> with loose parsing as the default (an omitted @parsing attribute would
> result in loose parsing).
>
> When the parsing is loose, the error-correction algorithms defined in
> HTML5 must be applied; when the parsing is strict, there must be no
> error-correction (as is commonly the case for XHTML in most browsers).

I have a number of concerns with this proposal.

It's not clear what you mean by "no error-correction" as it applies to HTML, nor is it clear which parsing rules would need to be followed to achieve this. There are two possibilities I can think of.

Does it mean that, upon detection of the attribute, the browser must switch to an XML parser and reparse the document? If so, how is this different from simply serving the document as application/xhtml+xml?

Or does it mean that the document must continue to be parsed by an HTML parser, except that the parser must abort at the first step defined as a parse error in either the tokenisation or tree construction phases, instead of following the prescribed error correction?

Or does it mean something else?

What happens if the parser encounters an error prior to parsing the root element, and continues normally, but then later reaches the root element and sees parsing=strict? For example, given the following erroneous input:

  <!DOCTYPE html x>
  <html parsing=strict>
  ...

Should the browser remember that it previously encountered the error and retroactively abort?

Then there's the problem of getting this deployed in browsers in practice. Given that each browser implements and ships features according to its own schedule, and user upgrade cycles can take even longer, there would be a long transition period during which some browsers support this draconian parsing for HTML and others don't.

This could lead to a situation where, for example, authors build and test their site locally, don't find any errors, and leave the parsing=strict attribute in place. Then, due to a bug in their CMS, some pages become non-well-formed because some user input wasn't properly sanitised. The affected pages would then break in the browsers that support this new parsing mode, but continue to work fine in those that don't. So I share Maciej's concern about this triggering "a race to the bottom and neuter the feature".

Personally, I think a better solution could be for browsers to allow developers to turn on this parsing mode manually for the sites they test, without needing to specify any attribute, or simply to report the parse errors in their error console.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
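
The retroactive-abort question raised above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how any browser or the HTML5 algorithm works: the `ToyParser` class, the `run` helper, the event tuples and the error strings are all hypothetical names invented here. It models a parser that only learns about parsing=strict when it reaches the root element, and shows that the two possible answers to the question give different outcomes for the same input.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyParser:
    """Hypothetical model: strict mode is only discovered at the root element."""
    retroactive: bool                      # assumption: policy under discussion
    strict: bool = False
    pending_errors: List[str] = field(default_factory=list)
    aborted_on: Optional[str] = None       # error that caused the abort, if any

    def feed(self, kind: str, detail) -> None:
        if self.aborted_on is not None:
            return  # parsing already stopped
        if kind == "error":
            if self.strict:
                self.aborted_on = detail           # strict: stop at first error
            else:
                self.pending_errors.append(detail)  # loose: recover, remember it
        elif kind == "root":
            if detail.get("parsing") == "strict":
                self.strict = True
                # The open question: do earlier, already-recovered errors count?
                if self.retroactive and self.pending_errors:
                    self.aborted_on = self.pending_errors[0]


def run(events, retroactive: bool) -> Optional[str]:
    parser = ToyParser(retroactive=retroactive)
    for kind, detail in events:
        parser.feed(kind, detail)
    return parser.aborted_on


# Event stream loosely modelled on the erroneous input in the message:
# an error in the DOCTYPE, followed by <html parsing=strict>.
events = [
    ("error", "bogus DOCTYPE"),
    ("root", {"parsing": "strict"}),
]

retroactive_result = run(events, retroactive=True)       # "bogus DOCTYPE"
non_retroactive_result = run(events, retroactive=False)  # None
```

Under the retroactive policy the document is rejected because of an error seen before strict mode was known; under the other policy the same document is accepted, so the proposal would need to say which behaviour is intended.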
Received on Tuesday, 14 July 2009 12:32:57 UTC