- From: Leif Halvard Silli <lhs@malform.no>
- Date: Tue, 14 Jul 2009 15:13:03 +0200
- To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- CC: Doug Schepers <schepers@w3.org>, "public-html@w3.org" <public-html@w3.org>
Lachlan Hunt On 09-07-14 14.32:
> Doug Schepers wrote:
>> To meet this need, I propose a new attribute, 'parsing', which, when
>> placed on the document root, defines the type of parsing which a UA must
>> use when parsing the document. The values would be "loose" and "strict",
>> with loose parsing as the default (an omitted @parsing attribute would
>> result in loose parsing).
>>
>> When the parsing is loose, the error-correction algorithms defined in
>> HTML5 must be applied; when the parsing is strict, there must be no
>> error-correction (as is commonly the case for XHTML in most browsers).
>
> I have a number of concerns with this proposal.
>
> It's not clear what you mean by "no error-correction" as it applies to
> HTML,

Agreed. This must be defined ... E.g. what about <p> without </p>?

> and nor is it clear which parsing rules would need to be followed
> to achieve this. There are 2 of possibilities I can think of.
>
> Does it mean that, upon detection of the attribute, the browser must
> switch to an XML parser and reparse the document? If so, how is this
> different from simply serving the document as application/xhtml+xml?

The difference: today there is no way to do this without changing the MIME type. When using file URLs, this means changing the suffix of the file from .html to .xhtml. A parsing attribute or a CSS method (in combination with @media authoring{}) could let you switch more or less on the fly.

> Or does it mean that the document must continue to be parsed by an HTML
> parser, except that the parser must abort at the first step defined as a
> parse error in either the tokenisation or tree construction phases,
> instead of following the prescribed error correction?

Indeed, this is not clear. But it seems most fruitful to say that xhtml+xml rules should apply.

> Or does it mean something else?
>
> What happens if the parser encounters an error prior to parsing the root
> element, and continues normally, but then later reaches the root element
> and sees parsing=strict. e.g. Given the following erroneous input:
>
> <!DOCTYPE html x>
> <html parsing=strict>
> ...
>
> Should the browser remember that it previously encountered the error and
> retroactively abort?

If the feature were linked to the media type, namely to a new authoring media type, then the UA would be able to catch the error without any reparsing.

> Then there's the problem of getting this deployed in browsers in
> practice. Given that each browser implements and ships features
> according to their own schedules, and user upgrade cycles can take even
> longer, there would be a long transition period during which some
> browsers do and others don't support this draconian parsing for HTML.
>
> This could lead to a situation where, for example, authors build and
> test their site locally and don't find any errors, and they leave the
> parsing=strict attribute present.

The advantage of placing the strict parsing option in CSS would be that it then becomes an optional feature, from beginning to end.

> Then, due to a bug in their CMS, some pages become non-well-formed due
> to some user input that wasn't properly sanitised. The affected pages
> would then break in the browsers that do support this new parsing mode,
> but continue to work fine in those that don't. So I share Maciej's
> concern about this triggering "a race to the bottom and neuter the
> feature".

This is yet another reason to place this option in CSS and, by default, link it to a new media type for authoring tools.

> Personally, I think a better solution could be for browsers to allow
> developers to turn on this parsing mode manually for the sites they
> test, without needing to specify any attribute, or simply report the
> parse errors in their error console.

Allowing authors/users to switch the media type that the UA identifies as (for example, to the proposed authoring media type) would solve the problem.
-- 
leif halvard silli
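
To make the loose/strict distinction discussed above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names parse_loose and parse_strict are hypothetical, Python's html.parser is used as a stand-in for a lenient parser (it is not an HTML5-conformant parser and does not implement the HTML5 error-correction algorithms), and xml.etree.ElementTree stands in for the XML well-formedness rules that apply to application/xhtml+xml. The input is the "<p> without </p>" case raised in the thread.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The case raised in the thread: <p> elements without closing </p> tags.
MARKUP = "<html><body><p>one<p>two</body></html>"


class TagCollector(HTMLParser):
    """Lenient stand-in: records start tags and never aborts on unclosed elements."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_loose(markup):
    """Hypothetical 'loose' mode: keep going and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def parse_strict(markup):
    """Hypothetical 'strict' mode: apply XML rules.

    Returns None when the markup is well-formed, otherwise the parser's
    error message for the first well-formedness error.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("loose: ", parse_loose(MARKUP))
    print("strict:", parse_strict(MARKUP))
```

In this sketch the lenient path reports all four start tags, while the XML path stops at the first mismatched tag. Whether an unclosed <p> should count as an error under a strict HTML mode is exactly the open question in the thread; the XML behaviour shown here corresponds to the "xhtml+xml rules should apply" reading.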
Received on Tuesday, 14 July 2009 13:14:00 UTC