- From: Leif Halvard Silli <lhs@malform.no>
- Date: Tue, 14 Jul 2009 15:13:03 +0200
- To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- CC: Doug Schepers <schepers@w3.org>, "public-html@w3.org" <public-html@w3.org>
Lachlan Hunt On 09-07-14 14.32:
> Doug Schepers wrote:
>> To meet this need, I propose a new attribute, 'parsing', which, when
>> placed on the document root, defines the type of parsing which a UA must
>> use when parsing the document. The values would be "loose" and "strict",
>> with loose parsing as the default (an omitted @parsing attribute would
>> result in loose parsing).
>>
>> When the parsing is loose, the error-correction algorithms defined in
>> HTML5 must be applied; when the parsing is strict, there must be no
>> error-correction (as is commonly the case for XHTML in most browsers).
>
> I have a number of concerns with this proposal.
>
> It's not clear what you mean by "no error-correction" as it applies to
> HTML,
Agreed. This must be defined ... E.g., what about <p> without </p>?
> and nor is it clear which parsing rules would need to be followed
> to achieve this. There are 2 possibilities I can think of.
>
> Does it mean that, upon detection of the attribute, the browser must
> switch to an XML parser and reparse the document? If so, how is this
> different from simply serving the document as application/xhtml+xml?
The difference: today there is no way to do this without changing the
MIME type. With file URLs, that means changing the file suffix
from .html to .xhtml. A parsing attribute or a CSS method
(in combination with @media authoring{}) would let you switch more or
less on the fly.
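
To illustrate the file URL point: for a local file, the MIME type (and
thus the parser) follows from the suffix alone. A minimal sketch,
assuming a hypothetical suffix table and hypothetical function names
that are not taken from any real UA:

```python
import os

# Hypothetical suffix table; real UAs and operating systems keep their own.
SUFFIX_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}


def mime_for_file(path):
    """Guess the MIME type of a local file from its suffix alone."""
    suffix = os.path.splitext(path)[1].lower()
    return SUFFIX_TO_MIME.get(suffix, "application/octet-stream")


def parser_for_file(path):
    """Pick the parser family that the guessed MIME type dictates."""
    mime = mime_for_file(path)
    if mime == "application/xhtml+xml":
        return "xml"   # draconian, well-formedness errors are fatal
    if mime == "text/html":
        return "html"  # error-correcting
    return None        # not a document type this sketch knows about


print(parser_for_file("index.html"))
print(parser_for_file("index.xhtml"))
```

In this model, renaming the file is the only lever the author has,
which is the limitation described above.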
> Or does it mean that the document must continue to be parsed by an HTML
> parser, except that the parser must abort at the first step defined as a
> parse error in either the tokenisation or tree construction phases,
> instead of following the prescribed error correction?
Indeed, this is not clear. But it seems most fruitful to say that
application/xhtml+xml rules should apply.
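
As a rough illustration of what that would mean for <p> without </p>,
here is a sketch using Python's standard library. It is an analogy
only: html.parser is a lenient tokenizer, not the HTML5 algorithm, and
the function names parse_loose and parse_strict are made up for this
example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid as HTML (the </p> end tags are optional), but not well-formed XML.
SAMPLE = "<html><body><p>one<p>two</body></html>"


class TagCollector(HTMLParser):
    """Records every start tag seen; never aborts on sloppy markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_loose(markup):
    """Lenient pass: return the start tags, whatever the input looks like."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_strict(markup):
    """XML-style pass: True if well-formed, False at the first fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parse_loose(SAMPLE))
print(parse_strict(SAMPLE))
```

Under XML rules the unclosed <p> is fatal, while the lenient pass
reads the same input without complaint.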
> Or does it mean something else?
>
> What happens if the parser encounters an error prior to parsing the root
> element, and continues normally, but then later reaches the root element
> and sees parsing=strict. e.g. Given the following erroneous input:
>
> <!DOCTYPE html x>
> <html parsing=strict>
> ...
>
> Should the browser remember that it previously encountered the error and
> retroactively abort?
If the feature were linked to the media type, namely to a new
authoring media type, then the UA would be able to catch the error
without any reparsing.
> Then there's the problem of getting this deployed in browsers in
> practice. Given that each browser implements and ships features
> according to their own schedules, and user upgrade cycles can take even
> longer, there would be a long transition period during which some
> browsers do and others don't support this draconian parsing for HTML.
>
> This could lead to a situation where, for example, authors build and
> test their site locally and don't find any errors, and they leave the
> parsing=strict attribute present.
The advantage of placing the strict parsing option in CSS would be
that it then becomes an optional feature, from beginning to end.
> Then, due to a bug in their CMS, some pages become non-well-formed due
> to some user input that wasn't properly sanitised. The affected pages
> would then break in the browsers that do support this new parsing mode,
> but continue to work fine in those that don't. So I share Maciej's
> concern about this triggering "a race to the bottom and neuter the
> feature".
This is yet another reason to place this option in CSS,
and, by default, link it to a new media type for authoring tools.
> Personally, I think a better solution could be for browsers to allow
> developers to turn on this parsing mode manually for the sites they
> test, without needing to specify any attribute, or simply report the
> parse errors in their error console.
Allowing authors/users to switch the media type identity of the UA
would solve the problem.
--
leif halvard silli
Received on Tuesday, 14 July 2009 13:14:00 UTC