Re: Proposal: @parsing="loose | strict"

Lachlan Hunt On 09-07-14 16.03:

> Leif Halvard Silli wrote:
>> Lachlan Hunt On 09-07-14 14.32:
>>> Doug Schepers wrote:
>>>> To meet this need, I propose a new attribute, 'parsing', which, when
>>>> placed on the document root, defines the type of parsing which a UA 
>>>> must use when parsing the document...
>>>
>>> and nor is it clear which parsing rules would need to be followed to
>>> achieve this. There are 2 possibilities I can think of...
>>
>> Indeed, this is not clear. But it seems most fruitful to say that
>> xhtml+xml rules should apply.
> 
> Actually, I disagree with that.  If this proposal is about introducing 
> XML parsing for text/html, then it really doesn't gain anything over 
> using real XML, or at least using content negotiation to send 
> application/xhtml+xml to browsers that support it and text/html 
> otherwise.


Well, there is the advantage that it is much simpler for an 
author to change an attribute than to fiddle with the file 
extension and/or content negotiation. (And if it were linked to 
the CSS/HTML media device types, one should be able to turn it 
on or off via the device.)
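For comparison, here is roughly the kind of server-side content negotiation that an author has to set up today. This is a minimal Python sketch under stated assumptions, not anyone's actual configuration: the function name choose_media_type is made up for illustration, and it deliberately ignores wildcards such as */*, since a browser that only sends */* has not said that it handles XHTML.

```python
def choose_media_type(accept_header: str) -> str:
    """Return application/xhtml+xml only if the client explicitly lists it
    in its Accept header with a non-zero q value; otherwise text/html.

    Hypothetical helper: wildcards like */* are intentionally not treated
    as a request for XHTML.
    """
    for item in accept_header.split(","):
        media_range, *params = [part.strip() for part in item.split(";")]
        if media_range.lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

With it, a server would answer "application/xhtml+xml,text/html;q=0.9" with XHTML and a bare "text/html,*/*" with text/html, which is exactly the per-request fiddling that a single attribute (or a device mode) would avoid.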

But maybe we could have two variants of the strict mode: an 
XHTML variant and a "strict HTML" variant.

>  It seems to me that this proposal would only be useful if it 
> was some form of stricter HTML parsing, though I'm still not convinced
> that this couldn't be addressed by a user controllable feature in browsers.


You mentioned relying on the browser's error console for this. 
Others have mentioned adding validation tools. But I think there 
is a slight difference between seeing the /effect/ of an error and 
getting a report about an error (which the UA perhaps "fixes" anyway).

 
>>> What happens if the parser encounters an error prior to parsing the
>>> root element, and continues normally, but then later reaches the root
>>> element and sees parsing=strict. e.g. Given the following erroneous
>>> input:
>>>
>>> <!DOCTYPE html x>
>>> <html parsing=strict>
>>> ...
>>>
>>> Should the browser remember that it previously encountered the error
>>> and retroactively abort?
>>
>> If the feature was linked to the media type, namely to a new
>> authoring media type, then the UA would be able to catch it without any
>> reparsing.
> 
> The current suggestion is that this would be an attribute on the root 
> element in the document, which is not directly linked to the media type. 
>  Are you suggesting that we instead do this with, for example, a new 
> media type parameter for text/html?
> 
> e.g. Content-Type: text/html;charset=UTF-8;parsing=strict

No. Rather than anything that has to do with the content type, we 
should look at the media device type, or "media mode", of the UA. 
In essence: we should look at the device, and identify/introduce 
the "authoring" media device type.

> And if we do that, and also apply XML parsing, then we really haven't 
> gained anything over real XHTML.  (If we do that, but just apply 
> strict HTML parsing, then it could conceivably work, but still suffers 
> from the practical problems of deployment that I mentioned.)


Above you said that you are "still not convinced that this 
couldn't be addressed by a user controllable feature in browsers".

This is in fact what I am proposing: let Web browsers have the 
option of switching to an "authoring device" mode.

 
>>> Then, due to a bug in their CMS, some pages become non-well-formed due
>>> to some user input that wasn't properly sanitised. The affected pages
>>> would then break in the browsers that do support this new parsing
>>> mode, but continue to work fine in those that don't. So I share
>>> Maciej's concern about this triggering "a race to the bottom and
>>> neuter the feature".
>>
>> This, again, is yet another reason to place this option in CSS, and, by
>> default, link it to a new media type for authoring tools.
> 
> Declaring this in CSS wouldn't work, since the parser would have to 
> parse the HTML, find the <link> or <style> elements, stop and wait for 
> the CSS parser to finish parsing the CSS and see if it found a parsing 
> declaration in there, and if it did, start reparsing the document again 
> with strict error handling enabled.

I suppose you meant that a "CSS property alone wouldn't work". 
However, by CSS I did not mean only a CSS property, but also a 
new CSS/HTML media device type (I did not mean a "MIME type"), 
i.e. the kind of media type that we target via the CSS '@media' 
rule or the HTML media attribute.

The thing is that a media device of course treats the whole page 
in a certain way. This is programmed into the device before it 
starts reading the page. Thus, the device also has some predefined 
CSS properties.

If we had an "authoring device" media type (or an "authoring mode" 
sub media type), then we could have a default

 @media authoring{parsing:strict}

for those devices/modes. If UAs could /switch/ their media mode, 
then authors could also, with the touch of a button (which I think 
you proposed, anyway), see the page in "authoring mode".
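To illustrate why a mode that is chosen before the page is read avoids the reparsing problem, here is a toy Python sketch. The "authoring" media mode is the hypothetical one proposed above, parse_for_media is a made-up name, and the strict branch simply borrows XML well-formedness rules (the XHTML variant); a "strict HTML" variant would need an HTML parser that reports parse errors, which Python's html.parser does not do.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Lenient parse: html.parser never rejects input, it only reports tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_for_media(markup: str, media_mode: str) -> list:
    """Return the element names in document order.

    The error-handling mode is decided once, up front, from the UA's
    (hypothetical) media mode, so nothing ever has to be reparsed.
    """
    if media_mode == "authoring":
        # Strict, XML-style: the first well-formedness error raises
        # xml.etree.ElementTree.ParseError and parsing stops.
        return [el.tag for el in ET.fromstring(markup).iter()]
    # Any other mode: forgiving parse that accepts the same broken input.
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

Fed "<html><p>unclosed</html>", the "screen" mode quietly returns both tags, while the "authoring" mode stops with a ParseError. Because the device knows its mode before it starts reading, the strict/loose decision never depends on something found later in the document.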

>>> Personally, I think a better solution could be for browsers to allow
>>> developers to turn on this parsing mode manually for the sites they
>>> test, without needing to specify any attribute, or simply report the
>>> parse errors in their error console.
>>
>> Allowing authors/users to switch the media type identity of the UA would
>> solve the problem.
> 
> If you want XML parsing, authors already have the ability to set the 
> media type sent by their server, or at least their testing server, or by 
> simply changing the file extension.  There is also at least one Firefox 
> extension that allows users to override media types.
> 
> https://addons.mozilla.org/en-US/firefox/addon/3207

An interesting Firefox add-on (which I did not get to work). 
However, a new media device type is a more fundamental way of 
dealing with the problem, and one that could also be used to handle 
other authoring problems (such as the ability to have a special 
stylesheet for "authoring mode").

As a CSS property, it would of course also be possible (for 
security or other reasons) to use

  @media all {parsing:strict}

Since, as you said, not all UAs would respect such a property (until 
they support it), it feels meaningful to place it in CSS. Support 
for this property would then be an added benefit.
-- 
leif halvard silli

Received on Tuesday, 14 July 2009 18:24:20 UTC