Re: [whatwg] Should ambiguous ampersand be a parse error? from Ian Hickson on 2014-01-22 (public-whatwg-archive@w3.org from January 2014)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 22 Jan 2014 21:48:06 +0000 (UTC)
To: Boris Zbarsky <bzbarsky@MIT.EDU>, "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: whatwg@lists.whatwg.org
Message-ID: <alpine.DEB.2.00.1401222138370.26647@ps20323.dreamhostps.com>

On Tue, 10 Dec 2013, Boris Zbarsky wrote:
> On 12/10/13 11:11 AM, Peter Cashin wrote:
> > 
> > Is the specification intended to have compliant HTML agents stop 
> > parsing ambiguous ampersands?
> 
> Compliant HTML agents are allowed to do so, I guess, per the technical 
> rules about parse errors, just like for any other parse error.  But I 
> expect that this is at least partly for conformance classes other than 
> "browsers"; all browsers press on through parse errors in HTML.  Maybe 
> the allowed behavior for parse errors should be made conditional on 
> conformance class...

While I agree that it's unlikely that any browser will ever abort on a 
parse error in its default mode, I've still allowed it, because it can be 
a useful mode to use in an authoring or educational environment.

On Tue, 10 Dec 2013, Jukka K. Korpela wrote:
> 
> Authoring requirements as such are just policy statements, therefore 
> regularly ignored.

Conformance requirements for authors are really just a way to try to help 
authors avoid making what they would consider mistakes. The specification 
actually has a whole section that explains why we bother to have them:

   http://whatwg.org/html#conformance-requirements-for-authors

> Allowing user agents to stop parsing after a parse error (BTW, where 
> exactly does the WHATWG HTML Living Standard allow that?)

It's in the sentence that follows the one that defines "parse error":

   http://whatwg.org/html#parse-error

> is really just avoidance.

Not sure what you mean by "avoidance". What does it avoid?

> If browsers actually apply some specific error recovery, what’s the 
> excuse for not making that mandatory?

We allow these two implementation strategies because not all tools 
actually need to recover. For example, an HTML publishing pipeline might 
want to assume that its input is valid, and simply refuse to handle 
invalid input, rather than applying the error handling rules (which can 
cause a big mess, e.g. reordering content!).
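
As a rough illustration of the two strategies, here is a toy sketch in 
Python. It is not the spec's tokenizer: it only looks for ambiguous 
ampersands, which it assumes to mean "&", one or more ASCII alphanumerics, 
then ";", where the name is not a known named character reference. The 
names find_ambiguous_ampersands, abort_on_error and ParseAbort are made up 
for this example; the entity table is the one from Python's standard 
library.

```python
import re
from html.entities import html5  # named character references, keyed like "amp;"


class ParseAbort(Exception):
    """Hypothetical exception: raised in strict mode at the first parse error."""


# "&", then one or more ASCII alphanumerics, then ";" (assumed definition).
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def find_ambiguous_ampersands(text, abort_on_error=False):
    """Return the offsets of ambiguous ampersands in text.

    With abort_on_error=False, every error is recorded and scanning
    continues (the "press on" strategy). With abort_on_error=True, the
    first error raises ParseAbort (the "refuse invalid input" strategy).
    """
    errors = []
    for match in _CANDIDATE.finditer(text):
        if match.group(1) + ";" in html5:
            continue  # a real named character reference, not an error
        if abort_on_error:
            raise ParseAbort(f"ambiguous ampersand at offset {match.start()}")
        errors.append(match.start())
    return errors


if __name__ == "__main__":
    sample = "a &amp; b &bogus; c"
    print(find_ambiguous_ampersands(sample))  # lenient: collect and go on
    try:
        find_ambiguous_ampersands(sample, abort_on_error=True)
    except ParseAbort as exc:
        print("aborted:", exc)  # strict: stop at the first error
```

A publishing pipeline would presumably want the second mode, so that bad 
input is rejected outright instead of being silently repaired.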

> Different user agents can really do very different things. But I don’t 
> think it’s a good idea to make that a rule of *parsing HTML*.

It's not really different things: it's either doing what the spec says, or 
aborting early.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 22 January 2014 21:48:29 UTC