Re: Character encoding errors (detailed review of parsing algorithm) from Ian Hickson on 2007-08-01 (public-html@w3.org from August 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 1 Aug 2007 05:11:52 +0000 (UTC)
To: Henri Sivonen <hsivonen@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <Pine.LNX.4.64.0708010509540.9342@dhalsim.dreamhost.com>

On Wed, 18 Jul 2007, Henri Sivonen wrote:
> 
> (This is part of my detailed review of the parsing algorithm.)
> 
> The spec says:
> > Bytes or sequences of bytes in the original byte stream that could not be
> > converted to Unicode characters must be converted to U+FFFD REPLACEMENT
> > CHARACTER code points.
> 
> The spec should probably say explicitly that such byte sequences are 
> parse errors.

They're not parse errors; they're errors at the character encoding layer. 
IMHO that's out of scope for this spec. In particular, I don't think any of 
the text for parse errors needs to apply to encoding errors. The encoding 
specs should be the ones that make such errors non-conforming. No?
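
To make the layering concrete, here is a rough sketch, not anything the spec 
defines. It assumes Python's built-in UTF-8 codec as a stand-in for "the 
character encoding layer"; decode_lenient is a made-up helper name. The 
decoder substitutes U+FFFD and reports the problem itself, so a parser 
downstream would only ever see Unicode characters.

```python
from typing import Tuple


def decode_lenient(raw: bytes, encoding: str = "utf-8") -> Tuple[str, bool]:
    """Decode raw bytes, replacing undecodable sequences with U+FFFD.

    Hypothetical helper for illustration. Returns the decoded text plus a
    flag telling the caller whether an encoding-level error occurred.
    """
    try:
        # Strict decoding succeeds only when every byte sequence is valid.
        return raw.decode(encoding, errors="strict"), False
    except UnicodeDecodeError:
        # Fall back to substitution; the error is noted at this layer,
        # before any markup parsing would take place.
        return raw.decode(encoding, errors="replace"), True


# 0xFF can never appear in well-formed UTF-8, so it triggers the fallback.
text, had_encoding_error = decode_lenient(b"caf\xc3\xa9 \xff ok")
```

Detecting the error with a strict pass, rather than by searching the output 
for U+FFFD, keeps a legitimately encoded U+FFFD in the input from being 
mistaken for an encoding error.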

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 1 August 2007 05:12:18 UTC