Re: Character encoding errors (detailed review of parsing algorithm) from Henri Sivonen on 2007-08-01 (public-html@w3.org from August 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 1 Aug 2007 10:24:37 +0300
To: Ian Hickson <ian@hixie.ch>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <AF2C07B5-8FC4-4D50-9E91-76A8F71DED34@iki.fi>

On Aug 1, 2007, at 08:11, Ian Hickson wrote:

> On Wed, 18 Jul 2007, Henri Sivonen wrote:
>>
>> (This is part of my detailed review of the parsing algorithm.)
>>
>> The spec says:
>>> Bytes or sequences of bytes in the original byte stream that could not be
>>> converted to Unicode characters must be converted to U+FFFD REPLACEMENT
>>> CHARACTER code points.
>>
>> The spec should probably say explicitly that such byte sequences are
>> parse errors.
>
> They're not parse errors, they're errors at the character encoding layer.
> IMHO that's out of scope for this spec.

OK. (The writers of the XML spec felt differently about the scope of
their spec, though.)

> In particular I don't think any of
> the text for parse errors need apply to encoding errors, the encoding
> specs should be the ones that make such errors non-conforming. No?

I agree in principle. I guess this is one of the cases where the spec
is already logically sufficient, but a one-sentence note hinting at
the consequences of other specs would go a long way toward
disambiguating things for the kinds of reading scenarios mentioned in
http://diveintomark.org/archives/2004/08/16/specs .
(Compare with http://www.w3.org/mid/01575703-3A06-4F51-BE27-86A9EBB44C54@iki.fi )
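To illustrate the layering being discussed, here is a minimal Python sketch, assuming UTF-8 input. The decoder substitutes U+FFFD for undecodable bytes and records each failure in its own list, so a tokenizer downstream only ever sees characters and has no reason to report these as parse errors. The handler name `record_replace` and the `encoding_errors` list are hypothetical, chosen for illustration. They are not part of the HTML draft or any other spec.

```python
import codecs

# Hypothetical list of encoding-layer errors, kept separate from any
# parse errors a tokenizer might report later.
encoding_errors = []

def replace_and_record(exc):
    """Record the offending byte range, then substitute U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    encoding_errors.append((exc.start, exc.end, bad_bytes))
    # Replace the bad bytes with one U+FFFD and resume decoding after them.
    return ("\ufffd", exc.end)

# Register the handler under a (hypothetical) name usable in bytes.decode().
codecs.register_error("record_replace", replace_and_record)

def decode_stream(data: bytes, encoding: str = "utf-8") -> str:
    """Decode a byte stream; undecodable bytes become U+FFFD."""
    return data.decode(encoding, errors="record_replace")

# 0xE9 starts a three-byte UTF-8 sequence, but "<" is not a valid
# continuation byte, so the decoder rejects the single byte 0xE9.
text = decode_stream(b"<p>caf\xe9</p>")
```

Here `text` comes out as `"<p>caf\ufffd</p>"` and `encoding_errors` holds `[(6, 7, b'\xe9')]`, so the error is visible (if at all) at the encoding layer rather than in the parser. Python's built-in `errors="replace"` performs the same substitution without the bookkeeping.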

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 1 August 2007 07:24:59 UTC