Re: Character encoding errors (detailed review of parsing algorithm)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 1 Aug 2007 10:24:37 +0300
Message-Id: <AF2C07B5-8FC4-4D50-9E91-76A8F71DED34@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>
To: Ian Hickson <ian@hixie.ch>

On Aug 1, 2007, at 08:11, Ian Hickson wrote:

> On Wed, 18 Jul 2007, Henri Sivonen wrote:
>>
>> (This is part of my detailed review of the parsing algorithm.)
>>
>> The spec says:
>>> Bytes or sequences of bytes in the original byte stream that could not be
>>> converted to Unicode characters must be converted to U+FFFD REPLACEMENT
>>> CHARACTER code points.
>>
>> The spec should probably say explicitly that such byte sequences are
>> parse errors.
>
> They're not parse errors, they're errors at the character encoding layer.
> IMHO that's out of scope for this spec.

OK. (The writers of the XML spec felt differently about the scope of  
their spec, though.)

> In particular I don't think any of
> the text for parse errors need apply to encoding errors, the encoding
> specs should be the ones that make such errors non-conforming. No?

I agree in principle. I guess this is one of the cases where the spec is
already logically sufficient, but a one-sentence note hinting at the
consequences of other specs would go a long way toward disambiguating
things for the kinds of reading scenarios mentioned in
http://diveintomark.org/archives/2004/08/16/specs .
(Compare with
http://www.w3.org/mid/01575703-3A06-4F51-BE27-86A9EBB44C54@iki.fi )
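
To make the layering concrete, here is a minimal Python sketch of the
behaviour under discussion, not anything the spec prescribes. The helper
name decode_for_parser and the returned flag are assumptions made for
illustration; the only real API used is the standard bytes.decode with
its "strict" and "replace" error handlers. The decoder substitutes
U+FFFD for undecodable bytes and reports the problem separately, so the
parser itself never has to treat it as a parse error.

```python
from typing import Tuple


def decode_for_parser(raw: bytes, encoding: str = "utf-8") -> Tuple[str, bool]:
    """Hypothetical helper: return (text, had_encoding_error).

    Undecodable bytes become U+FFFD REPLACEMENT CHARACTER. The flag is
    reported at the encoding layer, separately from any parse errors.
    """
    try:
        # Strict decoding succeeds only if every byte sequence is valid.
        return raw.decode(encoding, errors="strict"), False
    except UnicodeDecodeError:
        # Fall back to replacement so the parser still gets a full stream.
        return raw.decode(encoding, errors="replace"), True


# 0xE9 is "é" in ISO-8859-1 but is not a valid sequence here in UTF-8.
text, had_error = decode_for_parser(b"<p>caf\xe9</p>")
```

A conformance checker could surface had_error as an encoding-layer
message while the tokenizer only ever sees the already-substituted text.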

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 1 August 2007 07:24:59 GMT
