Re: Character encoding errors (detailed review of parsing algorithm)

On Aug 1, 2007, at 08:11, Ian Hickson wrote:

> On Wed, 18 Jul 2007, Henri Sivonen wrote:
>>
>> (This is part of my detailed review of the parsing algorithm.)
>>
>> The spec says:
>>> Bytes or sequences of bytes in the original byte stream that could not be
>>> converted to Unicode characters must be converted to U+FFFD REPLACEMENT
>>> CHARACTER code points.
>>
>> The spec should probably say explicitly that such byte sequences are
>> parse errors.
>
> They're not parse errors, they're errors at the character encoding layer.
> IMHO that's out of scope for this spec.

OK. (The writers of the XML spec felt differently about the scope of  
their spec, though.)

> In particular I don't think any of
> the text for parse errors need apply to encoding errors; the encoding
> specs should be the ones that make such errors non-conforming. No?

I agree in principle. I guess this is one of the cases where the spec
is already logically sufficient, but a one-sentence note hinting at
the consequences of other specs would go a long way toward
disambiguating things for the kinds of reading scenarios mentioned in
http://diveintomark.org/archives/2004/08/16/specs .
(Compare with
http://www.w3.org/mid/01575703-3A06-4F51-BE27-86A9EBB44C54@iki.fi )
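
To make the layering concrete, here is a minimal sketch in Python. It is
not taken from the spec; the function name decode_with_replacement and
the "had_encoding_error" flag are hypothetical, chosen for illustration.
It decodes a byte stream with U+FFFD substitution, as the quoted spec
text requires, and separately records whether the decoder hit an
undecodable sequence, so that the encoding layer rather than the parser
is the one reporting the error.

```python
from typing import Tuple


def decode_with_replacement(raw: bytes, encoding: str = "utf-8") -> Tuple[str, bool]:
    """Decode raw bytes, substituting U+FFFD for undecodable sequences.

    Returns (text, had_encoding_error). The flag is a hypothetical way of
    surfacing an encoding-layer error without treating it as a parse error.
    """
    # errors="replace" makes the codec emit U+FFFD REPLACEMENT CHARACTER
    # wherever the bytes cannot be converted to Unicode characters.
    text = raw.decode(encoding, errors="replace")

    # A strict decode of the same bytes tells us whether any substitution
    # was needed; checking for U+FFFD in the output would be ambiguous,
    # because the input may legitimately contain an encoded U+FFFD.
    try:
        raw.decode(encoding, errors="strict")
        had_encoding_error = False
    except UnicodeDecodeError:
        had_encoding_error = True

    return text, had_encoding_error


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8.
    text, bad = decode_with_replacement(b"a\xffb")
    print(repr(text), bad)
```

The tokenizer would then see only the decoded text, including any
U+FFFD characters, while a conformance checker could report the flag as
an error belonging to the character encoding layer.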

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 1 August 2007 07:24:59 UTC