- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 09 Apr 2008 14:04:45 +0200
- To: Øistein E. Andersen <html5@xn--istein-9xa.com>, public-html@w3.org
On Mon, 07 Apr 2008 19:49:50 +0200, Øistein E. Andersen <html5@øistein.com> wrote:

> Unicode 5.1 properly defines ill-formed subsequences and makes
> it clear(er) that these shall never impede correct interpretation
> of adjacent, well-formed UTF-8 byte sequences.
>
> Unfortunately, however, no guidance is given as to how many
> replacement characters should be emitted for a multi-byte
> ill-formed subsequence (not even that the number should not
> exceed the number of bytes, but this is clearly intended).
> I do realise, of course, that it may be problematic to make
> this a conformance criterion, but it might be useful if a
> future version of the standard could at least provide a
> suggestion for new implementations.

Isn't this comment better aimed at the Unicode guys? I agree that it would be ideal if for input 'charset' and 'byte stream', output 'character stream' is always identical regardless of what implementation you pick, but the specification does not seem to be developed with that in mind.

-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
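
To illustrate why the number of replacement characters matters, here is a minimal Python sketch that decodes the same ill-formed byte stream under two policies. It is an illustration only, not something taken from the thread or from any specification: the handler name `replace_per_byte` and the helper functions are hypothetical, and the behaviour of the built-in `"replace"` handler is that of Python 3, which is just one implementation among many.

```python
import codecs

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def _replace_per_byte(exc):
    """Hypothetical policy: one U+FFFD for every byte the decoder rejected."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.start/exc.end delimit the span reported as ill-formed;
    # decoding resumes at exc.end, so adjacent well-formed text is kept.
    return (REPLACEMENT * (exc.end - exc.start), exc.end)


# "replace_per_byte" is a name made up for this example.
codecs.register_error("replace_per_byte", _replace_per_byte)


def decode_builtin_replace(data: bytes) -> str:
    """Python's own policy: one U+FFFD per reported ill-formed span."""
    return data.decode("utf-8", errors="replace")


def decode_one_per_byte(data: bytes) -> str:
    """Alternative policy using the hypothetical handler above."""
    return data.decode("utf-8", errors="replace_per_byte")


if __name__ == "__main__":
    # E2 82 starts a three-byte sequence, but the third byte is missing.
    sample = b"A\xe2\x82B"
    for decode in (decode_builtin_replace, decode_one_per_byte):
        text = decode(sample)
        print(decode.__name__, text.count(REPLACEMENT), ascii(text))
```

Both policies leave the surrounding `A` and `B` intact, as Unicode 5.1 requires, yet they produce character streams of different lengths from identical input, which is the interoperability gap discussed above.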
Received on Wednesday, 9 April 2008 12:04:54 UTC