Re: Fwd: I-D ACTION:draft-hoffman-utf16-03.txt from Martin J. Duerst on 1999-05-06 (ietf-charsets@w3.org from April to June 1999)

From: Martin J. Duerst <duerst@w3.org>
Date: Thu, 06 May 1999 22:00:02 +0900
To: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Cc: ietf-charsets@iana.org
Message-id: <199905061324.WAA16662@sh.w3.mag.keio.ac.jp>

At 16:47 99/05/06 +0900, MURATA Makoto wrote:

> >4.3 Interpreting text labelled as UTF-16
> >
> >Text labelled with the "UTF-16" charset might be serialized in either
> >big-endian or little-endian order. If the first two octets of the text
> >is 0xFE followed by 0xFF, then the text can be interpreted as being
> >big-endian. If the first two octets of the text is 0xFF followed by
> >0xFE, then the text can be interpreted as being little-endian. ...
> 
> I think that leading 0xFE 0xFF or 0xFF 0xFE in this case (charset = "utf-16") is 
> always a byte order mark and is not a zero-width non-break space.  I would like 
> to make this explicit, since "the character 0xFEFF in the first
> position of a stream MAY be interpreted as a zero-width non-breaking
> space, and is not always a byte-order mark." (in 3.2).

I think it would be nice if we could make it that way, but I'm not
at all sure that we can do that. We can't just change definitions that
were already established before this draft.
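The byte-order rule quoted from section 4.3, together with Murata's proposal, can be sketched in Python. This is a hypothetical illustration, not part of the draft. The function name `decode_utf16_label` and the `strip_bom` flag are invented here. The big-endian fallback for text without a leading FE FF or FF FE is also an assumption, because the quoted excerpt is cut off before it says what to do in that case. Setting `strip_bom=True` models Murata's reading, where the leading octets are always a byte-order mark. Setting `strip_bom=False` keeps U+FEFF in the text, matching the section 3.2 note that it MAY be a zero-width non-breaking space.

```python
def decode_utf16_label(data: bytes, strip_bom: bool = True) -> str:
    """Hypothetical decoder for octets labelled charset="UTF-16".

    strip_bom=True: a leading FE FF / FF FE is treated as a byte-order
    mark only and dropped (Murata's proposal).
    strip_bom=False: the leading U+FEFF is kept in the decoded text,
    since it MAY be read as a zero-width non-breaking space.
    """
    if len(data) % 2:
        raise ValueError("UTF-16 text must have an even number of octets")
    if data[:2] == b"\xfe\xff":
        codec = "utf-16-be"   # 0xFE then 0xFF: big-endian
    elif data[:2] == b"\xff\xfe":
        codec = "utf-16-le"   # 0xFF then 0xFE: little-endian
    else:
        # No signature: big-endian is an ASSUMPTION of this sketch,
        # not something the quoted excerpt states.
        return data.decode("utf-16-be")
    # The explicit-endian codecs never strip U+FEFF themselves, so
    # keeping the first two octets leaves it in the result.
    body = data[2:] if strip_bom else data
    return body.decode(codec)
```

For example, `decode_utf16_label(b"\xff\xfeA\x00")` yields `"A"`, while the same input with `strip_bom=False` yields `"\ufeffA"`. The difference between the two results is exactly what the thread is debating.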

Regards,   Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org

Received on Thursday, 6 May 1999 09:25:55 UTC