Re: Fwd: I-D ACTION:draft-hoffman-utf16-03.txt from MURATA Makoto on 1999-05-06 (ietf-charsets@w3.org from April to June 1999)

From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Date: Thu, 06 May 1999 16:47:35 +0900
To: ietf-charsets@iana.org
Message-id: <199905060747.AA00397@archlute.apsdc.ksp.fujixerox.co.jp>

Paul Hoffman / IMC wrote:
> However, please do read the draft and 
> let us know if you think it is ready (or almost ready) to be sent off for 
> RFChood.

I think it is almost ready, but I have a nit.

>4.3 Interpreting text labelled as UTF-16
>
>Text labelled with the "UTF-16" charset might be serialized in either
>big-endian or little-endian order. If the first two octets of the text
>is 0xFE followed by 0xFF, then the text can be interpreted as being
>big-endian. If the first two octets of the text is 0xFF followed by
>0xFE, then the text can be interpreted as being little-endian. ...

I think that leading 0xFE 0xFF or 0xFF 0xFE in this case (charset = "utf-16") is 
always a byte order mark and is not a zero-width non-break space.  I would like 
to make this explicit, since "the character 0xFEFF in the first
position of a stream MAY be interpreted as a zero-width non-breaking
space, and is not always a byte-order mark." (in 3.2).

Cheers,

Makoto

Fuji Xerox Information Systems

Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Thursday, 6 May 1999 03:49:06 UTC