RE: Unicode type

Hi Stephen,

Oops - I knew I got that wrong...

If the leading BOM (byte-order-mark) is present, RFC 2781
REQUIRES the use of the unqualified tag 'UTF-16' and
prohibits the use of the qualified tag 'UTF-16LE' (i.e.,
little-endian).  
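
As a rough illustration of that labelling rule, here is a minimal
Python sketch.  The helper name utf16_label and its byte_order
parameter are hypothetical (they are not from RFC 2781); the sketch
only assumes that a leading FF FE or FE FF octet pair is a BOM.

```python
import codecs

def utf16_label(data: bytes, byte_order: str = "little") -> str:
    """Pick a charset label for UTF-16 data (hypothetical helper).

    If the data starts with a BOM, only the unqualified label applies.
    Otherwise the caller must already know the byte order, because it
    cannot be read off the octets themselves.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    return "UTF-16LE" if byte_order == "little" else "UTF-16BE"

# Same text, serialized little-endian with and without a leading BOM.
with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
without_bom = "hi".encode("utf-16-le")

print(utf16_label(with_bom))
print(utf16_label(without_bom))
```
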

This is because inserting a BOM at the beginning of a block
of Unicode/ISO-10646 text in UTF-16 encoding has pernicious
side effects.  It is always unsafe to concatenate two strings
that both begin with a BOM (properly U+FEFF, Zero Width
No-Break Space), because the resulting infixed BOM MUST be
interpreted as a ZWNBSP (not the desired result).
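
The concatenation problem can be shown with a short Python sketch.
The helpers with_bom and strip_bom are hypothetical names introduced
only for this illustration, and the sketch assumes little-endian
data throughout.

```python
import codecs

def with_bom(text: str) -> bytes:
    """Serialize text as little-endian UTF-16 with a leading BOM."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def strip_bom(data: bytes) -> bytes:
    """Drop a leading little-endian BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):]
    return data

first = with_bom("abc")
second = with_bom("def")

# Naive concatenation: the second BOM ends up in the middle of the
# text, where a decoder has to treat it as the character U+FEFF.
naive = (first + second).decode("utf-16")

# Stripping the second BOM before joining avoids the stray character.
safe = (first + strip_bom(second)).decode("utf-16")

print(repr(naive))
print(repr(safe))
```

Only the first BOM is consumed as a byte-order signature; the second
one survives decoding as an ordinary U+FEFF character in the result.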

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
  High North Inc


-----Original Message-----
From: McDonald, Ira [mailto:imcdonald@sharplabs.com]
Sent: Saturday, March 10, 2001 12:11 PM
To: 'Lee Collins'; 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type


Hi,

Which is properly tagged 'UTF-16LE' (see RFC 2781, February 2000),
when conveyed in IETF or W3C standards.

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
  High North Inc


-----Original Message-----
From: Lee Collins [mailto:LCollins@ariba.com]
Sent: Friday, March 09, 2001 9:54 AM
To: 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type


"UnicodeLittle" (UCS-2, little-endian), with BOM == 0xFF 0xFE (U+FEFF in little-endian byte order)

Lee

-----Original Message-----
From: Stephen Cronin [mailto:Stephen.Cronin@symantec.com]
Sent: Thursday, March 08, 2001 11:48 PM
To: www-international@w3.org; www-international-request@w3.org
Subject: Unicode type



Quick question


If I save a simple test file as Unicode on Windows NT 4, can anyone tell me
the Unicode type/format it's saved with and its BOM?


Cheers

Stephen

Received on Saturday, 10 March 2001 21:43:02 UTC