
RE: Unicode type

From: McDonald, Ira <imcdonald@sharplabs.com>
Date: Sat, 10 Mar 2001 18:40:28 -0800
Message-ID: <1115A7CFAC25D311BC4000508B2CA5375ED0B6@mailsrvnt02.enet.sharplabs.com>
To: "McDonald, Ira" <imcdonald@sharplabs.com>, "'Lee Collins'" <LCollins@ariba.com>, "'Stephen Cronin'" <Stephen.Cronin@symantec.com>, www-international@w3.org

Hi Stephen,

Oops - I knew I got that wrong...

If the leading BOM (byte-order-mark) is present, RFC 2781
REQUIRES the use of the unqualified tag 'UTF-16' and
prohibits the use of the qualified tag 'UTF-16LE' (i.e.,
little-endian).  
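As a rough illustration of that labeling rule, here is a minimal
Python sketch.  The helper name charset_label and its byte_order
parameter are hypothetical (they come from neither RFC 2781 nor
any library); the only assumption is that the label is chosen by
checking whether the text starts with a BOM.

```python
import codecs

def charset_label(data, byte_order=None):
    """Hypothetical helper: pick a charset tag for UTF-16 bytes.

    byte_order is "little", "big", or None when the sender does
    not want to declare the order out of band.
    """
    # A leading BOM calls for the unqualified tag; the BOM itself
    # tells the receiver which byte order to use.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Without a BOM, a qualified tag may carry the byte order.
    if byte_order == "little":
        return "UTF-16LE"
    if byte_order == "big":
        return "UTF-16BE"
    # No BOM and no declared order: fall back to the plain tag.
    return "UTF-16"

bom_text = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
bare_text = "hi".encode("utf-16-le")

label_with_bom = charset_label(bom_text, "little")
label_without_bom = charset_label(bare_text, "little")
```

Under these assumptions the BOM-prefixed bytes are labeled
'UTF-16' even though the caller asked for little-endian, while
the same text without a BOM is labeled 'UTF-16LE'.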

This is because inserting a BOM at the beginning of a
block of Unicode/ISO-10646 text in UTF-16 encoding has
pernicious side effects.  It is always unsafe to
concatenate two strings that both begin with a BOM
(U+FEFF, properly a Zero Width Non-Breaking Space),
because the resulting infixed BOM MUST be interpreted
as a ZWNBSP (not the desired result).
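The concatenation problem can be reproduced with a short Python
sketch.  The helper names (with_bom_le, naive_concat, safe_concat)
are made up for illustration, and the example assumes both inputs
are little-endian UTF-16 with a leading BOM.

```python
import codecs

def with_bom_le(text):
    """Encode text as little-endian UTF-16 with a leading BOM."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def naive_concat(a, b):
    """Join the raw bytes; the second BOM ends up mid-string."""
    return a + b

def safe_concat(a, b):
    """Decode each part (the 'utf-16' codec consumes one leading
    BOM), join the text, then re-encode with a single BOM."""
    joined = a.decode("utf-16") + b.decode("utf-16")
    return with_bom_le(joined)

first = with_bom_le("ab")
second = with_bom_le("cd")

# Only the first BOM is treated as a byte-order signature; the
# infixed one survives decoding as the character U+FEFF (ZWNBSP).
naive = naive_concat(first, second).decode("utf-16")
safe = safe_concat(first, second).decode("utf-16")
```

Here naive holds five characters, with U+FEFF sitting between
"ab" and "cd", while safe holds the intended four.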

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
  High North Inc


-----Original Message-----
From: McDonald, Ira [mailto:imcdonald@sharplabs.com]
Sent: Saturday, March 10, 2001 12:11 PM
To: 'Lee Collins'; 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type


Hi,

Which is properly tagged 'UTF-16LE' (see RFC 2781, February 2000),
when conveyed in IETF or W3C standards.

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
  High North Inc


-----Original Message-----
From: Lee Collins [mailto:LCollins@ariba.com]
Sent: Friday, March 09, 2001 9:54 AM
To: 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type


"UnicodeLittle" (UCS-2, little endian), with BOM bytes 0xFF 0xFE (U+FEFF in little-endian order)

Lee

-----Original Message-----
From: Stephen Cronin [mailto:Stephen.Cronin@symantec.com]
Sent: Thursday, March 08, 2001 11:48 PM
To: www-international@w3.org; www-international-request@w3.org
Subject: Unicode type



Quick question


If I save a simple test file as Unicode on Windows NT 4, can anyone tell me
the Unicode type/format it's saved with and its BOM?


Cheers

Stephen
Received on Saturday, 10 March 2001 21:43:02 GMT
