- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Sat, 10 Mar 2001 18:40:28 -0800
- To: "McDonald, Ira" <imcdonald@sharplabs.com>, "'Lee Collins'" <LCollins@ariba.com>, "'Stephen Cronin'" <Stephen.Cronin@symantec.com>, www-international@w3.org
Hi Stephen,

Oops - I knew I got that wrong...

If the leading BOM (byte order mark) is present, RFC 2781 REQUIRES the use of the unqualified tag 'UTF-16' and prohibits the use of the qualified tag 'UTF-16LE' (i.e., little-endian).

This is because inserting a BOM at the beginning of a block of Unicode/ISO-10646 text in UTF-16 encoding has pernicious side effects. It is always unsafe to concatenate two strings that both begin with a BOM (properly a Zero Width Non-Breaking Space), because the resulting infixed BOM MUST be interpreted as a ZWNBSP (not the desired result).

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
High North Inc

-----Original Message-----
From: McDonald, Ira [mailto:imcdonald@sharplabs.com]
Sent: Saturday, March 10, 2001 12:11 PM
To: 'Lee Collins'; 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type

Hi,

Which is properly tagged 'UTF-16LE' (see RFC 2781, February 2000), when conveyed in IETF or W3C standards.

Cheers,
- Ira McDonald, consulting architect at Sharp and Xerox
High North Inc

-----Original Message-----
From: Lee Collins [mailto:LCollins@ariba.com]
Sent: Friday, March 09, 2001 9:54 AM
To: 'Stephen Cronin'; www-international@w3.org
Subject: RE: Unicode type

"UnicodeLittle" (UCS2, little endian), with BOM == 0xFFFE

Lee

-----Original Message-----
From: Stephen Cronin [mailto:Stephen.Cronin@symantec.com]
Sent: Thursday, March 08, 2001 11:48 PM
To: www-international@w3.org; www-international-request@w3.org
Subject: Unicode type

Quick question

If I save a simple test file as Unicode on Windows NT 4, can anyone tell me the Unicode type/format it's saved with and its BOM?

Cheers
Stephen
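The concatenation hazard described at the top of this thread can be reproduced in a few lines. The following is only a minimal sketch in Python, not anything taken from RFC 2781 or from the posters: the sample strings and the helper names `strip_bom` and `concat_utf16le` are hypothetical, and the example assumes Python's standard `utf-16` and `utf-16-le` codecs, which respectively consume and preserve a leading BOM.

```python
import codecs

# Two little-endian UTF-16 byte strings, each saved with a leading
# BOM (bytes FF FE). The sample text is made up for illustration.
first = codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")
second = codecs.BOM_UTF16_LE + "def".encode("utf-16-le")

# Naive concatenation leaves the second BOM in the middle of the data.
joined = first + second

# The unqualified 'utf-16' codec uses only the leading BOM to pick the
# byte order; the infixed one survives as U+FEFF (ZWNBSP) in the text.
text = joined.decode("utf-16")

# The qualified 'utf-16-le' codec does not treat a leading FF FE as a
# BOM at all, so it is kept as a U+FEFF character.
raw = first.decode("utf-16-le")


def strip_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop one leading little-endian BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):]
    return data


def concat_utf16le(*chunks: bytes) -> bytes:
    """Hypothetical helper: join chunks, keeping a single leading BOM."""
    body = b"".join(strip_bom(chunk) for chunk in chunks)
    return codecs.BOM_UTF16_LE + body


safe = concat_utf16le(first, second).decode("utf-16")
```

Here `text` contains a stray U+FEFF between the two halves, `raw` starts with U+FEFF, and `safe` contains neither. Stripping each chunk's BOM before joining is one possible workaround under these assumptions; it only handles the little-endian case.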
Received on Saturday, 10 March 2001 21:43:02 UTC