RE: Revised proposal for UTF-16 from Harald Alvestrand on 1998-05-26 (ietf-charsets@w3.org from April to June 1998)

From: Harald Alvestrand <Harald.Alvestrand@maxware.no>
Date: Tue, 26 May 1998 08:36:50 +0200
To: Dan Kegel <dank@alumni.caltech.edu>, Larry Masinter <masinter@parc.xerox.com>
Cc: ietf-charsets@ISI.EDU
Message-id: <3.0.2.32.19980526083650.009f6a10@127.0.0.1>

At 18:26 25.05.98 -0700, Dan Kegel wrote:
>The underlying standard has the BOM.  
>The authors of that standard knew the issue was
>a hot potato, and decided to go both ways.

And they chose to be wishy-washy about it. Bad Move.
I haven't checked Unicode, but ISO/IEC 10646 is truly wishy-washy; all I could
find about byte order is this little paragraph from Annex F:

>If an application which uses one of these signatures recognises its coded
>representation in reverse sequence (e.g. hexadecimal FFFE), the application
>can identify that the coded representations of the following characters use
>the opposite octet sequence to the sequence expected, and may take the
>necessary action to recognise the characters correctly.
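
To make the quoted rule concrete, here is a minimal sketch of a reader that checks for the signature and swaps octet order when it sees it reversed. The function name and the default_order parameter are hypothetical. The fallback when no signature is present is an assumption of this sketch, since the quoted paragraph does not say what to do in that case.

```python
import codecs


def decode_utf16_with_signature(data: bytes, default_order: str = "big") -> str:
    """Decode UTF-16 octets, honouring a leading signature if there is one.

    default_order is an assumption of this sketch: it is used only when
    no signature is found at the start of the data.
    """
    if data[:2] == codecs.BOM_UTF16_BE:
        # Octets FE FF: signature in big-endian order.
        return data[2:].decode("utf-16-be")
    if data[:2] == codecs.BOM_UTF16_LE:
        # Octets FF FE: the signature seen "in reverse sequence",
        # so the characters that follow are little-endian.
        return data[2:].decode("utf-16-le")
    # No signature: fall back to the order the caller assumes.
    codec = "utf-16-be" if default_order == "big" else "utf-16-le"
    return data.decode(codec)
```

For example, both b"\xfe\xff\x00A" and b"\xff\xfeA\x00" decode to "A", while b"\x00A" without a signature decodes to "A" only under the big-endian default.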

Question: For what size of data element do we expect the BOM to be used?
For long pieces of text, it's pretty obvious.
But what about databases? Structured values? ASN.1 SET OFs?
On every string, only on the first string (whatever that means), or on no string?
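
A small sketch of the three options, using two short strings as a stand-in for a structured value. The variable names and the choice of big-endian order are assumptions for illustration only; nothing here is taken from 10646 or from the proposal under discussion.

```python
import codecs

strings = ["abc", "def"]

# Option 1: a signature in front of every string.
per_string = b"".join(
    codecs.BOM_UTF16_BE + s.encode("utf-16-be") for s in strings
)

# Option 2: a signature in front of the first string only.
first_only = codecs.BOM_UTF16_BE + b"".join(
    s.encode("utf-16-be") for s in strings
)

# Option 3: no signature at all; byte order must be agreed out of band.
no_bom = b"".join(s.encode("utf-16-be") for s in strings)

# A decoder that treats the octets as one text strips only the leading
# signature, so the second one survives as the character U+FEFF.
decoded_per_string = per_string.decode("utf-16")
decoded_first_only = first_only.decode("utf-16")
```

With option 1, a reader that sees the value as one run of text ends up with a stray U+FEFF between "abc" and "def"; with option 3 the reader has to know the order already. That is the clarity problem, independent of the few octets spent.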

I'm not worried about wasting space, but about clarity on when to use it.

                                 Harald A

-- 
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no

Received on Monday, 25 May 1998 23:43:03 UTC