- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 06 Dec 2002 00:23:05 +0900
- To: Marcin Hanclik <mhanclik@poczta.onet.pl>, ned.freed@mrochek.com
- Cc: ietf-charsets@iana.org
Hello Marcin,

You mentioned that the WAP spec said UTF-16, MIBenum 1000 (the latter being ISO-10646-UCS-2). Why not assume that the MIBenum was a mistake, and use charset=utf-16? There is an RFC for utf-16 (RFC 2781), which contains very clear and detailed rules about the BOM.

For iso-10646-ucs-2, the entry in http://www.iana.org/assignments/character-sets has:

Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
        this needs to specify network byte order: the standard
        does not specify (it is a 16-bit integer space)
Alias: csUnicode

It sounds like this is heavily underspecified. There are other registrations that have similar problems. In general, using UTF-16 (or UTF-16BE/UTF-16LE) is much better, because it is up to date, covers the whole range of Unicode, and is very well defined.

Regards,
Martin.

At 16:22 02/12/05 +0100, Marcin Hanclik wrote:
>Hi, Ned!
>
>Thanks a lot for the mail exchange. I have learned a lot.
>
>I would like to sum it up, since I need a conclusion.
>
>I am trying to incorporate what you and Martin wrote in your emails.
>The situation then looks like this:
>I have to send UCS-2 encoded data. The headers will look like:
>
>Content-Type: application/x-my-text-subtype; charset="iso-10646-ucs-2"
>Content-Transfer-Encoding: base64
>
>data
>
>My question was:
>Can the data marked as "iso-10646-ucs-2" contain a BOM?
>
>Your answer was:
>
> > > I don't know if there are specific rules for handling revisions to
> > > iso-10646-ucs-2 or not. I suspect not. However, the general rule is that
> > > additions to a charset repertoire are expected and allowed. See RFC 2279
> > > section 3. However, the BOM is something of a special case.
> > > ....
> > > For material that isn't labelled with a top level content type of text I don't
> > > think the situation is clear, but the intent has always been to allow
> > > additions to charsets subsequent to registration. So I think BOM should be
> > > supported in this context.
>
>What is wrong in this whole case is that the top-level content has the text type, and that the WAP/MMS standards have a bug in their specs. But we have to live with them.
>
>Since your answer is NOT CLEAR to me (I hope you agree that it can be...), I have to derive an answer from the above suggestions.
>But this is still not what I wanted. I would like to have:
>"The new standard overrides the old one"
> or
>"BOM was not defined in ISO 10646:1993, and although newer versions of ISO 10646 support BOM in UCS-2, data marked as iso-10646-ucs-2 cannot contain a BOM"
> instead of
>"BOM should be supported in this context"
>
>Is there any authoritative standards body that has specified a rule for this case?
>Or can you help me further?
>
>Kind regards,
>Marcin
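Martin's suggestion to relabel the data as charset=utf-16 relies on the BOM rules that RFC 2781 defines for that label. The following is a minimal Python sketch of how a receiver might apply those rules. It assumes the body has already been base64-decoded into bytes. The helper names `decode_utf16_rfc2781`, `decode_utf16_explicit` and `fits_in_ucs2` are hypothetical and do not come from any WAP, MMS or MIME library.

```python
import codecs


def decode_utf16_rfc2781(data: bytes) -> str:
    """Decode a body labelled charset=utf-16 using the RFC 2781 BOM rules.

    FE FF selects big-endian and FF FE selects little-endian; in both
    cases the BOM is consumed rather than returned as text.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781 says the text SHOULD be read as big-endian.
    return data.decode("utf-16-be")


def decode_utf16_explicit(data: bytes, label: str) -> str:
    """Decode a body labelled utf-16be or utf-16le.

    The byte order comes from the label, so nothing is stripped: a leading
    U+FEFF stays in the text as ZERO WIDTH NO-BREAK SPACE.
    """
    codec = {"utf-16be": "utf-16-be", "utf-16le": "utf-16-le"}[label.lower()]
    return data.decode(codec)


def fits_in_ucs2(text: str) -> bool:
    """True if no character lies beyond U+FFFF, i.e. all are in the BMP."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    print(decode_utf16_rfc2781(b"\xfe\xff\x00H\x00i"))  # Hi (BOM, big-endian)
    print(decode_utf16_rfc2781(b"\xff\xfeH\x00i\x00"))  # Hi (BOM, little-endian)
    print(decode_utf16_rfc2781(b"\x00H\x00i"))          # Hi (no BOM, big-endian)
    # U+1D11E needs a surrogate pair in UTF-16 and cannot be expressed in UCS-2.
    clef = decode_utf16_rfc2781(b"\xd8\x34\xdd\x1e")
    print(fits_in_ucs2("Hi"), fits_in_ucs2(clef))       # True False
```

The last helper illustrates the "covers the whole range of Unicode" point: a character outside the Basic Multilingual Plane is encoded as a surrogate pair in UTF-16, but it has no 2-octet UCS-2 form. The sketch does not decide whether a receiver should also apply the BOM rule to data labelled iso-10646-ucs-2. That question is exactly what the registry entry leaves unspecified.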
Received on Thursday, 5 December 2002 11:01:39 UTC