- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Wed, 06 Mar 2002 11:59:29 -0800
- To: "souravm" <souravm@infy.com>, <www-international@w3.org>
At 08:28 AM 3/6/02 +0530, souravm wrote:
>Hi All,
>
>Can anyone, who has worked with Rosette library for handling Unicode
>characters, clarify my following doubts ?

I asked Tom Emerson, Senior Computational Linguist at BASIS, and he gave me the following answer:

---------------------------------------------------------------------

You can send these questions to unicode-support@basistech.com, which could get you a faster answer.

>1. Rosette library defines a class bt_string for holding 8 bit strings. It
>is possible to create a non Unicode string from Unicode string using
>ExternalEncoding class. The sample code is as follows -
>
>bt_string sjisHello("\u0065\u23ff", ExternalEncoding::ShiftJISMS);
>
>In the above code the Unicode string (the first argument in the constructor)
>will be converted to Shift_JIS.
>Now my question is Shift_JIS supports multibyte characters. But bt_string
>can support only single byte (8-bit) characters. So in that case how it
>works ?

In this case you need to think of bt_string as a container for octets, not logical characters. In essence any multi-octet encoding (including UTF-8) can be contained in a bt_string. So, to convert a Unicode string to ShiftJIS, you would use:

    Char16 my_ucs_2[] = { 0x3053, 0x306B, 0x3061, 0x308F, 0x0000 };
    bt_string sjisHello(my_ucs_2, ExternalEncoding::ShiftJISMS);

Now sjisHello contains the ShiftJIS encoded octets for the four Unicode characters in my_ucs_2. Going the other way, you could use

    bt_wstring uniHello(sjisHello, ExternalEncoding::ShiftJISMS);

>2. Now the bt_string class is different than normal character array of C ?
>In both the cases single byte characters are supported.

Yes, bt_string is different than a regular C character array because there are no bounds (within the limits of your machine) on the size of the string. You can append characters/strings to it and the underlying storage will grow to fit. Internally bt_string is implemented in terms of the C char (or probably unsigned char, though I don't remember right now) type.

Hope that helps,

-tree

--
Tom Emerson                                   Basis Technology Corp.
Sr. Computational Linguist                  http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
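The "container for octets" point can be sketched without the Rosette library. The example below is only an illustration under stated assumptions: it uses std::string as a stand-in for bt_string (it is not the Rosette API), and the helper names sample_sjis, is_sjis_lead and sjis_char_count are hypothetical. The eight octets are the Shift_JIS encoding of the four characters U+3053, U+306B, U+3061, U+308F from the reply above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sample data: Shift_JIS octets for U+3053 U+306B U+3061 U+308F.
// std::string stands in for bt_string here; both simply hold 8-bit units.
std::string sample_sjis() {
    return std::string("\x82\xB1\x82\xC9\x82\xBF\x82\xED");
}

// True if b is the first octet of a two-octet Shift_JIS character.
bool is_sjis_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts logical characters by stepping over lead/trail octet pairs.
// A lead octet at the very end of the string is counted as one character.
std::size_t sjis_char_count(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        i += (is_sjis_lead(b) && i + 1 < s.size()) ? 2 : 1;
        ++count;
    }
    return count;
}
```

Here sample_sjis().size() reports 8, the number of octets the container holds, while sjis_char_count(sample_sjis()) reports 4, the number of logical characters. The container itself never needs to know the difference; only code that interprets the octets does.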
Received on Wednesday, 6 March 2002 14:58:55 UTC