- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 06 Nov 2003 21:25:44 -0500
- To: Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
Hello Markus,

I think these are interesting questions, but strictly speaking, charset definitions are from bytes to characters, i.e. from the legacy encoding to Unicode, and not necessarily back. For the same charset, there may be different reverse mappings, with or without fallbacks, or with different substitution characters. And either way, these will always be hopelessly inadequate in that they lose most information.

If fallbacks, etc., should be made part of the registration, then I guess the RFC defining the registration details should be updated.

Regards, Martin.

At 09:37 03/11/06 -0800, Markus Scherer wrote:
>jean-frederic clere wrote:
>>For OSD_EBCDIC_DF04_1 and OSD_EBCDIC_DF04_15 that is a 8 bits roundtrip
>>mapping but for 2 bytes mapping, undefined characters are mapped as '?' 0x6F.
>
>Ok, so the substitution character for these tables is 0x6F.
>
>On the question of roundtrips, I think we are not communicating properly
>due to a mismatch in terminology.
>
>I believe what you are saying is that these _tables_ as a whole perform a
>roundtrip of their repertoire between a BS2000-EBCDIC codepage and the
>Unicode portion corresponding to the equivalent ISO 8859 codepage. In
>other words, the tables map between exactly N codes on the EBCDIC-based
>and N Unicode code points. (N being the same on both sides, N=256 for SBCS
>and N=128 for IRV.) Is this correct?
>
>Then, for every Unicode character _outside_ of this repertoire, there is
>no mapping, and the default behavior is to use 0x6F as the substitution
>character.
>
>What I was trying to ask was whether the individual _mappings_ in the
>tables (each line in the text table listing) were roundtrip mappings. This
>means that when you write something like
>0xFC 0x00DC #LATIN CAPITAL LETTER U WITH DIAERESIS
>that means that you map Unicode U+00DC to 0xFC while converting from
>Unicode to this charset, and you map 0xFC to Unicode U+00DC while
>converting from the charset to Unicode.
>Fallback mappings only go one way.
>Since many conversion implementations have fallback mappings in addition
>to roundtrip mappings, they should be published, and should be marked
>properly. See Unicode TR 22 for details.
>
>If the tables in your registration requests are pure remappings as
>described above, then of course each mapping is a roundtrip mapping.
>
>Is this how the converter implementation works on BS2000? Is it true that
>BS2000 converters do not perform any fallback (one-way) mappings?
>
>Best regards,
>markus
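To make the roundtrip versus fallback distinction discussed in this thread concrete, here is a minimal Python sketch. It assumes tables in the `0xFC 0x00DC #...` line format quoted above, uses 0x6F as the substitution byte as stated in the thread, and passes one-way fallbacks as a plain dict. That dict, the sample table, and all function names are hypothetical illustrations, not a BS2000, ICU, or TR 22 API (TR 22 defines its own markup for marking fallbacks). Only the 0xFC/U+00DC and 0x6F/'?' pairs come from the thread; 0xC1 is the usual EBCDIC code for 'A'.

```python
# Hypothetical sketch: roundtrip table plus one-way fallbacks.
SUBSTITUTION_BYTE = 0x6F  # '?' in these tables, per the thread

# Toy three-line excerpt in the quoted table format (not a real registration).
SAMPLE = """\
0xC1 0x0041 #LATIN CAPITAL LETTER A
0x6F 0x003F #QUESTION MARK
0xFC 0x00DC #LATIN CAPITAL LETTER U WITH DIAERESIS
"""


def parse_table(text):
    """Parse 'byte codepoint #comment' lines into {byte: code point}."""
    to_unicode = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        byte_str, cp_str = line.split()
        to_unicode[int(byte_str, 16)] = int(cp_str, 16)
    return to_unicode


def is_roundtrip(to_unicode):
    """Every line is a roundtrip mapping only if no two bytes share a
    code point; otherwise the reverse direction cannot honor all lines."""
    return len(set(to_unicode.values())) == len(to_unicode)


def build_encoder(to_unicode, fallbacks=None):
    """Reverse the table for Unicode -> bytes, then add one-way fallbacks.
    A fallback never overrides a roundtrip mapping."""
    from_unicode = {cp: b for b, cp in to_unicode.items()}
    for cp, b in (fallbacks or {}).items():
        from_unicode.setdefault(cp, b)
    return from_unicode


def encode(text, from_unicode, subchar=SUBSTITUTION_BYTE):
    """Unmapped characters become the substitution byte."""
    return bytes(from_unicode.get(ord(ch), subchar) for ch in text)


def decode(data, to_unicode):
    """Decoding uses only the table, never the fallbacks.
    Assumption: unknown bytes become U+FFFD."""
    return "".join(chr(to_unicode.get(b, 0xFFFD)) for b in data)


if __name__ == "__main__":
    table = parse_table(SAMPLE)
    # Hypothetical fallback: FULLWIDTH LATIN CAPITAL LETTER A -> 0xC1
    enc = build_encoder(table, fallbacks={0xFF21: 0xC1})
    print(is_roundtrip(table))           # True
    print(encode("A\u00DC?", enc))       # b'\xc1\xfco'
    print(decode(encode("\uFF21", enc), table))  # A  (fallback is one-way)
    print(encode("\u20AC", enc))         # b'o'  (0x6F substitution)
```

Because `decode` never consults the fallback dict, U+FF21 encodes to 0xC1 while 0xC1 decodes back to U+0041; that asymmetry is what makes a fallback "one-way", and why it is worth marking separately from roundtrip lines in a published table.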
Received on Thursday, 6 November 2003 22:45:34 UTC