Re: Definition of charset "macintosh"

From: Deborah Goldsmith <goldsmit@apple.com>
Date: Mon, 14 Jan 2002 12:40:11 -0800
To: Mark Davis <mark@macchiato.com>
Cc: Harald Tveit Alvestrand <harald@alvestrand.no>, IETF Charsets Mailing List <ietf-charsets@iana.org>
Message-id: <E862FB14-092E-11D6-B7FD-000A27DA5C92@apple.com>
While all of these problems can potentially occur, do we in fact think 
that any of them will actually occur? If field practice is that people 
use the label "macintosh" to label text that comes from a machine 
running Mac OS, then in fact such text has been assuming the Euro for 
that code point since the Mac OS 8.5 era.

I'm guessing since this change was made over two years ago that no one 
has run into severe problems due to the mismatch between the RFC and 
what Mac OS actually uses that code point for. It seems to me like 
defining a new "macintosh-1999" character set might be going a little 
overboard. However, I'd like to continue to gather input.

Deborah Goldsmith
Manager, Fonts & Language Kits
Apple Computer, Inc.
goldsmith@apple.com

On Monday, January 14, 2002, at 08:54 AM, Mark Davis wrote:

> Whenever a character in a charset changes, that can cause data to be
> corrupted. It is especially important nowadays, when the internal
> character set is Unicode/10646, and XML (or HTML) is used to serialize
> the text in a different character set. Here is what happens.
>
> 1. An implementation with the old definition emitting data marked as
> "Macintosh" will escape a Euro sign (€) as &#x20AC; while leaving the
> currency sign (¤) alone. An implementation with the new definition
> receiving that data will correctly handle the Euro, but misinterpret the
> currency sign as a Euro.
>
> - While the currency sign is little used (and was badly conceived in the
> first place), it is used. For example, both in Windows and on Java it is
> used as a stand-in for the actual currency symbol in a currency pattern
> string. Changing that to Euro would cause even apparently unrelated
> currency values such as dollars to appear as Euros.
>
> 2. An implementation with the new definition emitting data marked as
> "Macintosh" will escape a currency sign (¤) as &#xA4; while leaving the
> Euro sign (€) alone. An implementation with the old definition receiving
> that data will correctly handle the currency sign, but misinterpret the
> Euro as a currency sign.
>
> - This is even more serious. All new data with Euros will be
> misinterpreted on older implementations.
>
> To sum it up, changing any character in a set can be dangerous. The best
> way to avoid these situations is:
>
> A. Define fully qualified names for all versions of character sets. The
> TR22 naming conventions are strongly recommended
> (http://www.unicode.org/unicode/reports/tr22/). Encourage
> implementations to use the fully-qualified names.
>
> B. One can also have a partially-qualified name (e.g. "Macintosh") as an
> alias for one of these. And that alias could change over time to be the
> latest version. Implementers can also use the partially-qualified
> character set names in circumstances where robust data conversion is not
> as important.
>
> Mark
> —————
>
> Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
> [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
>
> http://www.macchiato.com
>
> ----- Original Message -----
> From: "Harald Tveit Alvestrand" <harald@alvestrand.no>
> To: "Deborah Goldsmith" <goldsmit@apple.com>; "IETF Charsets Mailing List"
> <ietf-charsets@iana.org>
> Sent: Monday, January 14, 2002 06:40
> Subject: Re: Definition of charset "macintosh"
>
>
>> nobody seems to have commented on this....
>>
>> if "macintosh" is used in the industry to refer to a charset that has
>> the euro sign in it, then I, personally, think that we should update
>> the registration to point out that fact.
>>
>> In a more rational world, a new "macintosh-euro" charset would be
>> registered, but the currency symbol is the single most useless
>> character I know about - redefining its codepoint does not cause a
>> great deal of harm to the world.
>>
>> What do others think?
>>
>>             Harald
>>
>> --On 14 December 2001 11:17 -0800 Deborah Goldsmith
>> <goldsmit@apple.com> wrote:
>>
>>> The IANA registration for the charset "macintosh", which represents
>>> the Mac OS Roman character set, currently refers to RFC 1345.
>>>
>>> Since RFC 1345 was published, the definition of the MacRoman character
>>> set has changed. In particular, the code point 0xDB, which was
>>> formerly U+00A4 CURRENCY SIGN, was redefined to be U+20AC EURO SIGN.
>>>
>>> What would be the appropriate course of action to deal with this
>>> discrepancy? Registering a new "macintosh-euro" character set seems
>>> like overkill. Apple would prefer to just redefine the IANA-registered
>>> character set "macintosh" to conform to the new definition of
>>> MacRoman. Is that allowed? If so, what procedure should be followed?
>>>
>>> The definition of MacRoman can be found at:
>>>
>>> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
>>>
>>> Would it be appropriate to refer to that rather than to a (revised)
>>> RFC?
>>>
>>> Thanks,
>>>
>
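
The two failure scenarios Mark describes in the quoted message can be reproduced with a small Python sketch. It is illustrative only: the names OLD_0XDB, NEW_0XDB, serialize and parse are hypothetical, the two versions of the "macintosh" table are reduced to the single code point 0xDB that differs between them, and the XML/HTML layer is reduced to hexadecimal numeric character references.

```python
import re

# Assumption: one-entry stand-ins for the two "macintosh" mapping tables.
# Only code point 0xDB differs between them, so that is all that is modeled;
# other bytes above 0x7F never occur in this sketch.
OLD_0XDB = "\u00a4"  # CURRENCY SIGN (the definition in RFC 1345)
NEW_0XDB = "\u20ac"  # EURO SIGN (the definition Mac OS uses since 8.5)


def serialize(text, char_at_db):
    """Encode text as bytes, escaping what this charset version cannot hold."""
    out = bytearray()
    for ch in text:
        if ch == char_at_db:
            out.append(0xDB)
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            # Numeric character reference, as an XML or HTML serializer emits.
            out += "&#x{:X};".format(ord(ch)).encode("ascii")
    return bytes(out)


def parse(data, char_at_db):
    """Decode bytes with the receiver's table, then resolve the references."""
    text = "".join(char_at_db if b == 0xDB else chr(b) for b in data)
    return re.sub(r"&#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), text)


original = "\u20ac 5 \u00a4 5"  # a Euro amount and a generic-currency amount

old_to_new = parse(serialize(original, OLD_0XDB), NEW_0XDB)    # scenario 1
new_to_old = parse(serialize(original, NEW_0XDB), OLD_0XDB)    # scenario 2
same_version = parse(serialize(original, NEW_0XDB), NEW_0XDB)  # no mismatch
```

With an old sender and a new receiver, both amounts come back with a Euro sign; with a new sender and an old receiver, both come back with a currency sign. Only when both sides agree on the table does the text survive the round trip unchanged.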
Received on Monday, 14 January 2002 15:42:51 GMT