- From: Mark Davis <mark.davis@us.ibm.com>
- Date: Tue, 06 Aug 2002 13:29:25 -0700
- To: ned.freed@mrochek.com
- Cc: Chris.Newman@Sun.COM, ietf-charsets@iana.org, Uma Umamaheswaran <umavs@ca.ibm.com>
- Message-id: <OF77A2DE19.9854EF1C-ON88256C0D.0068B6F4@us.ibm.com>
For better or worse, the IANA registry is used as a central repository of names for character set mappings. In particular, the XML Standard (http://www.w3.org/TR/REC-xml) is driving the registration of many encodings:

   4.3.3 Character Encoding in Entities
   ...
   It is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings should use names starting with an "x-" prefix. XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are, of course, not required to support all IANA-registered encodings).
   ...

The IANA registry is thus serving the very important function of cross-correlating the different terms for charsets used in a great many different functions.

On the principle of lenient acceptance, additional aliases should be allowed. Of course, the recommended names should be strongly preferred in whatever is output.

Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148 fax: (408) 256-0799


ned.freed@mrochek.com
2002.08.06 09:52
To: Uma Umamaheswaran/Toronto/IBM@IBMCA
cc: Chris.Newman@Sun.COM, ietf-charsets@iana.org
Subject: Re: Proposal for additional Aliases to IANA registry of character sets

> Chris:
> As far as I know, the IANA registered names are also used for INTRANET
> using IETF protocols.

That's perhaps true but beside the point. We're dealing with a parameter namespace for _Internet_ protocols here.

> The IBM corporate registration (started long before the IETF age) has been
> including in its numbering scheme coded character sets from many different
> sources -- including several of the ISO 7-bit, 8-bit standards, the many
> ISO-2022 scheme based national standards, several non-IBM-vendor defined
> etc.etc. These numbers become aliases in the form IBM-nnnnn etc.
> If we state that "the primary name assigned in the IANA registry is
> the name that is 'Strongly Encouraged to be used' for OPEN interchange
> (when the charset is not constrained in any manner)", then use of strings
> such as ISO-8859-1 etc. can be promoted widely. In these cases, the ALIASes
> are meant to be used for "limited use contexts". With the printer MIB
> numbers, even in the IETF open context there will be multiple 'names'.

The problem is that the present registry doesn't support making such distinctions. The current intent is that the primary name should be used but all aliases should be recognized by all implementations. As such, adding new aliases to an existing and widely used charset means updating a very large installed base. Products that support on-the-fly updating of charset tables are the exception, not the rule. This makes such changes potentially VERY disruptive.

Nor would making the distinction you propose go far enough IMO. To be truly effective there would need to be a way of listing a set of aliases for a given charset that cannot conflict with other names and aliases yet MUST NOT be used on the Internet.

> Unfortunately any ALIAS has a tendency to leak and they have to be
> equivalenced by implementations expecting to respect the aliases. Short of
> BANNING aliases this cannot be avoided.

By acting to add such aliases to the general list we are basically saying that implementations done in good faith in accordance with the standards are at fault, whereas implementations that violated the standards are not. I strongly object. This is no way to run a railroad.

Now, I wouldn't be happy, but I'd perhaps reach a different conclusion given evidence of widespread use of an unregistered alias for a charset on the Internet. But you yourself have stated that the issue here is use in limited contexts.
> Also as Markus has stated singling out the 8859-1 related IBMxxxx is not
> justified, neither deleting the existing aliases for many of the ISO
> standards, without impacting many existing implementations.

You have it exactly backwards. Adding additional aliases to commonly used charsets is an act that singles out compliant implementations. Refusing to add them singles out incompliant implementations only. And absent any indication such implementations are commonplace, I have absolutely no problem with that.

Ned
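The thread turns on two behaviors. The first is the XML rule Mark quotes: match encoding names case-insensitively and treat anything unregistered as unknown. The second is what Ned calls the current intent: emit the primary name, but recognize every alias. Below is a minimal Python sketch of both. It also includes the kind of list Ned says limited-use names would need, where a name is reserved so it cannot collide with any other name but is never emitted on the Internet. Everything here is a hypothetical illustration, not IANA data or any real library's API. That covers the class and method names (CharsetRegistry, resolve, may_send), the abbreviated entries, the IBM-00819 alias, and the choice to still accept restricted names on input.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CharsetEntry:
    """One hypothetical registry entry."""
    preferred: str                                     # the name to emit
    aliases: list[str] = field(default_factory=list)   # accepted on input
    # Assumed "limited use" names: reserved so nothing else can claim them,
    # accepted on input, but never to be sent on the Internet.
    restricted: list[str] = field(default_factory=list)


class CharsetRegistry:
    """Hypothetical in-memory name table; not the IANA data or a real API."""

    def __init__(self) -> None:
        self._by_key: dict[str, CharsetEntry] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Charset names are ASCII, so lower() gives case-insensitive matching.
        return name.lower()

    def register(self, entry: CharsetEntry) -> None:
        names = [entry.preferred, *entry.aliases, *entry.restricted]
        # Check every name first so a conflict leaves the table unchanged.
        for name in names:
            owner = self._by_key.get(self._key(name))
            if owner is not None and owner is not entry:
                raise ValueError(f"{name!r} already names {owner.preferred}")
        for name in names:
            self._by_key[self._key(name)] = entry

    def resolve(self, name: str) -> Optional[str]:
        """Map any known name or alias to the preferred name; None if unknown."""
        entry = self._by_key.get(self._key(name))
        return entry.preferred if entry is not None else None

    def may_send(self, name: str) -> bool:
        """True only for known names that are not restricted from Internet use."""
        entry = self._by_key.get(self._key(name))
        if entry is None:
            return False
        restricted = {self._key(n) for n in entry.restricted}
        return self._key(name) not in restricted


registry = CharsetRegistry()
# Abbreviated, illustrative entries; IBM-00819 is a hypothetical restricted
# alias in the IBM-nnnnn style mentioned in the thread.
registry.register(CharsetEntry("ISO-8859-1", ["ISO_8859-1:1987", "latin1", "l1"],
                               restricted=["IBM-00819"]))
registry.register(CharsetEntry("US-ASCII", ["ANSI_X3.4-1968", "ASCII", "us"]))

print(registry.resolve("LATIN1"))      # ISO-8859-1
print(registry.resolve("ibm-00819"))   # ISO-8859-1 (accepted on input)
print(registry.resolve("x-unknown"))   # None (treated as unknown)
print(registry.may_send("ibm-00819"))  # False

try:
    registry.register(CharsetEntry("x-other", ["Latin1"]))
except ValueError as err:
    print(err)                         # 'Latin1' already names ISO-8859-1
```

Because register() checks every name before committing, a rejected registration leaves the table untouched. Accepting restricted names in resolve() follows Mark's lenient-acceptance argument. A stricter design could reject them on input instead.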
Received on Tuesday, 6 August 2002 16:30:20 UTC