Re: Proposal for additional Aliases to IANA registry of character sets



For better or worse, the IANA registry is used as a central repository of
names for character set mappings. In particular, the XML Standard
(http://www.w3.org/TR/REC-xml) is driving the registration of many encodings:

4.3.3 Character Encoding in Entities
...

It is recommended that character encodings registered (as charsets) with
the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those
just listed, be referred to using their registered names; other encodings
should use names starting with an "x-" prefix. XML processors should match
character encoding names in a case-insensitive way and should either
interpret an IANA-registered name as the encoding registered at IANA for
that name or treat it as unknown (processors are, of course, not required
to support all IANA-registered encodings).
...

The IANA registry thus serves the very important function of
cross-correlating the different terms for charsets used in a great many
different functions. On the principle of lenient acceptance, additional
aliases should be allowed. Of course, the recommended names should be
strongly preferred in whatever is output.
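
To make "lenient on input, strict on output" concrete, here is a minimal
Python sketch. The table holds only a handful of illustrative entries, and
the names ALIASES and preferred_name are hypothetical, not taken from the
registry or from the XML specification. The lookup is case-insensitive, as
the XML text above recommends, and an unrecognized label is reported as
unknown rather than guessed at.

```python
# Illustrative alias table (an assumption for this sketch, not the full
# registry): keys are lowercased labels, values are the preferred names.
ALIASES = {
    "iso-8859-1": "ISO-8859-1",
    "iso_8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
    "l1": "ISO-8859-1",
    "ibm819": "ISO-8859-1",
    "cp819": "ISO-8859-1",
    "utf-8": "UTF-8",
}


def preferred_name(label):
    """Map any accepted label to its preferred name, or None if unknown.

    Matching ignores case and surrounding whitespace; the caller should
    emit only the returned preferred name in whatever it outputs.
    """
    return ALIASES.get(label.strip().lower())
```

A receiver would call preferred_name() on every incoming label and write
only the returned name, so aliases are accepted but never propagated.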

Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799



From:     ned.freed@mrochek.com
Date:     2002.08.06 09:52
To:       Uma Umamaheswaran/Toronto/IBM@IBMCA
cc:       Chris.Newman@Sun.COM, ietf-charsets@iana.org
Subject:  Re: Proposal for additional Aliases to IANA registry of character sets



> Chris:

> As far as I know, the IANA registered names are also used for INTRANET
> using IETF protocols.

That's perhaps true but beside the point. We're dealing with a parameter
namespace for _Internet_ protocols here.

> The IBM corporate registration (started long before the IETF age) has been
> including in its numbering scheme coded character sets from many different
> sources -- including several of the ISO 7-bit, 8-bit standards, the many
> ISO-2022 scheme based national standards, several non-IBM-vendor defined
> etc. etc. These numbers become aliases in the form IBM-nnnnn etc.

> If we state that the "the primary names assigned in the IANA registry is
> the name that is 'Strongly Encouraged to be used" for OPEN interchange
> (when the charset is not constrained in any manner)" then use of strings of
> ISO-8859-1 etc. can be promoted widely.  In these cases, the ALIASEs are
> meant to be used for "limited use contexts".   With the printer MIB
> numbers, even in the IETF open context there will be multiple 'names'.

The problem is that the present registry doesn't support making such
distinctions. The current intent is that the primary name should be used but
all aliases should be recognized by all implementations.

As such, adding new aliases to an existing and widely used charset means
updating a very large installed base. Products that support on-the-fly updating
of charset tables are the exception, not the rule.

This makes such changes potentially VERY disruptive.

Nor would making the distinction you propose go far enough IMO. To be truly
effective there would need to be a way of listing a set of aliases for a given
charset that cannot conflict with other names and aliases yet MUST NOT be used
on the Internet.

> Unfortunately any ALIAS has a tendency to leak and they have to be
> equivalenced by implementations expecting to respect the aliases.  Short of
> BANNING aliases this cannot be avoided.

By acting to add such aliases to the general list we are basically saying
that implementations done in good faith in accordance with the standards
are at fault, whereas implementations that violated the standards are not.

I strongly object. This is no way to run a railroad.

Now, I wouldn't be happy but I'd perhaps reach a different conclusion given
evidence of widespread use of an unregistered alias for a charset on the
Internet. But you yourself have stated that the issue here is use in limited
contexts.

> Also as Markus has stated singling out the 8859-1 related IBMxxxx is not
> justified, neither deleting the existing aliases for many of the ISO
> standards, without impacting many existing implementations.

You have it exactly backwards. Adding additional aliases to commonly used
charsets is an act that singles out compliant implementations. Refusing
to add them singles out noncompliant implementations only. And absent
any indication such implementations are commonplace, I have absolutely
no problem with that.

                                                 Ned

Received on Tuesday, 6 August 2002 16:30:20 UTC