Re: Proposal for Addition of New Alias Names to Existing IANA Registered Character Sets - REPOSTING REQUEST

I am surprised at the responses posted.  I went back and rechecked all the
postings from the previous round of discussion, in the July-September 2002
time frame.  I do not accept that the disposition of that round was
conclusive.

Statements that it is a bad idea, that it is unnecessary, and that it is
possibly harmful have been made.  There was also opposition to aliases for
open-use charsets such as 8859-1, but it was not as strong for the
limited-use charsets.

No, there are no new additions to the list of proposed aliases.

At that time I had also asked what basis the charset registration
procedure or the registry provides for such claims.

In the recent go around:
----------------------
Paul Hoffman responded:
"It *did* get somewhere: it got heavily discussed. The gist of the
discussion was that the request was both unnecessary and possibly harmful
to the established base of software because that base would need to be
updated with the new aliases."

My response: we would not be making the request if the requesting
community of developers (on behalf of the users) did not consider it
necessary.  If I thought the discussion had reached its end, I would not
have stated that 'it got nowhere'.  As for "possibly harmful to the
established base of software": I claim that this statement is certainly not
based on the current registration procedures or on the purpose of the
registry itself.  Below I quote the relevant sections of the procedure RFC
and of the registry, with how I interpret the statements therein.

Ned Freed had responded:
"The last time this request was posted there was considerably pushback
saying that adding these additional aliases was a bad idea. I continue to
believe this is the case, and I am therefore opposed to making this
change."

My response: last time, too, I asked for the rationale.  The objection is
based on the wrong premise that everyone has to implement all the charset
names and their aliases from the registry.  That premise goes against how I
read the purpose of the registration procedure and of the registry itself,
as described in the following paragraphs.

I looked in the current published procedure for charset registration, and
in the registry itself, for any potential basis for statements such as
those Paul and Ned have made.  The following are some relevant statements
from the current procedure, RFC 2978, and from the published registry, each
followed by how I read it in the context of the current proposal and of the
comments made against it.

================ From the Character Set Registry ================
============ http://www.iana.org/assignments/character-sets ============

The very first sentence reads:
"These are the official names for character sets that may be used in the
Internet and may be referred to in Internet documentation."

My interpretation of the above: there is no requirement on any Internet or
other software to support every character set registered in this registry,
nor to support every alias of any charset registered herein.  I cannot see
how having aliases, or adding more aliases, will cause harm to any existing
software, certainly not based on what is stated in the registry itself.

====== From: RFC 2978, IANA Charset Registration Procedures ======
=========== ftp://ftp.rfc-editor.org/in-notes/rfc2978.txt ===========

(From Abstract section):
"Note: The charset registration procedure exists solely to associate a
specific name or names with a given charset and to give an indication of
whether or not a given charset can be used in MIME text objects.  In
particular, the general applicability and appropriateness of a given
registered charset to a particular application is a protocol issue, not a
registration issue, and is not dealt with by this registration procedure."

My interpretation: the second sentence in the above note is strong evidence
to me that the comments calling the request harmful are not based on this
set of procedures.  It is a protocol that may say what to use or not to
use, not the registration itself, and such a consideration is explicitly
NOT a registration issue.

From section 2.3.  Naming Requirements

"One or more names MUST be assigned to all registered charsets. Multiple
names for the same charset are permitted, but if multiple names are
assigned a single primary name for the charset MUST be identified. All
other names are considered to be aliases for the primary name and use of
the primary name is preferred over use of any of the aliases."

My interpretation: identifying more existing aliases is not against
anything stated here.  The preferred name will always be the primary
(assigned) name.  I do not read the above as saying that recording one
alias, or more than one, in the registry is somehow harmful.  On the other
hand, I think recording aliases that may be encountered is more informative
than not recording them.

From section 2.5.  Usage and Implementation Requirements

"Use of a large number of charsets in a given protocol may hamper
interoperability.  However, the use of a large number of undocumented
and/or unlabeled charsets hampers interoperability even more."

My interpretation: the second sentence is a stronger argument for opening
up the registry to more entries than for being restrictive.  The claims of
'harm', 'unnecessary', etc. are certainly not defensible based on the above
paragraphs of the registration procedure document.

"A charset should therefore be registered ONLY if it adds significant
functionality that is valuable to a large community, OR if it documents
existing practice in a large community.  Note that charsets registered for
the second reason should be explicitly marked as being of limited or
specialized use and should only be used in Internet messages with prior
bilateral agreement."

My interpretation: the request was to document existing practice in
products supporting a large community of users of IBM systems, and of
non-IBM systems interfacing with them, that use open standard protocols and
specifications calling for charsets from the IANA charset registry.

From section 2.6.  Publication Requirements

"The registration of a charset does not imply endorsement, approval, or
recommendation by the IANA, IESG, or IETF, or even certification that the
specification is adequate. "

My interpretation: the above statement seems to say that this registry is
merely a record of what is out there.  There is neither an expectation nor
a requirement that every component attached to the Internet implement any
of the charsets or their aliases.  It is a record of where, when you
encounter one of these charset labels, you can get more information about
the definition behind that label.

-------------------------------

In case some of the rationale for the request was not clear from the
earlier discussions in July-September 2002, here it is.

IBM has a large set of character encodings registered in its corporate
registry, each assigned a number.  Most of these are IBM-defined, but
non-IBM sets are also given a number within this registration system.  When
literal strings are needed as charset labels, IBM- is often prefixed to the
number to form the literal label IBM-xxxxx.  These labels are used to
identify the charsets associated with data in databases, to identify the
converters to be invoked, and so on, and of course in XML as well.  XML
recommends that encodings registered with IANA be referred to by their
registered names.  All the labels in the proposal are aliases for existing
charsets that already have assigned names.  Others, such as IBM-1047, have
been dealt with separately.  The proposal document has some references
showing where these IBM-xxxxx labels ARE used.
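To make the labelling convention concrete, here is a minimal Python sketch.
It assumes only the simple rule described above (prefix "IBM-" to the
registry number) and models a registry entry as one assigned name plus a
list of aliases.  The entry is an abbreviated, illustrative excerpt, and
"IBM-819" is a hypothetical IBM-style alias, not taken from the proposal's
actual list; the helper names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def ibm_label(number: int) -> str:
    """Form an IBM-style literal label by prefixing 'IBM-' to a registry number."""
    return "IBM-" + str(number)


@dataclass
class CharsetEntry:
    """One registry entry: a single assigned (primary) name plus optional aliases."""
    name: str
    aliases: List[str] = field(default_factory=list)

    def add_alias(self, label: str) -> None:
        # Adding an alias only appends a name; the assigned name and the
        # existing aliases are left unchanged. Duplicates are ignored.
        if label not in self.aliases:
            self.aliases.append(label)


# Illustrative, abbreviated entry; not the full registry record.
latin1 = CharsetEntry("ISO_8859-1:1987", ["ISO-8859-1", "latin1", "IBM819"])
latin1.add_alias(ibm_label(819))  # hypothetical new alias "IBM-819"
print(latin1.aliases)  # ['ISO-8859-1', 'latin1', 'IBM819', 'IBM-819']
```

Adding the alias leaves the assigned name and the earlier aliases
untouched, so software that never looks for the new name sees no change.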

It is a matter for protocols such as XML and other Internet protocols to
permit, reject, restrict, or be open about any of the labels registered in
the IANA character set registry.  I cannot see how having something in the
registry is HARMFUL to any piece of software or Internet component out
there.  On the other hand, having the information in the registry is useful
in case anyone chooses to enhance their current software to recognize some
of those labels.  Otherwise each such label will remain just another
unrecognized label.

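Another factor driving this request is the specification that charset
labels be matched by identity (ignoring case).  Under such matching, a
label such as IBM-xxxxx is recognized only if that exact string, apart from
case, is registered; a variant spelling of an assigned name or alias does
not match.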
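A minimal sketch of identity matching that ignores case, assuming labels
are compared after Python's str.casefold() and nothing else (no removal of
hyphens, spaces, or leading zeros).  The label table is an illustrative
excerpt, not the registry itself, and resolve() is a hypothetical helper.

```python
from typing import Dict, Optional

# Illustrative excerpt: each registered name/alias, keyed case-insensitively,
# maps to the assigned name. Abbreviated; not a copy of the registry.
_REGISTERED = ["ISO_8859-1:1987", "ISO-8859-1", "latin1", "IBM819"]
LABELS: Dict[str, str] = {label.casefold(): "ISO_8859-1:1987" for label in _REGISTERED}


def resolve(label: str) -> Optional[str]:
    """Identity match ignoring case only; no other normalization is applied."""
    return LABELS.get(label.casefold())


print(resolve("ibm819"))   # ISO_8859-1:1987
print(resolve("IBM-819"))  # None: the hyphenated form is a different label
```

Because only case is ignored, the IBM-style spelling resolves only if it is
itself recorded as an alias.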

-----------------

Best regards,  Uma.
V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A2/979, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7; +1 905 413 3474;
Fax:905 413 4682; TieLine 969; email: umavs@ca.ibm.com

Received on Thursday, 22 January 2004 13:12:25 UTC