[ietf-charsets] Recent charset additions and issues

Several recent additions to the charset registry illustrate a number
of issues.  The specific entries I refer to are:

Name:  Amiga-1251
MIBenum:  2104
Source:  See (http://www.amiga.ultranet.ru/Amiga-1251.html)
Alias:  Ami1251
Alias:  Amiga1251
Alias:  Ami-1251
(Aliases are provided for historical reasons and should not be used)

Name:  KOI7-switched
MIBenum:  2105
Source:  See <http://www.iana.org/assignments/charset-reg/KOI7-switched>
Aliases:  None

Name:  OSD_EBCDIC_DF04_15
MIBenum:  115
Source:  Fujitsu-Siemens standard mainframe EBCDIC encoding
         Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-15>
Alias:   None

Name:  OSD_EBCDIC_DF03_IRV
MIBenum:  116
Source:  Fujitsu-Siemens standard mainframe EBCDIC encoding
         Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF03-IRV>
Alias:  None

Name:  OSD_EBCDIC_DF04_1
MIBenum:  117
Source:  Fujitsu-Siemens standard mainframe EBCDIC encoding
         Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-1>
Alias:  None

Also relevant is the following excerpt from the registry:

The value space for MIBenum values has been divided into three
regions. The first region (3-999) consists of coded character sets
that have been standardized by some standard setting organization.
This region is intended for standards that do not have subset
implementations. The second region (1000-1999) is for the Unicode and
ISO/IEC 10646 coded character sets together with a specification of a
(set of) sub-repertoires that may occur.  The third region (>1999) is
intended for vendor specific coded character sets.

	Assigned MIB enum Numbers
	-------------------------
	0-2		Reserved
	3-999		Set By Standards Organizations
	1000-1999	Unicode / 10646
	2000-2999	Vendor

One issue is that the MIBenum values assigned to these charsets do not
seem to be consistent with the description above or with the reference
information at the indicated URIs.  It appears that the last three
(MIBenum 115 to 117) are in fact vendor charsets and therefore should have
MIBenum values in the 2000 to 2999 range.  Conversely, it is not clear why
KOI7-switched has been assigned a Vendor MIBenum value (2105), nor which
vendor might be responsible.
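
To make the mismatch concrete, here is a minimal Python sketch that maps
each MIBenum quoted above to the region given in the registry's table.
The names MIBENUM_REGIONS, mibenum_region and ENTRIES are made up for
illustration; nothing like this is part of the registry.  Note that the
registry prose says ">1999" for the third region while the table stops at
2999; the sketch follows the table and returns None for anything outside
it.

```python
# Regions copied from the "Assigned MIB enum Numbers" table (bounds inclusive).
MIBENUM_REGIONS = [
    (0, 2, "Reserved"),
    (3, 999, "Set By Standards Organizations"),
    (1000, 1999, "Unicode / 10646"),
    (2000, 2999, "Vendor"),
]


def mibenum_region(value):
    """Return the table's region label for a MIBenum, or None if not covered."""
    for low, high, label in MIBENUM_REGIONS:
        if low <= value <= high:
            return label
    return None


# The five registrations quoted at the top of this message.
ENTRIES = {
    "Amiga-1251": 2104,
    "KOI7-switched": 2105,
    "OSD_EBCDIC_DF04_15": 115,
    "OSD_EBCDIC_DF03_IRV": 116,
    "OSD_EBCDIC_DF04_1": 117,
}

for name, mibenum in ENTRIES.items():
    print(f"{name:22} {mibenum:5}  {mibenum_region(mibenum)}")
```

The sketch only reports which region a number falls in.  Whether a given
charset actually belongs to a vendor or to a standards organization still
has to be judged from the source documents.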

Another issue is that the three OSD_EBCDIC_DF* charsets give no indication
in the source documents as to whether or not the charsets are suitable for
use with MIME text.  Such an indication is supposed to be part of the
registration (RFC 2978 section 5).  A related issue is the fact that the
registry itself provides no such indication for any charsets, which
is at best highly inconvenient for implementors.

None of the charsets above have been provided with an alias beginning with
"cs" for use with the printer MIB as discussed in section 2.3 of RFC 2978.
If that were consistently done, no charset in the registry would carry a
confusing "Alias: None" line.

How can we minimize these issues in the future?  First, I believe that
using RFC 2978 (or a successor) as a checklist during the review process
would help.  Second, adding a brief history of the charset's origin
(originator and affiliation) to the registration template would help in
determining whether a particular charset is a Vendor charset or Set By
[a] Standards Organization[s].  Finally, including a "MIME-text" field
with a yes/no value in the registry would not only be a boon to
implementors of applications which use charsets in a MIME context, but
would also prompt IANA to obtain a statement of MIME text compatibility
if it is lacking in the registration application.
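
As a rough illustration of what such a checklist might look like in
practice, the following Python sketch checks a registration for the three
items discussed in this message.  Everything here is hypothetical: the
function review_registration, the dict layout, and the "mime_text" and
"origin" keys (which stand in for the proposed fields) are assumptions of
mine, not anything defined by RFC 2978 or by IANA.

```python
def review_registration(reg):
    """Return a list of checklist complaints for a registration dict.

    Hypothetical layout: "aliases" is a list of strings, "mime_text" is
    "yes" or "no", "origin" is a free-text originator/affiliation note.
    """
    problems = []
    aliases = reg.get("aliases", [])
    # RFC 2978 section 2.3 discusses an alias beginning with "cs".
    if not any(alias.startswith("cs") for alias in aliases):
        problems.append('no alias beginning with "cs"')
    # Stand-in for the proposed yes/no "MIME-text" registry field.
    if reg.get("mime_text") not in ("yes", "no"):
        problems.append("no statement of suitability for MIME text")
    # Stand-in for the proposed origin history in the template.
    if not reg.get("origin"):
        problems.append("no originator/affiliation history")
    return problems


# One of the entries quoted above, as the registry currently shows it.
example = {
    "name": "OSD_EBCDIC_DF04_15",
    "mibenum": 115,
    "aliases": [],  # the registry lists "Alias: None"
}

for problem in review_registration(example):
    print(f'{example["name"]}: {problem}')
```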

Received on Thursday, 29 January 2004 18:17:28 UTC