- From: Uma Umamaheswaran <umavs@ca.ibm.com>
- Date: Thu, 22 Jan 2004 13:10:25 -0500
- To: phoffman@imc.org, ietf-charsets@iana.org
I am surprised at the responses posted. I went back and rechecked all the postings from the previous go-around in the July-Sept 2002 time frame. I do not accept that the disposition in the previous go-around was conclusive. Statements that it is a bad idea, that it is unnecessary and possibly harmful, and so on have been made. There was also opposition to aliases for open-use charsets such as 8859-1; the opposition was not so strong against the limited-use charsets. No, there are no new additions in the list of proposed aliases. At that time I had also raised questions about what basis such claims have in the charset registration procedure or in the registry.

In the recent go-around:
----------------------

Paul Hoffman responded:
"It *did* get somewhere: it got heavily discussed. The gist of the discussion was that the request was both unnecessary and possibly harmful to the established base of software because that base would need to be updated with the new aliases."

My response -- we would not be making the request if it were not considered necessary by the requesting community of developers (on behalf of the users). If I thought the disposition had reached its end, I would not have stated that 'it got nowhere'.

"Possibly Harmful to established software .. " - I claim that the above statement is certainly not based on the current set of registration procedures or on the purpose of the registry itself. I have quoted the relevant sections from the procedure RFC and the registry below, along with how I interpret the statements therein.

Ned Freed had responded:
"The last time this request was posted there was considerably pushback saying that adding these additional aliases was a bad idea. I continue to believe this is the case, and I am therefore opposed to making this change."

My response: last time I also asked for the rationale. The opposition is based on the wrong premise that everyone has to implement all the charset ids and their aliases from the registry.
It goes against what I have indicated in the following paragraphs as to how I read the purpose of the registration and of the registry itself.

I was looking for 'any potential basis' for statements such as those Paul and Ned have made, in the current published procedure for charset registration and in the registry itself. The following are some relevant statements from the current procedure, RFC 2978, and from the published registry. I have attached how I read these statements in the context of the current proposal and of the comments that have been made against it.

================ From the Character Set Registry .. ===========
================= http://www.iana.org/assignments/character-sets ===============

The very first sentence reads:
"These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation."

My interpretation of the above: There is no requirement that any Internet or other software must support every character set registered in this registry. Nor is there a requirement that it must support every alias of any charset registered there. I cannot see how having aliases, or adding more aliases, will cause harm to any existing software -- certainly not based on what is stated in the registry itself.

====== From: RFC 2978 - IANA Charset Registration Procedures ===========
ftp://ftp.rfc-editor.org/in-notes/rfc2978.txt

(From Abstract section):
"Note: The charset registration procedure exists solely to associate a specific name or names with a given charset and to give an indication of whether or not a given charset can be used in MIME text objects. In particular, the general applicability and appropriateness of a given registered charset to a particular application is a protocol issue, not a registration issue, and is not dealt with by this registration procedure."
My interpretation: The second sentence in the above note is strong evidence for me that the comments that the request is harmful etc. are not based on this set of procedures. It is for some protocol to say what to use or what not to use, not for the registration itself. Such consideration is NOT a registration issue either.

From section 2.3. Naming Requirements
"One or more names MUST be assigned to all registered charsets. Multiple names for the same charset are permitted, but if multiple names are assigned a single primary name for the charset MUST be identified. All other names are considered to be aliases for the primary name and use of the primary name is preferred over use of any of the aliases."

My interpretation: Identifying more existing aliases is not against anything that is stated here. The preferred name will always be the Assigned Name. I don't read the above as saying that having one alias, or more than one, recorded in the registry is somehow harmful. On the other hand, I think recording aliases that may be encountered is more informative than not recording them.

From section 2.5. Usage and Implementation Requirements
"Use of a large number of charsets in a given protocol may hamper interoperability. However, the use of a large number of undocumented and/or unlabeled charsets hampers interoperability even more."

My interpretation: The second sentence is a strong argument for opening up the registry to more entries rather than being restrictive. The claims of 'harm', 'unnecessary' etc. are certainly not defensible based on the above paragraphs in the registration procedure document.

"A charset should therefore be registered ONLY if it adds significant functionality that is valuable to a large community, OR if it documents existing practice in a large community. Note that charsets registered for the second reason should be explicitly marked as being of limited or specialized use and should only be used in Internet messages with prior bilateral agreement."
My interpretation: The request was to document existing practice in products supporting a large community of users of IBM systems, and of non-IBM systems interfacing with these, using Open Standard protocols / specifications that call for use of charsets from the IANA charsets registry.

From section 2.6. Publication Requirements
"The registration of a charset does not imply endorsement, approval, or recommendation by the IANA, IESG, or IETF, or even certification that the specification is adequate."

My interpretation: The above statement seems to be saying that this registry is merely a record of what is out there. There is neither an expectation nor a requirement that any of the charsets or their aliases be implemented by every component attached to the Internet. It is a record of where, when you encounter one of these charset labels, you can get more information about the definition behind that label.

-------------------------------

Just in case some of the rationale for the request is not clear from the earlier set of discussions in July/Sept 2002:

IBM has a large set of character encodings registered in its corporate registry, with numbers assigned to them. Most of these are IBM defined; however, non-IBM sets are also given a number within this registration system. When literal strings are needed as charset labels, often IBM- is added to the number to get IBM-xxxxx as the literal string label. These labels are used to identify the charsets associated with data in databases, to identify the converters to be invoked, etc., and of course in XML as well. XML has recommended that charsets be registered with the IANA registry. In the set that is in the proposal, all of them are aliases for existing charsets with assigned names. Others, such as IBM-1047, have been dealt with separately. The proposal document has some references showing where these IBM-xxxxx labels ARE used.
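
The labelling scheme and the interoperability argument above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `REGISTRY`, `ibm_label`, `build_index` and `resolve` are hypothetical, the registry excerpt is a small hand-picked subset, and "IBM-819" is used only as an example of the IBM-xxxxx form, not as a confirmed entry in the proposal. Labels are compared ignoring case.

```python
# Hypothetical excerpt of a charset registry: primary name -> recorded aliases.
# "IBM819" and "CP819" are existing aliases of ISO_8859-1:1987 in the IANA
# registry; the hyphenated "IBM-819" below is an illustrative assumption.
REGISTRY = {
    "ISO_8859-1:1987": ["ISO-8859-1", "latin1", "IBM819", "CP819"],
}


def ibm_label(number):
    """Build a label of the IBM-xxxxx form from a corporate registry number."""
    return f"IBM-{number}"


def build_index(registry):
    """Map every primary name and alias, lower-cased, to its primary name."""
    index = {}
    for primary, aliases in registry.items():
        for name in [primary, *aliases]:
            index[name.lower()] = primary
    return index


def resolve(label, index):
    """Return the primary name for a label, ignoring case, or None if unknown."""
    return index.get(label.lower())


# Software built against the registry as it stands today.
old_index = build_index(REGISTRY)

# The same registry with one additional alias recorded.
extended = {primary: aliases + [ibm_label(819)]
            for primary, aliases in REGISTRY.items()}
new_index = build_index(extended)
```

In this sketch, software that keeps using `old_index` resolves every label it resolved before and simply treats "IBM-819" as unrecognized, while software that chooses to adopt `new_index` maps it to the same primary name as the other aliases.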

It is a matter for protocols such as XML and other Internet protocols to permit, reject, restrict, or be open about any of the labels that are registered in the IANA character set registry. I cannot see how having something in the registry is HARMFUL to any piece of software or Internet component out there. On the other hand, having the information in the registry is more useful, in case anyone's current software chooses to enhance itself to recognize some of that data. Otherwise it will remain as another unrecognized label.

Another factor driving this request is the specification that 'identity matching of the charset labels (ignoring case)' is required.

-----------------
Best regards,
Uma.

V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A2/979, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7;
+1 905 413 3474; Fax:905 413 4682; TieLine 969; email: umavs@ca.ibm.com
Received on Thursday, 22 January 2004 13:12:25 UTC