
Re: some IANA registrations look like repertoires not charsets?

From: <ned.freed@mrochek.com>
Date: Thu, 29 Aug 2002 18:22:48 -0700 (PDT)
To: Kenneth Whistler <kenw@sybase.com>
Cc: ned.freed@mrochek.com, ietf-charsets@iana.org
Message-id: <01KLVOXV66C00001B1@mauve.mrochek.com>

> I was going to stay out of this, but I am a little troubled by
> the brushoff you appear to be giving to Marcus' concerns.

Please reread my messages; I never said anything of the sort. What I did say is
that I think pondering the intent of ancient junk in the registry is a waste of
time.

> I understand that "cleaning up the IANA charset registry" is
> a blackhole for effort, and has a marginal benefit to effort
> tradeoff, but when an IBM character mapping specialist brings
> to your attention identification of registrations of IBM
> related repertoires that can only be defective as registrations,
> it seems a relatively small task to mark them as such in the
> registry, so that other people don't trip over them.

Then by all means request that they be marked as defective. I have no problem
with that whatsoever. It may or may not happen, but that's a different
question. Dross removal has been attempted several times previously, you know.

> I understand the distinction you are making here. The charset registry
> defines labels that allow a protocol to identify a byte stream and
> then, in principle, using whatever mechanism is associated with that
> registration, to decode that byte stream into a sequence of characters.
> Period. It takes no position on how characters are to be mapped into
> octets, or on the generic issues of mapping tables, round-trip mapping,
> and so on.

Exactly. The minute you get into this stuff you open a huge can of worms.

> > I repeat: A charset is defined as mapping from octets to characters.

> The problem, of course, is that for those IBM repertoires, in particular,
> that Marcus pointed out, there can be *no* mapping from octets to
> characters -- it is inherently and completely undefined. These have
> to be defective registrations.

Fine. Then by all means mark them as such and be done with it. What I object to
is spending time trying to figure out what the intent of these was and trying
to fix the registration accordingly.

				Ned
Received on Thursday, 29 August 2002 21:39:46 GMT
