Re: some IANA registrations look like repertoires not charsets? from ned.freed@mrochek.com on 2002-08-29 (ietf-charsets@w3.org from July to September 2002)

From: <ned.freed@mrochek.com>
Date: Thu, 29 Aug 2002 11:21:07 -0700 (PDT)
To: Markus Scherer <markus.scherer@jtcsv.com>
Cc: charsets <ietf-charsets@iana.org>
Message-id: <01KLVA2CNOA00001B1@mauve.mrochek.com>

> ned.freed@mrochek.com wrote:

> > Assuming the intent really was to register repertoires seems like a
> > stretch to me.

> I believe that is possible. I am trying to figure out what the intent was. I
> am not saying that we must assume right away that these names are not
> charsets. The reference to ISO 10646 collections and IBM GCSGIDs however
> _suggests_ that these are just repertoires.

And I respectfully suggest that pondering the intent of such registrations is
not a useful way to spend our time.

> > > Without any specified encoding scheme, they would not qualify as
> > > charsets.

> > It isn't particularly relevant to the matter at hand, but the fact of the
> > matter is that a charset doesn't require an encoding scheme. The
> > requirement is instead that there be a mapping from octets to characters.
> > Whether this is implemented by means of a CCS/CES pair or something else
> > is up to the

> An encoding scheme is nothing but an algorithm for going from bytes to
> characters. "a charset doesn't require an encoding scheme" and "there be a
> mapping from octets to characters" are therefore contradictory.

I knew when I started it was a waste of time to point this out. I'll waste
everyone's time with one more response on this and then I promise I'll shut up.

Anyway, a character encoding scheme is a mapping from characters to octets, not
the other way around.

> Without an encoding scheme, there is no way to decode a byte stream.

> > registration. Charsets like iso-2022-jp certainly don't consist of a single
> > CCS/CES pair.

> We all know that a number of charsets combine one CES with multiple CCSes.
> Without that CES you would not have a charset, though. We could argue if
> there is one CES with sub-CESes or a CES with CEFs (a little like debating
> ISO/OSI vs. TCP stack), but at the minimum you need that one lowest-level
> CES to dissect the byte stream into meaningful units.

I repeat: A charset is defined as a mapping from octets to characters. This may
be done in a variety of ways, including but not limited to CCS/CES pairs. You
may like the CCS/CES concept, and it is undeniably useful and perhaps even the
preferred method for specifying charsets. But it isn't what a charset is
defined to be.
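
To make the iso-2022-jp point concrete, here is a minimal sketch. It assumes
Python's built-in "iso2022_jp" codec is an acceptable stand-in for the
registered charset, and the sample octets are made up for illustration. The
same two octets map to different characters depending on which escape sequence
preceded them, so the octets-to-characters mapping is stateful rather than a
single CCS/CES pair.

```python
# Illustration only: Python's "iso2022_jp" codec stands in for the charset,
# and the sample octets below are invented for this sketch.
ESC = b"\x1b"

# In the initial (ASCII) state, octets 0x46 0x7C are the characters "F" and "|".
ascii_state = b"\x46\x7c".decode("iso2022_jp")

# After ESC $ B (switch to JIS X 0208), the same two octets form one
# double-byte character. ESC ( B switches back to ASCII.
jis_state = (ESC + b"$B" + b"\x46\x7c" + ESC + b"(B").decode("iso2022_jp")

# A mixed stream: ASCII, then two JIS X 0208 characters, then ASCII again.
mixed = (b"abc " + ESC + b"$B" + b"\x46\x7c\x4b\x5c" + ESC + b"(B" + b" xyz")
text = mixed.decode("iso2022_jp")

print(repr(ascii_state), repr(jis_state), repr(text))
```

What matters for the definition is only that every valid octet sequence ends
up as a sequence of characters; how the registration gets there is its own
business.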

> It is of course possible that the IANA character-sets list is supposed to
> list not only things that are "charsets" but also CCSes and CEFs and
> repertoires.

No it isn't. It is supposed to list charsets. End of story. This has been
debated at enormous length in the past, it is how the current definition of a
charset was arrived at, and it is not going to be revisited now.

> If so, then please add clarifying text to the top of the list document, and
> appropriate classification to at least non-charset entries.

Not going to happen.

> > More likely it was assumed the encoding was implied by the registration.

> That would be good and valid, and I am trying to ascertain what encoding if 
> any was implied.

And I am saying that this is a waste of time.

> > In any case, past attempts to clean up the registry haven't been
> > successful.
> > And given that actual use of any of this junk is unlikely to exist, it
> > hasn't proved to be sufficiently problematic to force the issue.

> That is a sad statement. It puts a big disclaimer onto the IANA charset list
> that diminishes its value, in my opinion.

Which you're obviously entitled to. I don't agree, and even if I did it doesn't
change the situation any.

    Ned

Received on Thursday, 29 August 2002 14:33:27 UTC