- From: <ned.freed@mrochek.com>
- Date: Thu, 29 Aug 2002 11:21:07 -0700 (PDT)
- To: Markus Scherer <markus.scherer@jtcsv.com>
- Cc: charsets <ietf-charsets@iana.org>
> ned.freed@mrochek.com wrote:
> > Assuming the intent really was to register repertoires seems like a
> > stretch to me.
> I believe that is possible. I am trying to figure out what the intent was. I
> am not saying that we must assume right away that these names are not
> charsets. The reference to ISO 10646 collections and IBM GCSGIDs however
> _suggests_ that these are just repertoires.

And I respectfully suggest that pondering the intent of such registrations is not a useful way to spend our time.

> > > Without any specified encoding scheme, they would not qualify as
> > > charsets.
> > It isn't particularly relevant to the matter at hand, but the fact of the
> > matter is that a charset doesn't require an encoding scheme. The
> > requirement is instead that there be a mapping from octets to characters.
> > Whether this is implemented by means of a CCS/CES pair or something else
> > is up to the
> An encoding scheme is nothing but an algorithm for going from bytes to
> characters. "a charset doesn't require an encoding scheme" and "there be a
> mapping from octets to characters" are therefore contradictory.

I knew when I started it was a waste of time to point this out. I'll waste everyone's time with one more response on this and then I promise I'll shut up. Anyway, a character encoding scheme is a mapping from characters to octets, not the other way around.

> Without an encoding scheme, there is no way to decode a byte stream.
> > registration. Charsets like iso-2022-jp certainly don't consist of a single
> > CCS/CES pair.
> We all know that a number of charsets combine one CES with multiple CCSes.
> Without that CES you would not have a charset, though. We could argue if there is one CES with sub-CESes or a CES with CEFs (a little like debating ISO/OSI vs. TCP stack), but at the minimum you need that one lowest-level CES to dissect the byte stream into meaningful units.

I repeat: A charset is defined as a mapping from octets to characters. This may be done in a variety of ways, including but not limited to CCS/CES pairs. You may like the CCS/CES concept, and it is undeniably useful and perhaps even the preferred method for specifying charsets. But it isn't what a charset is defined to be.

> It is of course possible that the IANA character-sets list is supposed to
> list not only things that are "charsets" but also CCSes and CEFs and
> repertoires.

No, it isn't. It is supposed to list charsets. End of story. This has been debated at enormous length in the past, it is how the current definition of a charset was arrived at, and it is not going to be revisited now.

> If so, then please add clarifying text to the top of the list document, and
> appropriate classification to at least non-charset entries.

Not going to happen.

> > More likely it was assumed the encoding was implied by the registration.
> That would be good and valid, and I am trying to ascertain what encoding if
> any was implied.

And I am saying that this is a waste of time.

> > In any case, past attempts to clean up the registry haven't been
> > successful.
> > And given that actual use of any of this junk is unlikely to exist, it
> > hasn't proved to be sufficiently problematic to force the issue.
> That is a sad statement. It puts a big disclaimer onto the IANA charset list
> that diminishes its value, in my opinion.

Which you're obviously entitled to. I don't agree, and even if I did it doesn't change the situation any.

Ned
Received on Thursday, 29 August 2002 14:33:27 UTC