- From: Mark Davis <mark.davis@us.ibm.com>
- Date: Thu, 29 Aug 2002 14:56:14 -0700
- To: ned.freed@mrochek.com
- Cc: charsets <ietf-charsets@iana.org>, Markus Scherer <markus.scherer@jtcsv.com>
> "there be a mapping from octets to characters".

I fail to understand this response. I don't mean this rhetorically; there is obviously history behind this that I am unaware of.

Logically a CES is a mapping between octets and characters (both directions); when you are generating data you are logically mapping one way; when you are interpreting you are mapping the other. Whenever you have a mapping from some set of sequences of octets to characters, then you can also derive a mapping from some subset of characters to a set of sequences of octets, and vice versa. The only odd cases are when the original mapping takes two sequences of octets to the same character (or takes two characters to the same sequence of octets); when you derive the reverse mapping you have to decide which is the preferred mapping and which is just a fallback.

Mark
___
mark.davis@us.ibm.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

ned.freed@mrochek.com
To: Markus Scherer <markus.scherer@jtcsv.com>
cc: charsets <ietf-charsets@iana.org>
2002.08.29 11:21
Subject: Re: some IANA registrations look like repertoires not charsets?

> ned.freed@mrochek.com wrote:
> > Assuming the intent really was to register repertoires seems like a
> > stretch to me.
> I believe that is possible. I am trying to figure out what the intent was. I
> am not saying that we must assume right away that these names are not
> charsets. The reference to ISO 10646 collections and IBM GCSGIDs however
> _suggests_ that these are just repertoires.

And I respectfully suggest that pondering the intent of such registrations is not a useful way to spend our time.

> > > Without any specified encoding scheme, they would not qualify as
> > > charsets.
> > It isn't particularly relevant to the matter at hand, but the fact of the
> > matter is that a charset doesn't require an encoding scheme. The
> > requirement is instead that there be a mapping from octets to characters.
> > Whether this is implemented by means of a CCS/CES pair or something else
> > is up to the
> An encoding scheme is nothing but an algorithm for going from bytes to
> characters. "a charset doesn't require an encoding scheme" and "there be a
> mapping from octets to characters" are therefore contradictory.

I knew when I started it was a waste of time to point this out. I'll waste everyone's time with one more response on this and then I promise I'll shut up. Anyway, a character encoding scheme is a mapping from characters to octets, not the other way around.

> Without an encoding scheme, there is no way to decode a byte stream.
> > registration. Charsets like iso-2022-jp certainly don't consist of a single
> > CCS/CES pair.
> We all know that a number of charsets combine one CES with multiple CCSes.
> Without that CES you would not have a charset, though. We could argue if
> there is one CES with sub-CESes or a CES with CEFs (a little like debating
> ISO/OSI vs. TCP stack), but at the minimum you need that one lowest-level
> CES to dissect the byte stream into meaningful units.

I repeat: A charset is defined as a mapping from octets to characters. This may be done in a variety of ways, including but not limited to CCS/CES pairs. You may like the CCS/CES concept, and it is undeniably useful and perhaps even the preferred method for specifying charsets. But it isn't what a charset is defined to be.

> It is of course possible that the IANA character-sets list is supposed to
> list not only things that are "charsets" but also CCSes and CEFs and
> repertoires.

No, it isn't. It is supposed to list charsets. End of story. This has been debated at enormous length in the past, it is how the current definition of a charset was arrived at, and it is not going to be revisited now.

> If so, then please add clarifying text to the top of the list document, and
> appropriate classification to at least non-charset entries.

Not going to happen.
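Both sides of the iso-2022-jp exchange can be seen in a small sketch: the stream switches between several coded character sets via escape sequences, yet something still has to cut the bytes into units before any character lookup happens. The four escape sequences below are the designations listed in RFC 1468; the function name `segment`, the run format, and the error handling are illustrative assumptions, and the sketch does no validation beyond recognising those four sequences. It deliberately stops before mapping units to characters, which would need each CCS's own table.

```python
ESC = 0x1B

# iso-2022-jp designation sequences (RFC 1468) -> (CCS name, bytes per character).
DESIGNATIONS = {
    b"\x1b(B": ("ASCII", 1),
    b"\x1b(J": ("JIS X 0201-Roman", 1),
    b"\x1b$@": ("JIS C 6226-1978", 2),
    b"\x1b$B": ("JIS X 0208-1983", 2),
}


def segment(data):
    """Split an iso-2022-jp byte stream into (ccs_name, [code units]) runs.

    Hypothetical helper: it only tracks which CCS is active and cuts the
    stream into 1- or 2-byte units; it does not map units to characters.
    """
    ccs, width = "ASCII", 1  # the stream starts out in ASCII
    runs, units, i = [], [], 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unknown escape sequence at offset {i}")
            if units:  # close the run for the previously active CCS
                runs.append((ccs, units))
                units = []
            ccs, width = DESIGNATIONS[seq]
            i += 3
            continue
        unit = data[i:i + width]
        if len(unit) < width:
            raise ValueError(f"truncated {ccs} character at offset {i}")
        units.append(unit)
        i += width
    if units:
        runs.append((ccs, units))
    return runs


for ccs, units in segment(b"Hi \x1b$B\x24\x22\x1b(B!"):
    print(ccs, units)
```

Whether this loop is called "the CES" or something else is exactly the terminology question under debate; in practice, Python's built-in `iso2022_jp` codec performs both the segmentation and the per-CCS table lookup.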
> > More likely it was assumed the encoding was implied by the registration.
> That would be good and valid, and I am trying to ascertain what encoding if
> any was implied.

And I am saying that this is a waste of time.

> > In any case, past attempts to clean up the registry haven't been
> > successful.
> > And given that actual use of any of this junk is unlikely to exist, it
> > hasn't proved to be sufficiently problematic to force the issue.
> That is a sad statement. It puts a big disclaimer onto the IANA charset list
> that diminishes its value, in my opinion.

Which you're obviously entitled to. I don't agree, and even if I did it doesn't change the situation any.

Ned
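Mark's point about deriving the reverse direction from an octets-to-characters mapping can be sketched under stated assumptions. The decode table below is a hypothetical toy, not any registered charset: two octet sequences decode to the same character, so inverting it forces a choice of one preferred sequence for generating data, with the other kept only as a decode-side fallback. The function names and the "shortest sequence wins" default are illustrative choices.

```python
# Hypothetical toy charset: octet sequences -> characters. Both b"\x41"
# and the two-byte form b"\x82\x60" decode to "A", which is the "odd
# case" that forces a choice when the mapping is reversed.
TOY_DECODE_TABLE = {
    b"\x41": "A",
    b"\x42": "B",
    b"\x82\x60": "A",
}


def derive_encoder(table, prefer=None):
    """Invert an octets -> character table into character -> octets.

    Returns (preferred, fallbacks). `preferred` maps each character to the
    one octet sequence used when generating data; `fallbacks` lists the
    other sequences that still decode to that character. `prefer` may force
    the choice per character; otherwise the shortest (then smallest)
    sequence wins.
    """
    prefer = prefer or {}
    by_char = {}
    for octets, char in table.items():
        by_char.setdefault(char, []).append(octets)

    preferred, fallbacks = {}, {}
    for char, seqs in by_char.items():
        seqs.sort(key=lambda s: (len(s), s))
        chosen = prefer.get(char, seqs[0])
        preferred[char] = chosen
        others = [s for s in seqs if s != chosen]
        if others:
            fallbacks[char] = others
    return preferred, fallbacks


def encode(text, preferred):
    # Only characters in the derived subset can be encoded; others raise KeyError.
    return b"".join(preferred[c] for c in text)


def decode(data, table):
    # Greedy longest-match over the table's octet sequences.
    longest = max(len(k) for k in table)
    out, i = [], 0
    while i < len(data):
        for n in range(min(longest, len(data) - i), 0, -1):
            if data[i:i + n] in table:
                out.append(table[data[i:i + n]])
                i += n
                break
        else:
            raise ValueError(f"undecodable octets at offset {i}")
    return "".join(out)


preferred, fallbacks = derive_encoder(TOY_DECODE_TABLE)
text = decode(b"\x82\x60\x42", TOY_DECODE_TABLE)  # the fallback form decodes fine...
assert encode(text, preferred) == b"\x41\x42"     # ...but re-encodes to the preferred form
```

The round trip shows the asymmetry described at the top of the message: both octet forms decode to "A", but generating data always emits the preferred one.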
Received on Thursday, 29 August 2002 17:58:12 UTC