- From: Dan Connolly <connolly@w3.org>
- Date: 17 Dec 2002 08:33:13 -0600
- To: www-i18n-comments@w3.org
In another working group, I was just going to cite the definition of a character encoding scheme, but I see it's wrong in the charmod spec:

"A CES is a mapping of the code units of a CEF into well-defined sequences of bytes"
-- http://www.w3.org/TR/charmod/#sec-Digital
http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Digital

No, a character encoding scheme maps a sequence of characters to a sequence of bytes. This goes back at least as far as HTML 4.0:

"The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters."
-- http://www.w3.org/TR/html401/charset.html

It's technically arbitrary whether it goes from byte* to character* or the other way around, but the use of 'encoding' in the name strongly suggests encoding characters, i.e. going from characters to bytes.

The charmod spec gets the specification of IANA charsets right, indirectly...

"A CES, together with the CCSes it is used with, is identified by an IANA charset identifier."

but it's not nearly so mathematically precise as just saying that IANA charsets identify character encoding schemes, and character encoding schemes are (invertible) functions from character sequences to byte sequences.

Please fix.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
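
For illustration only, here is a minimal Python sketch of the view argued above: a charset label names a function from character sequences to byte sequences, with an inverse going the other way. It is not taken from the charmod draft or from HTML 4.01. The sample string and the variable names are assumptions, and the codec labels are Python's own names for these charsets.

```python
# Hypothetical sample text; any string the chosen charsets can represent would do.
text = "na\u00efve \u20ac"

# Character sequence -> byte sequence (the direction the word "encoding" suggests).
encoded = text.encode("utf-8")

# Byte sequence -> character sequence (the direction the HTML 4.01 wording describes).
decoded = encoded.decode("utf-8")

# On text the charset can represent, the two directions are inverses of each other.
round_trips = decoded == text

# The same characters map to different byte sequences under different charset labels.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")
```

Both directions start from or end at a character sequence rather than at code units, which is the level of description the message asks the definition to use.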
Received on Tuesday, 17 December 2002 09:33:18 UTC