- From: Eric Brunner-Williams <wampum@maine.rr.com>
- Date: Thu, 28 Jun 2001 15:42:07 -0400
- To: Patrik Fältström <paf@cisco.com>, Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>
- Cc: Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>, Keith Moore <moore@cs.utk.edu>, discuss@apps.ietf.org, brunner@nic-naa.net
Patrik, So the point of your comment is that you think I need to know this fact. OK. If I didn't know it previously, say at any time since 1988, I know it now. I know I knew it then, the 8859-*/2022 mess was so close to being real I almost had to stuff it or its premature antecedent into X/Open's first UNIX standard. A chararacter != a byte. Now if we were having a discussion about process vs file encodings, as OS implementors (which only one of us is, to my knowledge), then we could segue tothe sub-7-bit processing model of some application(s), and the multi-byte processing model of some (other) application(s), their file encoding requirements, and our (the OS weenies owning the bottom- half-of-the-tty display-width problem, the lib{c,mb,w} APIs, collation orders, etc.) studied indifference to these application-specific ephemera. We could marvel that some apps dorks were imposing a 6-bit-minus-one code-space restriction on an 8-bit organized i/o and memory system [1], or a must-be-stateless octet processing semantic as the sole means of getting a more-than-8-bit code-space on an 8-bit memory type. Having done the ASCII-to-UTF-8 transformation twice in two similar host source bases (solaris and hpux) and knowing how the equivalent work was done in some other hosts (aix, same for all intents and purposes), I'm really at a loss to understand something here. Eric [1] rfc761 (tcp) and rfc768 (udp) are 8-bit transparent transport protocols. At 6/28/01 11:11 AM, Patrik Fältström wrote: >--On 01-06-28 13.18 -0400 Eric Brunner-Williams in Portland Maine ><brunner@nic-naa.net> wrote: > > > I don't get the point of your final sentence. > >Quite a number of charsets (including UTF-8) use more than 8 bit per >character. Because of that, the protocol and the applications which use a >textual based protocol need to operate on characters, and not 8-bit blocks, >and because of this understand the (variable?) size of a character. > >This is one of the reasons why the encoding MIME uses works well for >"comments" but very badly for other data in headers in email messages. > > paf
Received on Thursday, 28 June 2001 15:51:19 UTC