Re: Are international characters allowed in email addresses? from Eric Brunner-Williams on 2001-06-28 (ietf-discuss@w3.org from June 2001)

From: Eric Brunner-Williams <wampum@maine.rr.com>
Date: Thu, 28 Jun 2001 15:42:07 -0400
To: Patrik Fältström <paf@cisco.com>, Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>
Cc: Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net>, Keith Moore <moore@cs.utk.edu>, discuss@apps.ietf.org, brunner@nic-naa.net
Message-Id: <5.1.0.14.0.20010628151517.02aa43b0@pop-server>

Patrik,

So the point of your comment is that you think I need to know this fact.

OK. If I didn't know it previously, say at any time since 1988, I know it
now. I know I knew it then, the 8859-*/2022 mess was so close to being
real I almost had to stuff it or its premature antecedent into X/Open's
first UNIX standard. A character != a byte.
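
A minimal sketch of that inequality, assuming Python 3 and UTF-8 as the
encoding. The sample strings and the helper name utf8_lengths are
illustrative choices, not anything taken from this thread:

```python
# Illustrative only: sample strings and helper name are assumptions.
samples = ["a", "\u00e4", "\u20ac", "F\u00e4ltstr\u00f6m"]


def utf8_lengths(text):
    """Return (character count, UTF-8 octet count) for text."""
    return len(text), len(text.encode("utf-8"))


for s in samples:
    chars, octets = utf8_lengths(s)
    print(f"{s!r}: {chars} character(s), {octets} octet(s)")
```

Any code that sizes a buffer or truncates a header field by octet count
has to account for the two numbers differing.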

Now if we were having a discussion about process vs file encodings,
as OS implementors (which only one of us is, to my knowledge), then
we could segue to the sub-7-bit processing model of some application(s),
and the multi-byte processing model of some (other) application(s), their
file encoding requirements, and our (the OS weenies owning the bottom-
half-of-the-tty display-width problem, the lib{c,mb,w} APIs, collation orders,
etc.) studied indifference to these application-specific ephemera.

We could marvel that some apps dorks were imposing a 6-bit-minus-one
code-space restriction on an 8-bit organized i/o and memory system [1],
or a must-be-stateless octet processing semantic as the sole means of
getting a more-than-8-bit code-space on an 8-bit memory type.
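
One way to read the stateless-versus-stateful distinction, sketched in
Python 3 under the assumption that ISO-2022-JP stands in for the 2022
family and UTF-8 for the stateless case. The sample text and the helper
names are hypothetical:

```python
# Illustrative only: sample text and helper names are assumptions.
text = "\u65e5\u672c"  # two CJK characters

stateful = text.encode("iso2022_jp")  # 7-bit octets, escape sequences set state
stateless = text.encode("utf-8")      # 8-bit octets, no shift state


def is_seven_bit(data):
    """True if every octet fits in 7 bits."""
    return all(b < 0x80 for b in data)


def decode_without_prefix(data, skip, codec):
    """Decode data after dropping its first `skip` octets."""
    return data[skip:].decode(codec, errors="replace")


# Dropping the leading escape sequence loses the shift state, so the
# remaining octets no longer decode to the original characters.
print(decode_without_prefix(stateful, 3, "iso2022_jp") == text)

# Starting at a character boundary in UTF-8 needs no prior state.
print(decode_without_prefix(stateless, 3, "utf-8") == text[1:])
```

The stateful form stays inside 7 bits at the cost of the reader having to
track escape sequences; the stateless form needs the eighth bit.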

Having done the ASCII-to-UTF-8 transformation twice in two similar host
source bases (solaris and hpux) and knowing how the equivalent work
was done in some other hosts (aix, same for all intents and purposes), I'm
really at a loss to understand something here.

Eric
[1] rfc761 (tcp) and rfc768 (udp) are 8-bit transparent transport protocols.

At 6/28/01 11:11 AM, Patrik Fältström wrote:
>--On 01-06-28 13.18 -0400 Eric Brunner-Williams in Portland Maine
><brunner@nic-naa.net> wrote:
>
> > I don't get the point of your final sentence.
>
>Quite a number of charsets (including UTF-8) use more than 8 bit per
>character. Because of that, the protocol and the applications which use a
>textual based protocol need to operate on characters, and not 8-bit blocks,
>and because of this understand the (variable?) size of a character.
>
>This is one of the reasons why the encoding MIME uses works well for
>"comments" but very badly for other data in headers in email messages.
>
>    paf

Received on Thursday, 28 June 2001 15:51:19 UTC