Re: PROPOSAL: i74: Encoding for non-ASCII headers from Jamie Lokier on 2008-03-28 (ietf-http-wg@w3.org from January to March 2008)

From: Jamie Lokier <jamie@shareable.org>
Date: Fri, 28 Mar 2008 14:05:18 +0000
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: ietf-http-wg@w3.org
Message-ID: <20080328140518.GB16629@shareable.org>

Frank Ellermann wrote:
> > An issue I have with RFC2047 is it seems to imply every "proper"
> > implementation of a HTTP receiver, which does something with received
> > TEXT (such as display it), needs to have a _large_ table of known
> > character set names and conversion routines.
> 
> No, by design MIME works if the other side has no clue what it is.
> 
> It would then see gibberish like =?us-ascii*en-GB?Q?hello_world?=
> and not know that this is an odd way to say "hello world".  It is
> not forced to know what "encoded words" are, and if it knows this
> it is not forced to support each and every charset.

You are sort of making my point for me.

If it's acceptable that a receiver sees gibberish text when it doesn't
understand the particular encoding, and it's acceptable that it
doesn't support each and every charset....

Is there a problem with transmitting binary UTF-8?  It's just an "odd
way to say" some i18n text.  Some receivers will decode it as
intended; some will show gibberish.  How is that different from your
example?
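To make the comparison concrete, here is a minimal Python sketch of two hypothetical receivers (the helper names `decode_encoded_word` and `decode_raw_header` are illustrative only and are not the standard `email` package API). The first decodes an RFC 2047 encoded word, including the RFC 2231 "*language" suffix from Frank's example, and falls back to showing the raw text when it does not recognise the charset. The second receives raw 8-bit header bytes and either assumes UTF-8 or falls back to ISO-8859-1, the default charset for TEXT in RFC 2616. In both cases an unprepared receiver degrades to "gibberish" rather than failing.

```python
import base64
import codecs
import re

# RFC 2047 encoded word, with an optional RFC 2231 language tag after "*"
# in the charset field, e.g. =?us-ascii*en-GB?Q?hello_world?=
ENCODED_WORD = re.compile(r"=\?([^?*]+)(?:\*([^?]*))?\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Hypothetical receiver: decode one encoded word, or return it unchanged."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return word  # not an encoded word at all: displayed as-is
    charset, _language, enc, payload = m.groups()
    try:
        codecs.lookup(charset)
    except LookupError:
        return word  # unknown charset: the receiver just shows the raw text
    try:
        if enc in "Qq":
            # Q encoding: "_" stands for space, "=XX" is a hex-escaped byte
            data = re.sub(rb"=([0-9A-Fa-f]{2})",
                          lambda h: bytes([int(h.group(1), 16)]),
                          payload.replace("_", " ").encode("ascii"))
        else:
            data = base64.b64decode(payload, validate=True)
    except ValueError:  # malformed payload (covers binascii.Error too)
        return word
    return data.decode(charset, errors="replace")


def decode_raw_header(raw: bytes, assume_utf8: bool) -> str:
    """Hypothetical receiver of raw 8-bit bytes: UTF-8-aware or ISO-8859-1 only."""
    return raw.decode("utf-8" if assume_utf8 else "iso-8859-1", errors="replace")


if __name__ == "__main__":
    print(decode_encoded_word("=?us-ascii*en-GB?Q?hello_world?="))  # hello world
    print(decode_encoded_word("=?x-unknown?Q?hello?="))  # shown undecoded
    print(decode_raw_header("é".encode("utf-8"), True))   # é
    print(decode_raw_header("é".encode("utf-8"), False))  # Ã©
```

The last two lines show the raw-UTF-8 case: a UTF-8-aware receiver displays "é", while an ISO-8859-1-only receiver displays "Ã©", which is the same kind of graceful-but-ugly degradation as an undecoded encoded word.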

> If some communities use koi8-r or some older charset popular in
> JP this is the same issue as in ordinary Web browsers or e-mail:
> 
> I cannot read Cyrl or Jpan scripts, it is irrelevant from my POV
> how that is encoded.  

I thought the IETF was moving to recommend UTF-8 wherever possible
nowadays?

(Though I gather some people are still unhappy with Unicode, since it
doesn't distinguish some ideographs which are drawn differently in
different languages, and thus prefer not to use UTF-8.)

-- Jamie

Received on Friday, 28 March 2008 14:05:56 UTC