- From: Poul-Henning Kamp <phk@phk.freebsd.dk>
 - Date: Thu, 01 Oct 2015 06:33:06 +0000
 - To: Mark Nottingham <mnot@mnot.net>
 - cc: HTTP Working Group <ietf-http-wg@w3.org>
 
--------
In message <D107F92F-F930-44AE-945A-9170389DFCC4@mnot.net>, Mark Nottingham writes:
>We're belatedly adopting this; Julian asked for a breather while he
>finished other work, and now he's ready to commence.
I think adopting the draft is a good idea.
But I find some bits of the low level mechanics proposed troublesome.
For instance it worries me a lot to use '*' as magic marker in
fields which are historically thrown around fast and loose in all
sorts of programming environments where it may or may not be a
meta-character.
Can we find a less overloaded, preferably non-meta, character?
If we can find two less overloaded characters, one can indicate
UTF-8, and the other that the charset is explicitly specified.
Judging from experience, these headers are going to vary a lot, so
if we can shave 5 characters off their length in the usual case,
that's a tangible benefit.
Something like:
    UTF-8 implied:
	foo: bar; title<='en'%C2%A3%20rates
    Charset explicitly specified:
	foo: bar; title>=iso-8859-1'en'%A3%20rates
(Where I'm not specifically proposing '<' or '>' but merely using them
for the example.)
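To make the two-marker idea concrete, here is a minimal Python sketch of a
decoder for it. It assumes the placeholder markers '<=' and '>=' from the
example above (which are explicitly not a proposal), and the function name
decode_ext_param is hypothetical. It handles a single parameter only, not a
whole header field.

```python
import re
from urllib.parse import unquote_to_bytes

# name, then '<' (UTF-8 implied) or '>' (explicit charset), then '=' and the value.
# The two markers are the placeholders used in the example, not a real syntax.
_PARAM = re.compile(r"\s*([^\s<>=]+)([<>])=(.*)", re.S)


def decode_ext_param(param):
    """Decode one parameter; return (name, language, text)."""
    m = _PARAM.fullmatch(param)
    if m is None:
        raise ValueError("no extended-value marker found")
    name, marker, rest = m.groups()
    # Value layout: [charset]'language'percent-encoded-octets
    charset, lang, encoded = rest.split("'", 2)
    if marker == "<":
        if charset:
            raise ValueError("charset given although UTF-8 is implied")
        charset = "utf-8"
    elif not charset:
        raise ValueError("explicit marker used without a charset")
    return name, lang, unquote_to_bytes(encoded).decode(charset)
```

Both example parameters above decode to the same text, so a receiver only
needs the marker to know whether to look for a charset name.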
But going even further: I have a hard time coming up with a credible
(i.e. non-demented) scenario for having multiple different charsets
in the same header.
Therefore I would prefer to put the charset at the front of the headers:
    UTF-8 implied:
	foo: = bar; title='en'%C2%A3%20rates
    Charset explicitly specified:
	foo: =iso-8859-1= bar; title='en'%A3%20rates
Some advantages:
* Very likely to break in the majority of code which
  doesn't understand the new convention.  (ref: "Postel Was Wrong")
* Header compression algorithms can be smart about it.
* Charset can be converted transparently by proxies, servers,
  frameworks etc.
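A rough Python sketch of how a receiver might handle the header-level
prefix follows. The function names are hypothetical, the ';' splitting is
deliberately naive (it ignores quoted strings), and it only understands
parameters of the 'lang'value form shown in the examples.

```python
import re
from urllib.parse import unquote_to_bytes

# Leading '=' marks the new convention; '=charset=' names the charset.
_PREFIX = re.compile(r"=(?:([^=\s]+)=)?\s*(.*)", re.S)


def split_charset_prefix(value):
    """Return (charset, remainder); charset is None for a legacy header."""
    value = value.strip()
    m = _PREFIX.fullmatch(value)
    if m is None:
        return None, value
    return (m.group(1) or "utf-8").lower(), m.group(2)


def decode_params(value):
    """Return (token, {name: (language, text)}) for a prefixed header value."""
    charset, rest = split_charset_prefix(value)
    if charset is None:
        raise ValueError("header does not use the charset prefix")
    # Naive split: assumes no ';' inside any parameter value.
    token, *params = [part.strip() for part in rest.split(";")]
    decoded = {}
    for param in params:
        name, _, raw = param.partition("=")
        # Expected layout: 'language'percent-encoded-octets
        _, lang, encoded = raw.split("'", 2)
        decoded[name.strip()] = (lang, unquote_to_bytes(encoded).decode(charset))
    return token, decoded
```

Because the charset is known before any parameter is parsed, an intermediary
could transcode the whole field without understanding the individual
parameters.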
And we can go even further if we want to:
   If header contains a charset spec (as above) the rest of the
   header can use all byte values from the range [0x20-0xff] and
   %xx encoding/decoding SHALL NOT be performed.
-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
Received on Thursday, 1 October 2015 06:33:46 UTC