Re: Call for Adoption: draft-reschke-rfc54987bis from Poul-Henning Kamp on 2015-10-01 (ietf-http-wg@w3.org from October to December 2015)

From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Thu, 01 Oct 2015 06:33:06 +0000
To: Mark Nottingham <mnot@mnot.net>
cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <8665.1443681186@critter.freebsd.dk>

--------
In message <D107F92F-F930-44AE-945A-9170389DFCC4@mnot.net>, Mark Nottingham writes:

>We're belatedly adopting this; Julian asked for a breather while he
>finished other work, and now he's ready to commence.

I think adopting the draft is a good idea.

But I find some bits of the low level mechanics proposed troublesome.

For instance, it worries me a lot to use '*' as a magic marker in
fields which are historically thrown around fast and loose in all
sorts of programming environments where it may or may not be a
meta-character.

Can we find a less overloaded, preferably non-meta, character?



If we can find two less overloaded characters, one can indicate
UTF-8, and the other that the charset is explicitly specified.

Judging from experience, these headers are going to vary a lot, so
if we can shave 5 characters off their length in the usual case,
that's a tangible benefit.

Something like:

    UTF-8 implied:

	foo: bar; title<='en'%C2%A3%20rates

    Charset explicitly specified:

	foo: bar; title>=iso-8859-1'en'%A3%20rates

(Where I'm not specifically proposing '<' or '>' but merely using them
for the example.)
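
A minimal sketch, in Python, of how a receiver might decode the two
forms above. It assumes the placeholder markers '<' and '>' from the
example (which are explicitly not a proposal), and the function name
decode_param is hypothetical. This is not the syntax of RFC 5987,
which uses '*' and always names the charset.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes

# Placeholder markers taken from the example; not a real proposal.
UTF8_MARKER = "<"
EXPLICIT_MARKER = ">"


def decode_param(param: str) -> Tuple[str, str, str]:
    """Return (name, language, text) for one parameter in the sketched syntax."""
    left, sep, value = param.partition("=")
    marker = left[-1:]
    if not sep or marker not in (UTF8_MARKER, EXPLICIT_MARKER):
        raise ValueError("not an extended parameter: %r" % param)
    name = left[:-1].strip()

    # The value has the shape  [charset]'language'percent-encoded-text
    charset, lang, encoded = value.split("'", 2)
    if marker == UTF8_MARKER:
        if charset:
            raise ValueError("UTF-8-implied form must not name a charset")
        charset = "utf-8"
    elif not charset:
        raise ValueError("explicit form must name a charset")

    return name, lang, unquote_to_bytes(encoded).decode(charset)
```

Both example parameters decode to the same text, with the
UTF-8-implied form being ten characters shorter on the wire here.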


But going even further:  I have a hard time coming up with a credible
(i.e. non-demented) scenario for having multiple different charsets
in the same header.

Therefore I would prefer to put the charset at the front of the headers:

    UTF-8 implied:

	foo: = bar; title='en'%C2%A3%20rates

    Charset explicitly specified:

	foo: =iso-8859-1= bar; title='en'%A3%20rates

Some advantages:

* Very likely to break in the majority of code which
  doesn't understand the new convention.  (ref: "Postel Was Wrong")

* Header compression algorithms can be smart about it.

* Charset can be converted transparently by proxies, servers,
  frameworks etc.
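
A rough sketch, in Python, of decoding the charset-at-the-front form
shown earlier. The function names, the set of characters accepted in
a charset name, and the choice to return None for a header without
the prefix are all assumptions made for illustration; nothing here
is specified anywhere.

```python
import re
from urllib.parse import unquote_to_bytes

# "=" alone implies UTF-8; "=charset=" names the charset explicitly.
_PREFIX = re.compile(r"=(?:([A-Za-z0-9._-]+)=)?\s*")


def split_charset_prefix(value):
    """Return (charset, rest); charset is None if there is no prefix."""
    stripped = value.lstrip()
    m = _PREFIX.match(stripped)
    if m is None:
        return None, value  # legacy header, left untouched
    return (m.group(1) or "utf-8"), stripped[m.end():]


def decode_header(value):
    """Return (token, {param: (language, text)}) using one charset for all."""
    charset, rest = split_charset_prefix(value)
    if charset is None:
        raise ValueError("header carries no charset prefix")
    token, *params = [part.strip() for part in rest.split(";")]
    decoded = {}
    for param in params:
        name, _, raw = param.partition("=")
        # Each value has the shape  'language'percent-encoded-text
        _, lang, encoded = raw.split("'", 2)
        decoded[name.strip()] = (lang, unquote_to_bytes(encoded).decode(charset))
    return token, decoded
```

Because the charset is read once, before any parameter is looked at,
an intermediary could rewrite the prefix and the encoded bytes
without understanding the individual parameters.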


And we can go even further if we want to:

   If the header contains a charset spec (as above) the rest of the
   header can use all byte values from the range [0x20-0xff] and
   %xx encoding/decoding SHALL NOT be performed.
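
A sketch of that raw-byte rule in Python, under the assumption that
both the bare "=" (UTF-8 implied) and the "=charset=" prefix count as
a charset spec. The function name decode_raw_header is hypothetical.

```python
import re

# Same prefix shape as above, matched on bytes this time.
_RAW_PREFIX = re.compile(rb"=(?:([A-Za-z0-9._-]+)=)?[ \t]*")


def decode_raw_header(raw: bytes) -> str:
    """Decode a prefixed header value whose body is raw bytes, not %xx."""
    m = _RAW_PREFIX.match(raw)
    if m is None:
        raise ValueError("header carries no charset prefix")
    charset = m.group(1).decode("ascii") if m.group(1) else "utf-8"
    body = raw[m.end():]

    # Only 0x20-0xff is allowed; a bytes value cannot exceed 0xff.
    if any(byte < 0x20 for byte in body):
        raise ValueError("control byte in header body")

    # No percent-decoding: a literal "%41" stays "%41".
    return body.decode(charset)
```

Note that a "%" in the body is then just a percent sign, so the same
bytes mean different things with and without the prefix.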


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Received on Thursday, 1 October 2015 06:33:46 UTC