Re: PROPOSAL: i74: Encoding for non-ASCII headers

On Apr 3, 2008, at 5:04 PM, Mark Nottingham wrote:
> Roy, you often talk about things that will happen as part of the  
> partitioning, but it's not clear what's involved. Do you have a  
> roadmap of what you'd like to happen? If we're going to attempt  
> substantial rewrites of different sections, we need to start that  
> process soon.

My roadmap is to make each part independent and complete.  It should
only affect the requirements that are vaguely specified as applying to
HTTP as a whole (most notably in part 1 where the message is parsed,
where connections are managed, and the spaghetti cross-references and
redundant requirements in caching).  Absolutely no protocol changes will
occur as a result -- only removal of overspecification.

> Without some idea of that, it's going to be difficult to make any  
> forward progress if we have to block any substantive issues on  
> future rewrites that we may or may not do.

This is only the second issue (out of 90+) that I have asked to be
deferred.  Are we making progress right now by driving for consensus
on an absolutely pointless and UNIMPLEMENTED issue?  Even if you
derive an answer, it is inherently bogus and can't be published
because we still must have two implementations of every requirement.
That's why I keep asking for demonstrated need -- if we can't
demonstrate that a new requirement is needed, then there is no reason
to discuss the text of a solution.  It is false progress and it
prevents me from allocating my own time more wisely.

>> The parsing algorithm will not say anything about C1 controls because
>> no known implementation of HTTP checks for C1 controls.
>
> That doesn't follow. If there are security or interoperability
> implications of C1 controls in text, it certainly deserves consideration.

There aren't any for HTTP.

> Also, while the message parsing machinery doesn't touch TEXT, there  
> are other parts of implementations that do -- e.g., command-line  
> tools, Web forms for configuring servers and proxies, configuration  
> files, and so forth. Just because it's payload doesn't mean that it  
> doesn't have implementation impact.

It does have implementation impact in the four fields that I described,
each of which is defined in a different location.  Two of those
locations (body and defined header fields) are specified already.
Reason Phrase is solidly iso-8859-1 and already ignored.  That leaves
extension-field, which can say whatever we like without impacting
standard status because it is specifically defined for extensions.
This will be obvious after the message parsing algorithm in Part 1 is
properly specified and verified against current practice, which is why
I asked that the issue be deferred.  I'll have time to work on it after
April 13, when I get back from ApacheCon in Amsterdam where I am giving
two presentations next week.

Meanwhile, if you really want to make progress on *this* issue, then the
way forward is to start using the wiki to collect experience reports
on what is and is not implemented in practice.  Further discussion is
*not* making progress.  In particular, ask the working group to help
demonstrate implementation practice instead of just offering opinions.
Encourage folks to try using SetHeader in Apache, its equivalent in IIS,
Squid, and others.  Try doing the same with JavaScript or extensions in
browsers.  We might find that raw UTF-8 is already possible for
extension fields.  Or it may be that OCTETs are the only things that
matter and no existing character encoding truly applies.  In any case,
it is easier to find compliant implementations if we reduce the number
of existing requirements rather than add to them.
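
As a rough sketch of what such an experience report could record, the
following Python snippet classifies the octets of a captured field
value.  The helper name classify_field_value and the report keys are
assumptions made for illustration; nothing here is defined by HTTP or
by any particular server.

```python
def classify_field_value(raw: bytes) -> dict:
    """Report how the raw octets of a header field value could be read.

    Hypothetical helper for experience reports; the key names are
    illustrative only.
    """
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {
        # True when every octet is in the US-ASCII range.
        "ascii_only": all(b < 0x80 for b in raw),
        # True when the octets form a well-formed UTF-8 sequence.
        "valid_utf8": valid_utf8,
        # Octets 0x80-0x9F are C1 controls if the value is read as iso-8859-1.
        "c1_if_latin1": any(0x80 <= b <= 0x9F for b in raw),
        # iso-8859-1 maps every octet to a character, so this never fails.
        "latin1_text": raw.decode("iso-8859-1"),
    }


if __name__ == "__main__":
    # Octets as they might be captured off the wire after setting an
    # extension field to a non-ASCII value.
    print(classify_field_value("caf\u00e9".encode("utf-8")))
```

Running this against values captured from different servers, proxies,
and browsers would show whether raw UTF-8 survives intact or whether
only the octets themselves are preserved.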

....Roy

Received on Friday, 4 April 2008 01:21:50 UTC