Re: Restricting the HTTP method definition from James M Snell on 2013-08-21 (ietf-http-wg@w3.org from July to September 2013)

From: James M Snell <jasnell@gmail.com>
Date: Wed, 21 Aug 2013 08:18:37 -0700
To: Zhong Yu <zhong.j.yu@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CABP7RbcPC1hoWWpVp3DTk1yM==GiOOf6RKzs-KNCfTyZVdQ08w@mail.gmail.com>

The connection with what I described in that blog post is that many
implementers have, apparently, taken a number of liberal shortcuts in
terms of how they parse and accept http methods. Node's parser, for
instance, will immediately error out if it receives a method that
doesn't start with an alpha character (which is reasonable). It also
does not support the use of any non-alpha character other than "-"
(dash). Their implementation also chooses not to support extension
methods because the implementers feel there is no way of supporting an
arbitrary unknown set of tokens in a performant way... and truthfully,
because there is no upper bound on the length of a method and the
space of possible values is very large, they do have a point.

Now, I did say right up front that this is a fairly minor issue, and
if it doesn't happen, so be it. But the restrictions I suggest ought
to make it at least somewhat easier for implementations to handle
extension methods generically and in a performant manner, while at
the same time more accurately reflecting what many implementers are
already doing.
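
To make the difference concrete, here is a minimal sketch (Python,
purely illustrative, not taken from any real parser) that checks a
method string against the current "token" rule and against the
restricted rule quoted below. The names TOKEN_RE, RESTRICTED_RE,
is_token_method and is_restricted_method are made up for this example.
One caveat: read literally, UPPER *20( ... ) admits up to 21
characters (one leading UPPER plus up to 20 more), while the prose
says 20; the sketch follows the grammar as written.

```python
import re

# Current HTTPbis rule: method = token = 1*tchar, with no length limit.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Proposed rule, read literally: UPPER *20( UPPER / "_" / "-" ).
# That is one uppercase letter followed by at most 20 more characters.
RESTRICTED_RE = re.compile(r"[A-Z][A-Z_-]{0,20}")


def is_token_method(method: str) -> bool:
    """True if the whole string matches the current token grammar."""
    return TOKEN_RE.fullmatch(method) is not None


def is_restricted_method(method: str) -> bool:
    """True if the whole string matches the proposed restricted grammar."""
    return RESTRICTED_RE.fullmatch(method) is not None
```

With these definitions, "GET" and "VERSION-CONTROL" pass both checks,
while "get" or "-GET" are valid tokens today but would be rejected by
the restricted rule. The fixed upper bound is what would let a parser
size its method buffer up front.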

On Wed, Aug 21, 2013 at 8:07 AM, Zhong Yu <zhong.j.yu@gmail.com> wrote:
> On Tue, Aug 20, 2013 at 6:22 PM, James M Snell <jasnell@gmail.com> wrote:
>> HTTPbis currently defines the request method as a "token" of unbounded-length.
>>
>> Specifically:
>>
>>    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
>>     "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
>>    token = 1*tchar
>>    method = token
>>
>> This definition is overly broad and does not reflect real world use
>> [http://tools.ietf.org/html/draft-ietf-httpbis-method-registrations-12].
>>
>> I propose that in HTTP/2 we tighten this definition up significantly
>> and place an upper bound on the length a request method ought to be:
>>
>>   UPPER = %x41-5A
>>   method = UPPER *20( UPPER / "_" / "-" )
>>
>> This is obviously a strictly limited subset of what's allowed by the
>> current definition. It limits the length of method names to no more
>> than 20 characters, requires that methods be all uppercase, requires
>> that methods always start with a letter and limits non-letter
>> characters to the dash and underscore. The rule would be that all
>> *newly registered* HTTP methods MUST conform to the new rule but
>> implementations MAY choose to support the old definition if necessary
>> for backwards compatibility.
>>
>> It's a fairly minor issue, yes, but tightening this up ought to make
>> it easier for developers to create parsers that are both efficient
>> *and* compliant [http://www.chmod777self.com/2013/08/sigh.html]
>
> I don't see how the bug mentioned in the blog has anything to do with
> what you are proposing. It looks like node.js is accepting any "GE<*>"
> as "GET" where <*> can be any octet. Maybe node.js was assuming that
> the request has been validated by an upstream parser?
>
> Zhong Yu

Received on Wednesday, 21 August 2013 15:19:32 UTC