Re: #273: HTTP-Version should be redefined as fixed length pair of DIGIT . DIGIT

On Sat, Jun 25, 2011 at 05:11:32PM +1200, Amos Jeffries wrote:
> On 25/06/11 10:14, Willy Tarreau wrote:
> >Hi Julian,
> >
> >On Fri, Jun 24, 2011 at 10:16:26PM +0200, Julian Reschke wrote:
> >>    The HTTP version number consists of two non-negative decimal digits
> >
> >Since "integer" was changed to "digits" here, maybe we can remove the
> >"non-negative" precision ?
> 
> The text above is also overriding the possibility of double-digit major 
> versions. Implying "the version" only contains 2 digits in total.
>
> How about:
>   consists of two sequences of decimal digits separated by one dot

Mark suggested that at the speed the protocol is progression, there are
little chances that we'll ever see a two-digit major. I tend to agree a
lot with him on this, considering we remained on the same major+minor for
the last 14 years, and that emitting anything not starting with HTTP/1.
will be rejected at many places as previous check showed it. Simply send
HTTP/2.0 and we'll see various behaviours such among which :
  - a few considering it as 1.1
  - some considering it as 1.0
  - some considering it as 0.9 (since it matches neither 1.0 nor 1.1)
  - some rejecting it as invalid

(...)
> >   - squid : does sscanf("%d.%d"). Not sure what it does with negatives. 
> >   This
> >     was a pretty old version however (2.5-stable12), that might not count.
> 
> squid-2 all do that same sscanf. negative numbers get an 4xx error page.
> 
> squid-3 use isdigit() for any length numeric as per the spec. rejects 
> with 505 on anything other than 1.0 or 1.1.

OK thanks for the double check.

> >Given the diversity of methods, I think it's really nice that we can
> >simplify the parsing.
> 
> I disagree that the parsing is complex here. Just plain wrong in several 
> of those cases. It is a huge amount simpler to check for valid version 
> than valid method.

The diversity of the behaviours seems to imply that implementers disagree
with you. For instance, Apache is doing this in order to get both speed
in the common case and compliance in other cases (taken from 2.2.4) :

    /* Avoid sscanf in the common case */
    if (len == 8
        && pro[0] == 'H' && pro[1] == 'T' && pro[2] == 'T' && pro[3] == 'P'
        && pro[4] == '/' && apr_isdigit(pro[5]) && pro[6] == '.'
        && apr_isdigit(pro[7])) {
        r->proto_num = HTTP_VERSION(pro[5] - '0', pro[7] - '0');
    }
    else if (3 == sscanf(r->protocol, "%4s/%u.%u", http, &major, &minor)
             && (strcasecmp("http", http) == 0)
             && (minor < HTTP_VERSION(1, 0)) ) /* don't allow HTTP/0.1000 */
        r->proto_num = HTTP_VERSION(major, minor);
    else
        r->proto_num = HTTP_VERSION(1, 0);

All this just to decide between HTTP/1.0 or 1.1, and still it does not
support minors greater than 999. But the check is overall right (except
for the strcasecmp on the HTTP protocol, which was recently stated as
case sensitive).

With that complexity just to get a 1.0 vs 1.1 right when it's already
available at the cost of a single character compare or memcmp(), there
is no doubt we'll continue to see the simplified version for a long
time, especially in environments where this parsing is considered as
expensive for no added value.

I perfectly understand why some would prefer doing something like :

    if (memcmp(version, "HTTP/1.", 7) == 0)
       ver = 11 - (version[7] == '0');
    else
       ver = 9;

and use ver = 9, 10 or 11 for HTTP/0.9, 1.0 or 1.1. It's a lot cheaper,
always works for known versions and requires a lot less complexity to
get it right.

In fact, every time a spec will propose a wide range of possible values
where only a very small subset is used (small enough to fit in a boolean),
you can be sure that there will be substantial trade-offs between full
compliance and simplicity. This is well illustrated above by Apache which
uses a single integer for both major and minor (which is perfectly fine
in my opinion). The only drawback is that it's limited to minors between
0 and 999. We can all agree that it's enough, but it could as well have
tried to seek for even wider compliance and use 2 integers to store the
version, making the code more complex everywhere for no real benefit.

Regards,
Willy

Received on Saturday, 25 June 2011 06:01:10 UTC