Re: parsing decimals, was: HTTPbis -10 drafts published from Willy Tarreau on 2010-07-14 (ietf-http-wg@w3.org from July to September 2010)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 14 Jul 2010 11:41:21 +0200
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20100714094121.GF2717@1wt.eu>
On Wed, Jul 14, 2010 at 11:09:15AM +0200, Julian Reschke wrote:
> On 14.07.2010 10:16, Willy Tarreau wrote:
> >...
> >>How did you come to that conclusion?
> >
> >Observations, tests and implementations. When testing some products,
> >it's often very instructive to try to send them too large content-length
> >values or chunked encodings. And implementations have to choose a type
> >for their value anyway. It's common to find "int" or "unsigned int"
> >there. Even in haproxy, I chose "unsigned int" (32 bit) for the chunk
> >size, and positive 63-bit value for the content length. At least I've
> >been very careful to detect all overflows everywhere, but this is also
> >because I'm sensitive to the security implications. Other implementers
> >may simply consider that "that large is enough and if it fails above that
> >it does not matter".
> 
> Well, if it fails in the right way it's ok (such as failing with a 500), 
> simply being a known limitation. It's only problematic if the error 
> condition isn't detected and the implementation just proceeds with bad data.

Exactly!
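
To illustrate what "failing in the right way" could look like for a
Content-Length, here is a minimal sketch. The function name and return
codes are hypothetical and are not taken from haproxy or any other
product; it only assumes the value is stored as a positive 63-bit
integer, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Parses <len> bytes at <s> as 1*DIGIT in base 10 into a positive
 * 63-bit value. Returns 0 on success, -1 on an empty or non-digit
 * input, -2 if the value does not fit.
 */
int parse_content_length(const char *s, size_t len, int64_t *out)
{
    int64_t val = 0;
    size_t i;

    if (len == 0)
        return -1;

    for (i = 0; i < len; i++) {
        int digit;

        if (s[i] < '0' || s[i] > '9')
            return -1;
        digit = s[i] - '0';

        /* check before multiplying so that val never wraps */
        if (val > (INT64_MAX - digit) / 10)
            return -2;
        val = val * 10 + digit;
    }
    *out = val;
    return 0;
}
```

The caller can then map -1 and -2 to whatever error response it
prefers, instead of proceeding with a truncated or wrapped length.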

> >Since there are already some clarifications in the draft, such as "any
> >value >= 0 is valid", I think it makes sense to add a few words at
> >some places to enforce certain things. Something like saying that
> >Content-length is a DECIMAL representation might ring a bell to the
> >implementer.
> 
> We had that discussion before (in the context of Javascript, see 
> <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/161>). The ABNF 
> already says it.

I checked before reporting the issue but did not notice it :-(
I've checked again, and all I see is "1*DIGIT". I've checked RFC 5234
too just in case, but it does not state how to parse digits either.

> It's very hard to optimize the spec for people who don't read it 
> properly, nor their language/API documentation.

To be honest, in my opinion we're in the "well known" area here.
I think I read the specs closely enough to find what I must do
and what I might receive based on what others might send, but if I
had a doubt about the base of a content-length for instance, after
a check in the spec I wouldn't find the answer and I would have
to check what existing implementations do.

But seeing your patch above, and given the number of places where
"decimal" was explicitly stated, I think that a single sentence
along with the ABNF could state "All 1*DIGIT character sequences
are to be interpreted as decimal values".
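
As an example of how the base can silently go wrong, here is a small
sketch using the standard C strtoul(). The two wrapper names are
hypothetical and only exist to compare the behaviours side by side.

```c
#include <stdlib.h>

/* Hypothetical wrappers, only here to compare the two behaviours. */

/* Lets the C library guess the base from the prefix: wrong for HTTP. */
unsigned long parse_guess_base(const char *s)
{
    return strtoul(s, NULL, 0);
}

/* Forces base 10, which is what 1*DIGIT is meant to be. */
unsigned long parse_decimal(const char *s)
{
    return strtoul(s, NULL, 10);
}
```

With the input "010", parse_guess_base() returns 8 while
parse_decimal() returns 10. Note that strtoul() also accepts leading
whitespace and a sign, so it is not a 1*DIGIT validator on its own.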

> >>We've had similar discussions before: it's not clear that considerations
> >>like these belong in the actual HTTP spec. On the other hand, a
> >>separate document discussing issues like these could certainly be
> >>written (IETF Informational, or even somewhere else?).
> >
> >I think we could have something generic somewhere. But there are a
> >few points in HTTP which, if poorly implemented, may have important
> >side effects. A single sentence such as "Implementations MUST detect
> >integer overflows and integer parsing errors" for the Content-Length,
> >byte ranges and chunk size is not too much and can help getting more
> >reliable and interoperable implementations.
> 
> I don't believe that at all, sorry. People who get this wrong today get 
> it wrong because they are sloppy programmers, not because of what the 
> spec says.

Well, one of the difficulties with HTTP is that no limit on anything
is specified. That's what makes it so open, but also what causes
so many arbitrary choices. I regularly hear questions such as "what's
the max length a URL can take?" or "what's the max length of a header?".
When I reply that there's no such limit, people are embarrassed and have
to resort to the "large enough" principle, which generally means using a
type which can hold values that are assumed to be unreachable. I agree
the spec cannot correct such behaviours, but when some things are
well-known and some errors not uncommon, it does not cost much to help
implementers avoid making the same mistakes again.

> There are many many things the spec *could* mention. Expecting 
> overflows. Expecting parse errors. Treating absent parameters when the 
> ABNF disallows them. Treating multiple header instances when only one is 
> allowed. This is open-ended.
> 
> It doesn't *need* to be in *this* spec. But that doesn't mean that it 
> would be a bad idea to work on a document like that.

I see, maybe something like "best practices for reliable HTTP implementations"?

Regards,
Willy
Received on Wednesday, 14 July 2010 09:41:51 UTC