
parsing decimals, was: HTTPbis -10 drafts published

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 14 Jul 2010 11:09:15 +0200
Message-ID: <4C3D7EBB.9060307@gmx.de>
To: Willy Tarreau <w@1wt.eu>
CC: HTTP Working Group <ietf-http-wg@w3.org>
On 14.07.2010 10:16, Willy Tarreau wrote:
> ...
>> How did you come to that conclusion?
>
> Observations, tests and implementations. When testing some products,
> it's often very instructive to try to send them too large content-length
> values or chunked encodings. And implementations have to choose a type
> for their value anyway. It's common to find "int" or "unsigned int"
> there. Even in haproxy, I chose "unsigned int" (32 bit) for the chunk
> size, and positive 63-bit value for the content length. At least I've
> been very careful to detect all overflows everywhere, but this is also
> because I'm sensitized to the security implications. Other implementers
> may simply consider that "that large is enough and if it fails above that
> it does not matter".

Well, if it fails in the right way, such as failing with a 500, that's 
OK; it's simply a known limitation. It's only problematic if the error 
condition isn't detected and the implementation just proceeds with bad data.
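As an aside, detecting the overflow *before* it happens is cheap. A minimal sketch in C for the hex chunk size (hypothetical helper name; chunk extensions and CRLF handling left out):

```c
#include <limits.h>

/* Hypothetical sketch: accumulate a hexadecimal chunk size digit by
 * digit, refusing to continue once the next digit would overflow.
 * Returns -1 on overflow or if no hex digit is present at all. */
long parse_chunk_size(const char *s)
{
    long v = 0;
    int seen = 0;

    for (;; s++) {
        int d;
        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (*s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            break;              /* ';' (extension) or CR ends the size */
        if (v > (LONG_MAX - d) / 16)
            return -1;          /* v * 16 + d would exceed LONG_MAX */
        v = v * 16 + d;
        seen = 1;
    }
    return seen ? v : -1;
}
```

The overflow test runs before the multiply-add, so the value never wraps, no matter how many digits the peer sends.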

>>> download a DVD image from a site because the announced content
>>> length was wrong due to integer overflow. I don't remember what
>>> server it was though :-/
>>
>> That's a server bug. If the content length exceeds what the server can
>> express in Content-Length, it should switch to chunked encoding.
>
> Agreed, and that might also be why I've only seen that once ;-)

Oh, I've seen those, too (did lots of >2GB testing with WebDAV many 
years ago). Back then, the Java servlet API didn't properly support 
content lengths > 2GB. Hopefully this has been fixed by now.

>>> I think it is difficult to ask every implementation to support
>>> at least XXX bits, but we should at least add as a strict
>>> requirement that any agent along the chain must detect integer
>>> overflows when parsing or printing these values, and must abort
>>> processing if this happens.
>>
>> I think that requirement is already implicit; if you can't parse a
>> length, don't pretend you can.
>
> I agree with that, the issue is being aware that you can't parse.
> I've seen HTTP parser code in enterprises... Believe me, sometimes
> you're thankful to the application server below which already does
> some cleaning. Most people are not aware of integer overflows and
> will only think with "large enough" quantities. Above that is
> infinite. The fact that an apparently large number decodes as a
> small one is nonsense to some. Basically, you'd find things such
> as:
>
>     l = atol(get_header("content-length"));
>
> And once you tell the guy that there is no error check and he even
> supports negative values, he would simply switch to :
>
>     l = strtoul(get_header("content-length"),&err, 0);
>
> (which now supports octal decoding).
>
> Since there are already some clarifications in the draft, such as
> "any value >= 0 is valid", I think it makes sense to add a few words at
> some places to enforce certain things. Something like saying that
> Content-length is a DECIMAL representation might ring a bell to the
> implementer.

We had that discussion before (in the context of Javascript, see 
<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/161>). The ABNF 
already says it.

It's very hard to optimize the spec for people who read neither it 
nor their language/API documentation properly.
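That said, getting it right isn't much code. A defensive base-10 parse might look like this (a sketch only; the function name and the error convention are made up for illustration):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a Content-Length field value strictly as
 * base-10 digits, rejecting signs, octal/hex notation, trailing junk
 * and overflow. Returns the value, or -1 on any error. */
long long parse_content_length(const char *s)
{
    char *end;
    unsigned long long v;

    /* strtoull() would silently skip whitespace and accept '+'/'-',
     * so insist that the value starts with a digit. */
    if (s == NULL || *s < '0' || *s > '9')
        return -1;

    errno = 0;
    v = strtoull(s, &end, 10); /* base 10 forced: "0123" is 123, not 83 */
    if (errno == ERANGE || *end != '\0' || v > LLONG_MAX)
        return -1;             /* overflow or trailing garbage */
    return (long long)v;
}
```

Three lines of checks cover exactly the failure modes discussed above: overflow, negative values, and radix surprises.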

>>> must be taken to ensure that the decoding is forced to base 10,
>>> otherwise a number such as 0123 may be automatically interpreted
>>> as the octal value for the decimal 83.
>>
>> That's true, but again, that's a quality of implementation issue.
>
> I completely agree. But if we're being purist, it's noted nowhere
> that the values are expressed in decimal form and must be decoded
> as such ;-) The only thing that is indicated is that a DIGIT
> represents a decimal 0-9.

I'd prefer not to be purist.

>> We've had similar discussions before: it's not clear that considerations
>> like these belong into the actual HTTP spec. On the other hand, a
>> separate document discussing issues like these could certainly be
>> written (IETF Informational, or even somewhere else?).
>
> I think we could have something generic somewhere. But there are a
> few points in HTTP which, if poorly implemented, may have important
> side effects. A single sentence such as "Implementations MUST detect
> integer overflows and integer parsing errors" for the Content-Length,
> byte ranges and chunk size is not too much to ask, and can help produce
> more reliable and interoperable implementations.

I don't believe that at all, sorry. People who get this wrong today get 
it wrong because they are sloppy programmers, not because of what the 
spec says.

There are many, many things the spec *could* mention. Expecting 
overflows. Expecting parse errors. Handling parameters that are absent 
although the ABNF requires them. Handling multiple header instances when 
only one is allowed. This is open-ended.

It doesn't *need* to be in *this* spec. But that doesn't mean that it 
would be a bad idea to work on a document like that.

Best regards, Julian
Received on Wednesday, 14 July 2010 09:09:55 GMT
