RE: Large content size value from Travis Snoozy (Volt) on 2007-01-05 (ietf-http-wg@w3.org from January to March 2007)

From: Travis Snoozy (Volt) <a-travis@microsoft.com>
Date: Fri, 5 Jan 2007 11:09:29 -0800
To: Henrik Nordstrom <hno@squid-cache.org>
CC: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <86EDC3963F04D546BED8996F77D290F6049D118268@NA-EXMSG-C138.redmond.corp.microsoft>

Short version:
        S4.4  : No mandate for clients to actually use the determined length.
              Implies, via a MUST, that user agents may use means other than
              the length to find the end of a message.
        S14.13: No mandate for clients to use the length specified in
              Content-Length.
        S8.2.2: Very specific about clients closing the connection in
              exceptional situations when sending entities to the server.
              Nothing similar is specified for clients receiving entities.

That's enough rope to hang oneself with.


Long version:

Henrik Nordstrom said:
> On Thu, 2007-01-04 at 15:50 -0800, Travis Snoozy (Volt) wrote:
>
> > Since it's possible for the client to detect when a Content-Length or
> > a chunk-length is too long,
>
> To be precise, what most programming languages tell you (when used
> correctly) is that Content-Length could not be converted into the
> word size the application developer had selected for storing that
> value internally in the application, not necessarily why it could not be
> converted.

Programmers always have the option of writing this routine themselves, or
(at the very least) writing a verification routine that will give an error
as to why a given value can't be parsed. In any case, it is *possible* (and,
IMHO, not too hard) for the client to detect when Content-Length et al. are
too long. Whether or not the implementer actually does their own checking is
another question altogether (but that's the practical question, and to that
end, you're probably right).
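As a minimal sketch of the kind of verification routine described above (not code from any real HTTP client), the Python function below parses a Content-Length or chunk-size value against a cap the implementation chooses. It checks for overflow before each multiply, the way a fixed-width implementation would have to, and reports *why* a value is rejected. The names `parse_length` and `LengthError` and the example caps are illustrative assumptions.

```python
class LengthError(ValueError):
    """Raised when a length field is unusable; the message says why."""


def parse_length(text: str, max_value: int, base: int = 10) -> int:
    """Parse a Content-Length (base 10) or chunk-size (base 16) value.

    `max_value` is the largest length this implementation is willing to
    count, e.g. 2**31 - 1 for a signed 32-bit counter. It is the
    implementation's choice; HTTP itself sets no upper bound.
    """
    digits = "0123456789abcdef"[:base]
    value = text.strip()
    if not value:
        raise LengthError("empty length field")
    result = 0
    for ch in value.lower():
        if ch not in digits:
            raise LengthError(f"invalid digit {ch!r} in {value!r}")
        digit = digits.index(ch)
        # Check *before* multiplying, as a fixed-width implementation must,
        # so the error says "too large" instead of silently wrapping around.
        if result > (max_value - digit) // base:
            raise LengthError(
                f"{value!r} exceeds this implementation's cap of {max_value}")
        result = result * base + digit
    return result
```

A chunk-size line can be checked the same way after dropping any chunk extension, e.g. `parse_length(line.split(";", 1)[0], 2**31 - 1, base=16)`.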

> It is not realistic for the HTTP specification to expect that all
> implementations use bignums for every integer which may be transmitted
> in the protocol.

No argument here.

> All that can be expected is that application developers recognize that
> failure to handle >2GB files

(4GiB, but whatever; each end has its own cap that can be arbitrary.)

> is a bug if their users expect it to work, and that all parties who agree
> on handling files >2GB do it in the same manner at the protocol level, and
> this is fulfilled fine by the specs as they are.

Ideally. I'm still working on poking holes in that one.
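For reference, the two figures in this exchange presumably correspond to the two common 32-bit counters: a signed 32-bit integer tops out one byte short of 2 GiB, an unsigned one one byte short of 4 GiB. A small Python illustration (the names `INT32_MAX`, `UINT32_MAX` and `fits` are hypothetical):

```python
INT32_MAX = 2**31 - 1    # 2147483647: signed 32-bit counter, 2 GiB minus one byte
UINT32_MAX = 2**32 - 1   # 4294967295: unsigned 32-bit counter, 4 GiB minus one byte


def fits(length: int, cap: int) -> bool:
    """True if a body of `length` bytes can be counted by a counter capped at `cap`."""
    return 0 <= length <= cap


three_gb = 3_000_000_000
print(fits(three_gb, INT32_MAX))   # False: too big for a signed counter
print(fits(three_gb, UINT32_MAX))  # True: fits an unsigned one
```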

> The range-retrieval question is purely hypothetical. A client which
> cannot handle large integer values for content length won't be able to
> split it up into ranges either, as the range specifications need numbers
> larger than the client can represent.

[later]

> And how clients store downloaded content is completely outside the
> concerns of the specification.

I agree, though our definitions of "clients storing downloaded content"
might be slightly different. Storage proper is a secondary consideration,
and I agree that it's not something the spec has business with (aside from
no-store).

> A client is free to split downloaded files into many OS-level files if
> required; protocol specs do not care and must not care.

No argument here.

> >  SHOULD the client then attempt a series of byte-range requests
> > instead?
>
> Why on earth should a client do that under these conditions? It most
> likely won't be able to reassemble the result, or even compose the range
> requests.

Yep; that was half-baked of me, and I should've re-read that section before
asking. But to play devil's advocate for a moment, dealing with bignums in
one small, contained spot (composing the next range request, which would
presumably be <= the maximum int size the system wants to handle) is better
than having to deal with bignums at every step of the way.
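To make the devil's-advocate case concrete, here is a sketch assuming the client already knows the total length. In Python, integers are unbounded; in C this would be the one place a 64-bit or bignum type is needed. Each generated request asks for at most `chunk` bytes, so code handling each response only deals with small lengths. The function name and chunk size are hypothetical; the `bytes=first-last` form, with an inclusive last-byte position, is the HTTP/1.1 byte-range syntax.

```python
from typing import Iterator


def range_headers(total_length: int, chunk: int) -> Iterator[str]:
    """Yield Range header values covering bytes 0..total_length-1.

    Only this function touches the (possibly huge) offsets; every range it
    produces spans at most `chunk` bytes.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    start = 0
    while start < total_length:
        end = min(start + chunk, total_length) - 1  # last-byte-pos is inclusive
        yield f"bytes={start}-{end}"
        start = end + 1
```

For example, `list(range_headers(10, 4))` gives `["bytes=0-3", "bytes=4-7", "bytes=8-9"]`.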

> > Also, in regard to connection handling: as far as I can tell, the
> > client is going to have to close the connection if an oversized
> > Content-Length shows up, since the client won't be able to read
> > through to the next request reliably.
>
> Yes, unless it's seen as acceptable to waste the network bandwidth sending
> the data to the bit bucket.
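The connection-handling point can be sketched as follows, assuming a buffered binary stream over the connection. When the client can honor the declared length, it consumes exactly that many bytes and any pipelined response after it is left intact. When it cannot, there is no reliable way to locate the next response, so the connection is reported as unusable. `consume_body` is a hypothetical name, not part of any library.

```python
import io
from typing import BinaryIO, Optional, Tuple


def consume_body(stream: BinaryIO, length: Optional[int]) -> Tuple[bytes, bool]:
    """Return (body, connection_reusable).

    `length` is None when the client could not use the Content-Length value
    (for example, it exceeded the client's cap).
    """
    if length is None:
        # No way to tell where this body ends and the next response begins.
        return b"", False  # caller should close the connection
    body = stream.read(length)
    if len(body) != length:
        raise ConnectionError("connection closed before the full body arrived")
    return body, True


# Two pipelined responses in one buffer: a 5-byte body, then the next status line.
buf = io.BytesIO(b"hello" + b"HTTP/1.1 200 OK\r\n")
body, reusable = consume_body(buf, 5)
print(body, reusable)   # b'hello' True
print(buf.read(8))      # b'HTTP/1.1': the next response starts where it should
```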

An implementer that thinks he's particularly clever might decide to try and
read as much of the message as he can, then spin through the rest of the
input until he hits a new response line (or, if no pipelining took place,
until the socket appears to be empty). Now, I'd consider this broken, a bad
idea, and otherwise unwise -- however, it might work "good enough," and it
isn't against the spec.
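A sketch of that "clever" resynchronization shows why it is unreliable: nothing stops the unread remainder of a body from containing text that looks exactly like a status line (a log file, a mail archive, a test fixture). The regular expression and sample bytes below are illustrative assumptions.

```python
import re

# A line beginning "HTTP/1.x NNN " is taken to be the start of the next response.
STATUS_LINE = re.compile(rb"^HTTP/1\.[01] \d{3} ", re.MULTILINE)


def naive_resync(data: bytes) -> int:
    """Guess where the next response starts; return its index or -1."""
    m = STATUS_LINE.search(data)
    return m.start() if m else -1


# Unread remainder of a body that happens to quote an HTTP exchange,
# followed by the real next pipelined response.
body_tail = b"captured:\r\nHTTP/1.1 404 Not Found\r\n\r\n"
next_response = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
stream = body_tail + next_response

guess = naive_resync(stream)   # lands on the 404 line inside the old body
actual = len(body_tail)        # where the next response really starts
print(guess, actual)
```

The scan stops inside the old body, so everything downstream of this client would be parsing the wrong bytes.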

Section 4.4 does not mandate how clients receive a message with a Content-
Length, nor does section 14.13; nowhere does it say that the client _can't_
butcher the message it's receiving in an attempt to get to the next message.
One would hope that implementers don't do stupid things (like treat the
lengths determined from section 4.4 strategies as "advisory"), but Murphy's
Law dictates that _someone_ will. If anyone can find something that would
prevent me from using *ahem* "alternate methods" for finding the start of
the next message, please point it out.
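For reference, the precedence that section 4.4 of RFC 2616 gives for determining a response's transfer-length can be sketched as below. This is an illustrative Python function, not taken from any implementation; it assumes header names have already been lower-cased and it omits request-side rules. Note that it only says how the length is *determined*; as argued above, nothing in it forces a client to then read exactly that many bytes.

```python
from typing import Mapping, Optional, Tuple


def response_framing(method: str, status: int,
                     headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Return (framing, length) for a response, in RFC 2616 section 4.4 order.

    Header names in `headers` are assumed to be lower-cased already.
    """
    # 1. Responses that MUST NOT include a body end at the first empty line.
    if method == "HEAD" or 100 <= status < 200 or status in (204, 304):
        return "no-body", 0
    # 2. A Transfer-Encoding other than "identity" means chunked framing;
    #    any Content-Length is then ignored.
    te = headers.get("transfer-encoding")
    if te is not None and te.strip().lower() != "identity":
        return "chunked", None
    # 3. Content-Length gives the length in octets (1*DIGIT). Python ints are
    #    unbounded; a fixed-width client would apply its own cap here.
    cl = headers.get("content-length")
    if cl is not None:
        value = cl.strip()
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"invalid Content-Length: {cl!r}")
        return "content-length", int(value)
    # 4. multipart/byteranges is self-delimiting.
    ctype = headers.get("content-type", "")
    if ctype.split(";", 1)[0].strip().lower() == "multipart/byteranges":
        return "multipart/byteranges", None
    # 5. Otherwise the body ends when the server closes the connection.
    return "close", None
```

For example, `response_framing("GET", 200, {"transfer-encoding": "chunked", "content-length": "5"})` returns `("chunked", None)`.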

Now, with that said, section 4.4 *does* say something about lengths that I
missed earlier:

   HTTP/1.1 user agents MUST notify the user when an invalid length is
   received and detected.

First off, this applies only to user agents. Proxies are out in the cold
(they'd probably just send back some 5xx status saying that the upstream
server is being a pill). But this _does_ seem to imply that user agents (at
least; clients perhaps in general) are allowed to use whatever means they
deem necessary to find the end of a message *so long as they notify the
user*.
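One way a user agent might satisfy that MUST without groping for message boundaries is sketched below: on a length mismatch it notifies the user and reports the connection as unusable, so the caller closes it. The `notify` callback stands in for whatever UI the agent has; the function name and message text are hypothetical.

```python
from typing import Callable, List


def check_received_length(declared: int, received: int,
                          notify: Callable[[str], None]) -> bool:
    """Return True if the body length matched Content-Length.

    On a mismatch, tell the user (RFC 2616 section 4.4: user agents MUST notify
    the user when an invalid length is received and detected) and return False
    so the caller closes the connection instead of hunting for the next message.
    """
    if received == declared:
        return True
    notify(f"Content-Length said {declared} bytes but {received} arrived; "
           "the response is incomplete or malformed")
    return False


messages: List[str] = []
ok = check_received_length(1000, 412, messages.append)
print(ok, messages)
```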


> >  If this is the case, is it specified?
>
> Does it need to? I think not.

It's a matter of consistency, and specified versus unspecified behavior.
Other parts of the spec are very explicit about when the client/server
should close the connection. Nowhere does it say that closing the connection
is the general way in which the client should deal with failure, just that
anyone MAY close their connections at any time. Case in point: section 8.2.2
does a fine job of indicating how the client should behave when _sending_ an
entity to the server, but there is no analogue for how the client should
behave when _receiving_ an entity.
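For comparison, the sending-side behavior that section 8.2.2 does spell out fits in a few lines: a client that sees an error status mid-upload SHOULD stop sending the body; with chunked encoding it MAY end the body early with a zero-length chunk and an empty trailer, and with Content-Length it MUST close the connection. A Python sketch (the function name is hypothetical):

```python
from typing import Optional

# last-chunk ("0" CRLF) followed by an empty trailer (CRLF)
LAST_CHUNK = b"0\r\n\r\n"


def end_request_body_early(chunked: bool) -> Optional[bytes]:
    """After an error status arrives mid-upload, return the bytes that end the
    request body early, or None when the only option is closing the connection."""
    if chunked:
        return LAST_CHUNK  # MAY be used to prematurely mark the end of the message
    return None            # body framed by Content-Length: the client MUST close
```

No receiving-side counterpart to this logic can be read off the spec, which is the gap described above.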

Since the behavior is (strictly speaking) unspecified, I can assume that
closing the connection is the right course of action (like the spec probably
wants, but doesn't say), or I can assume that it's OK to grope out the end
of the message (and have the potential to make everyone downstream break).
I'm sure there are other things I could assume, too; that's why life tends
to be a whole lot easier when things are specified :).


Thanks,

-- Travis
Received on Friday, 5 January 2007 19:09:43 UTC