Re: Very large values (Re: Call For Adoption Live Byte Ranges)

Thanks Martin for the feedback. I'm glad to get this feedback prior to
revising - since you bring up some good points. Reply in-line.

On 1/2/17 4:37 PM, Martin Thomson wrote:
> On 3 January 2017 at 10:00, Craig Pratt <craig@ecaspia.com> wrote:
>> 2^63 is 9223372036854775808 (decimal). I've defined a smaller value to avoid
>> potential conflicts and to make the value more easily identifiable:
>> 9222999999999999999.
>>
>> I think having a clearly-defined Very Large Value such as this to represent
>> the indeterminate end of content will be more deterministic/easily
>> implemented than having a Server try to establish a VLV in each HTTP
>> exchange. But I'd appreciate any thoughts prior to revising the draft.
> I think that any value you choose will be OK-ish.  The question is
> whether you think that there is a response that will exceed that size.
> If there is, then no single value you choose will be enough.  If that
> is possible, then you don't want a single fixed value at all, just a
> recommendation to pick a big number that far exceeds the size you
> want/expect.
On that, I think we're OK - based on the use cases I'm concerned about,
2^63 would be more than enough.
> I guess the other concern is that 9222999999999999999 (which I had to
> copy because I go cross-eyed counting those nines), is too big for
> some numeric formats.  Javascript has trouble with that number, which
> it reads as 9223000000000000000 instead, a problem that starts with
> 9007199254740993 (just paste that into your browser console and see
> what comes back). That suggests a smaller value might be safer, but
> then you have more problems with overflow.
Yeah - my bad for not having researched that. I've done much in Java,
but not JavaScript (yet).

It looks like ECMAScript 6 uses an IEEE 754 number format. And
JavaScript defines Number.MAX_SAFE_INTEGER as 2^53 - 1
(9007199254740991). So yeah - any HTTP request using a JS number
as a range value isn't going to be able to (accurately) represent numbers
beyond that, as you've observed. This is definitely an issue, IMHO.
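To make that concrete: Python floats are the same IEEE 754 doubles that JavaScript Numbers are, so the rounding Martin describes can be reproduced directly (a quick sketch, nothing from the draft itself):

```python
# JavaScript Numbers are IEEE 754 doubles, same as Python floats, so
# Python reproduces the rounding Martin observed in the browser console.

VLV = 9222999999999999999  # the proposed Very Large Value

# Round-tripping through a double loses the low digits:
print(int(float(VLV)))      # 9223000000000000000, not ...999

# The first integer a double cannot represent exactly is 2**53 + 1,
# so these two *different* integers collide:
print(float(2**53) == float(2**53 + 1))   # True

print(2**53 - 1)            # 9007199254740991, i.e. MAX_SAFE_INTEGER
```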
> Note that whatever value you pick has to be safe for a great many
> implementations, even if those implementations never need that space.
> They still have to parse the value properly, preferably without
> resorting to use of bignums.
>
> If you believe it to be possible to pick a safe value that will never
> be exceeded, then ignore the rest of my mail :)
I think what concerns me now is (a) any other language-specific and
machine-specific gotchas, and (b) the fact that this value could represent
a real limit for some high-rate application someone dreams up. e.g. A
limit of 2^53 causes my 1Gb/s example to go from a quite-comfortable 2339
years of content to a less-comfortable 2 years...
> The risk in specifying a single value is that implementations will
> hard-code checks around that value like (end == VLV) or if things are
> done poorly (end >= VLV).  Implementations that have that check will
> assume indefinite ranges, even if there isn't an indefinite range and
> might get caught with bugs, like infinite loops:
>
> 10: I have up to <VLV>, I need more bytes
> 20: ask for a range from current end to <VLV> (i.e., VLV-VLV)
> 30: get a zero-length range back
> 40: if need more bytes, goto 20
>
> That leads to problems: implementations won't be able to send
> responses of exactly the size you choose (however unlikely that is),
> or in the bad case, you won't ever be able to exceed that value.
>
> You can get the same effect if major implementations pick the same value.
Agreed. No one should use floating-point representations for this stuff,
IMHO. But since it can't be avoided (JS isn't going away), all the typical
rules about equality checks and floating point become necessary. And
therein lie many potential bugs.
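As a sketch of the hazard (again using Python floats, which are the same IEEE 754 doubles): once the magic value passes through a double, an `(end == VLV)` check stops meaning what it appears to mean.

```python
VLV = 9222999999999999999          # the proposed magic value
end = VLV + 1                      # a distinct, non-magic range end

# After coercion to doubles the two values collide, so a floating-point
# (end == VLV) check reports a spurious "indefinite range":
print(float(end) == float(VLV))    # True

# Worse, the round-tripped double no longer equals the original integer:
print(float(VLV) == VLV)           # False
```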
> On the other hand, a client can just pick an arbitrary stupidly large
> value (ASLV).  This can be an increment on what the client already
> has, and should probably include some randomness.  If there is that
> much still remaining, then they just have to make a new request.
I didn't think of this as a common use case, but who am I to say?

BTW, I'd like to consider the term "AASLUVE" (Absurdly Arbitrary
Stupidly Large Unrepresentable Value Encoding) for the Definitions
section.
> Thus, clients can pick a minimum increment that won't cause too much
> pain for them.  2^32 might be enough for clients that don't mind
> making a request every 4Gb or so, and it might make sense to start
> with "smaller" increments like that to avoid triggering
> incompatibility problems.
Yeah - I'd thought about 32-bit values in the initial draft - esp for
limited-resource devices that wish to avoid bigint math/comparisons.
> Adding some amount of randomness will provide greater surety that the
> server has read and understood the request.  e.g.,
>
> aslv = lastByte + 2**32 + random(2**32)
> request.setHeader('Content-Range', 'bytes %d-%d/*' % (lastByte, aslv))
I hope using randomness isn't necessary. But I see what you're getting at.
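For what it's worth, a runnable rendering of Martin's sketch might look like this (names are hypothetical; the 2^32 increment and 2^32 randomness window are taken straight from his example, and only the header value string is built here):

```python
import random

# A runnable rendering of Martin's ASLV sketch (names hypothetical).
# The client asks from its current end up to an arbitrary stupidly large
# value: a fixed 2**32 increment plus up to 2**32 of randomness, so a
# server echoing the range back demonstrates it actually parsed it.
def aslv_range_value(last_byte):
    aslv = last_byte + 2**32 + random.randrange(2**32)
    return 'bytes %d-%d/*' % (last_byte, aslv), aslv

value, aslv = aslv_range_value(1000000)
print(value)   # e.g. "bytes 1000000-5877345678/*"
```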

I think the points you bring up convince me that we should stick with
the mechanism defined in the current Live Bytes draft. And while the
flexibility afforded the Client requires a bit more description and
server-side logic than defining a single VLMV (Very Large Magic Value),
I think it's still sufficiently simple and covers the bases. (And it's
already written.)

I do think I should expand upon the Security Issues section a bit to
describe the additional issues with Very Large Values that you've
highlighted. Servers really need to be able to handle VLVs. And returning
the same range end value provided by the Client (which may be a VLV)
is critical for the operation of the Live Bytes mechanism as defined. So
look for a revision that incorporates this (and some of Poul-Henning's
corrections).

Thanks again,

cp

-- 

craig pratt

Caspia Consulting

craig@ecaspia.com

503.746.8008

Received on Tuesday, 3 January 2017 09:37:18 UTC