Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher <sdwilsh@mozilla.com>
>> > wrote:
>> >>
>> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >>>
>> >>> My current thinking is that we should have some relatively large
>> >>> limit... maybe on the order of 64k? It seems like it'd be very
>> >>> difficult to hit such a limit with any sort of legitimate use
>> >>> case, and the chances of some subtle data-dependent error would be
>> >>> much less. But a 1GB key is just not going to work well in any
>> >>> implementation (if it doesn't simply OOM the process!). So despite
>> >>> what I said earlier, I guess I think we should have some limit...
>> >>> but keep it an order of magnitude or two larger than what we
>> >>> expect any legitimate usage to hit, just to keep the system as
>> >>> flexible as possible.
>> >>>
>> >>> Does that sound reasonable to people?
>> >>
>> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> I'm hesitant to spec an exact size as a MUST given how technology
>> >> has a way of changing in unexpected ways that makes old constraints
>> >> obsolete. But then, I may just be overly concerned about this too.
>> >
>> > If we put a limit, it'd be a MUST for sure. Otherwise people would
>> > develop against one of the implementations that don't place a
>> > limit, and then their app would break on the others.
>> > The reason I suggested 64K is that it seems outrageously big for the
>> > data types we're looking at, but it's too small to do much with
>> > base64-encoding binary blobs into it, or anything else like that
>> > which I could see becoming rather large. So it seems like a limit
>> > that'd avoid major abuses (where someone is probably approaching the
>> > problem wrong) but would not come close to limiting any practical
>> > use I can imagine.
>> > With our architecture in Chrome, we will probably need to have some
>> > limit. We haven't decided what that is yet, but since I remember
>> > others saying similar things when we talked about this at TPAC, it
>> > seems like it might be best to standardize it--even though it does
>> > feel a bit dirty.
>>
>> One problem with putting a limit is that it basically forces
>> implementations to use a specific encoding, or pay a hefty price. For
>> example, if we choose a 64K limit, is that of UTF-8 data or of UTF-16
>> data? If it is of UTF-8 data, and the implementation uses something
>> else to store the data, you risk having to convert the data just to
>> measure its size. This would possibly be different if we measured
>> size in UTF-16, since JavaScript more or less enforces that the
>> source string is UTF-16, which means you can measure UTF-16 size on
>> the cheap even if the stored data uses a different format.
>
> That's a very good point.  What's your suggestion then?  Spec
> unlimited storage and have non-normative text saying that most
> implementations will likely have some limit?  Maybe we can at least
> spec a minimum limit in terms of a particular character encoding?
> (Implementations could translate this into the worst-case size for
> their own native encoding and then ensure their limit is higher.)

I'm fine with relying on UTF-16 encoding size and specifying a 64K
limit. As Shawn points out, this API is fairly geared towards
JavaScript anyway (and I personally don't think that's a bad thing).
One thing I just thought of: even if implementations use other
encodings, you can in the vast majority of cases do a worst-case
estimate and easily see that the key being used is below 64K.
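
As a concrete illustration of that estimate, here is a minimal sketch
in TypeScript. It assumes the 64K limit counts UTF-16 code units (the
thread leaves bytes vs. code units open), and the constant and
function names are illustrative, not anything from the spec:

    // A JS string's UTF-16 length is available without any conversion,
    // so checking a limit expressed in UTF-16 terms costs O(1).
    const MAX_KEY_UTF16_UNITS = 64 * 1024; // the 64K limit under discussion

    function keyWithinSpecLimit(key: string): boolean {
      return key.length <= MAX_KEY_UTF16_UNITS; // cheap: no re-encoding
    }

    // An implementation that stores keys as UTF-8 can enforce the same
    // limit without re-encoding either: one UTF-16 code unit becomes at
    // most three UTF-8 bytes, so reserving 3 * 64K bytes of native key
    // space always suffices.
    const NATIVE_UTF8_WORST_CASE = 3 * MAX_KEY_UTF16_UNITS;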

That said, does having a 64K limit really help anyone? In SQLite we
can easily store vastly more than that, enough that we don't have to
specify a limit. And my understanding is that in the Microsoft
implementation, the limit on what they can store without resorting to
various tricks is much lower. So since that implementation will have
to implement special handling of long keys anyway, is there really a
difference between specifying a 64K limit and specifying no limit?
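
Purely as an illustration of what such special handling could look
like (the thread does not say how the Microsoft implementation
actually works, and the 255-byte prefix below is invented), one common
trick is to index only a fixed-size prefix of each key and keep long
keys out of line, falling back to a full comparison when prefixes tie,
which preserves key ordering:

    // Hypothetical long-key handling: the native index sorts on at most
    // PREFIX bytes; longer keys spill their full bytes to an overflow
    // field that is only consulted when two prefixes compare equal.
    const PREFIX = 255; // invented native index limit, in bytes

    interface IndexEntry {
      prefix: Uint8Array;      // what the native index actually sorts on
      full: Uint8Array | null; // overflow copy, present only for long keys
    }

    function toEntry(keyUtf8: Uint8Array): IndexEntry {
      return keyUtf8.length <= PREFIX
        ? { prefix: keyUtf8, full: null }
        : { prefix: keyUtf8.slice(0, PREFIX), full: keyUtf8 };
    }

    function compareBytes(x: Uint8Array, y: Uint8Array): number {
      const n = Math.min(x.length, y.length);
      for (let i = 0; i < n; i++) {
        if (x[i] !== y[i]) return x[i] - y[i];
      }
      return x.length - y.length;
    }

    function compareEntries(a: IndexEntry, b: IndexEntry): number {
      const c = compareBytes(a.prefix, b.prefix);
      if (c !== 0 || (a.full === null && b.full === null)) return c;
      // Prefixes tie and at least one key overflowed: compare in full.
      return compareBytes(a.full ?? a.prefix, b.full ?? b.prefix);
    }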

Pablo: Would love to get your input on the above.

/ Jonas

Received on Monday, 7 February 2011 22:50:37 UTC