Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow <jorlow@chromium.org>
>> >> wrote:
>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher <sdwilsh@mozilla.com>
>> >> > wrote:
>> >> >>
>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >> >>>
>> >> >>> My current thinking is that we should have some relatively large
>> >> >>> limit....maybe on the order of 64k?  It seems like it'd be very
>> >> >>> difficult
>> >> >>> to
>> >> >>> hit such a limit with any sort of legitimate use case, and the
>> >> >>> chances
>> >> >>> of
>> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
>> >> >>> is
>> >> >>> just
>> >> >>> not going to work well in any implementation (if it doesn't simply
>> >> >>> oom
>> >> >>> the
>> >> >>> process!).  So despite what I said earlier, I guess I think we
>> >> >>> should
>> >> >>> have
>> >> >>> some limit...but keep it an order of magnitude or two larger than
>> >> >>> what
>> >> >>> we
>> >> >>> expect any legitimate usage to hit just to keep the system as
>> >> >>> flexible
>> >> >>> as
>> >> >>> possible.
>> >> >>>
>> >> >>> Does that sound reasonable to people?
>> >> >>
>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> >>  I'm
>> >> >> hesitant to spec an exact size as a MUST given how technology has a
>> >> >> way
>> >> >> of
>> >> >> changing in unexpected ways that makes old constraints obsolete.
>> >> >>  But
>> >> >> then,
>> >> >> I may just be overly concerned about this too.
>> >> >
>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> >> > develop
>> >> > against one of the implementations that don't place a limit and then
>> >> > their
>> >> > app would break on the others.
>> >> > The reason that I suggested 64K is that it seems outrageously big for
>> >> > the
>> >> > data types that we're looking at.  But it's too small to do much with
>> >> > base64
>> >> > encoding binary blobs into it or anything else like that that I could
>> >> > see
>> >> > becoming rather large.  So it seems like a limit that'd avoid major
>> >> > abuses
>> >> > (where someone is probably approaching the problem wrong) but would
>> >> > not
>> >> > come
>> >> > close to limiting any practical use I can imagine.
>> >> > With our architecture in Chrome, we will probably need to have some
>> >> > limit.
>> >> >  We haven't decided what that is yet, but since I remember others
>> >> > saying
>> >> > similar things when we talked about this at TPAC, it seems like it
>> >> > might
>> >> > be
>> >> > best to standardize it--even though it does feel a bit dirty.
>> >>
>> >> One problem with putting a limit is that it basically forces
>> >> implementations to use a specific encoding, or pay a hefty price. For
>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> >> data? If it is of UTF8 data, and the implementation uses something
>> >> else to store the date, you risk having to convert the data just to
>> >> measure the size. Possibly this would be different if we measured size
>> >> using UTF16 as javascript more or less enforces that the source string
>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>> >> even if the stored data uses a different format.
>> >
>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>> > storage and have non-normative text saying that
>> > most implementations will
>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>> > terms
>> > of a particular character encoding?  (Implementations could translate
>> > this
>> > into the worst case size for their own native encoding and then ensure
>> > their
>> > limit is higher.)
>>
>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>> limit. Like Shawn points out, this API is fairly geared towards
>> JavaScript anyway (and I personally don't think that's a bad thing).
>> One thing that I just thought of is that even if implementations use
>> other encodings, you can in the vast majority of cases do a worst-case
>> estimate and easily see that the key that is used is below 64K.
>>
>> That said, does having a 64K limit really help anyone? In SQLite we
>> can easily store vastly more than that, enough that we don't have to
>> specify a limit. And my understanding is that in the Microsoft
>> implementation, the limits for what they can store without resorting
>> to various tricks, is much lower. So since that implementation will
>> have to implement special handling of long keys anyway, is there a
>> difference between saying a 64K limit vs. saying unlimited?
>
> As I explained earlier: "The reason that I suggested 64K is that it seems
> outrageously big for the data types that we're looking at.  But it's too
> small to do much with base64 encoding binary blobs into it or anything else
> like that that I could see becoming rather large.  So it seems like a limit
> that'd avoid major abuses (where someone is probably approaching the problem
> wrong) but would not come close to limiting any practical use I can
> imagine."
> Since Chrome sandboxes the rendering process, if a web page allocates tons
> of memory and OOMs the process, you just get a sad tab or two.  But since
> IndexedDB is partially in the browser process, I need to make sure a large
> key is not going to OOM that (and thus crash the whole browser....something
> a web page should never be able to do in Chrome).
> Does FF and/or IE have any plans for similar limits?  If so, I really think
> we should coordinate.

We don't have any plans for similar limits right now. Though of course
if it's added to the spec we'd follow that.

I don't really feel strongly on the issue as long as the limit is high
enough (64K seems high enough, 2K does not) that non-malicious sites
generally won't ever see the limit.

/ Jonas

Received on Monday, 7 February 2011 23:31:34 UTC