RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)? from Pablo Castro on 2011-02-15 (public-webapps@w3.org from January to March 2011)

From: Pablo Castro <Pablo.Castro@microsoft.com>
Date: Tue, 15 Feb 2011 07:38:41 +0000
To: Jonas Sicking <jonas@sicking.cc>, Jeremy Orlow <jorlow@chromium.org>
CC: Shawn Wilsher <sdwilsh@mozilla.com>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <F108E2F6BA743C4696146F0B7111C26103667E@TK5EX14MBXC242.redmond.corp.microsoft.co>
(Sorry for my previous, badly timed email on this thread. Please see below for an actually up-to-date reply.)

-----Original Message-----
From: Jonas Sicking [mailto:jonas@sicking.cc] 
Sent: Monday, February 07, 2011 3:31 PM

On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>
>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> >>
>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow <jorlow@chromium.org>
>> >> wrote:
>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher <sdwilsh@mozilla.com>
>> >> > wrote:
>> >> >>
>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >> >>>
>> >> >>> My current thinking is that we should have some relatively large
>> >> >>> limit....maybe on the order of 64k?  It seems like it'd be very
>> >> >>> difficult
>> >> >>> to
>> >> >>> hit such a limit with any sort of legitimate use case, and the
>> >> >>> chances
>> >> >>> of
>> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
>> >> >>> is
>> >> >>> just
>> >> >>> not going to work well in any implementation (if it doesn't simply
>> >> >>> oom
>> >> >>> the
>> >> >>> process!).  So despite what I said earlier, I guess I think we
>> >> >>> should
>> >> >>> have
>> >> >>> some limit...but keep it an order of magnitude or two larger than
>> >> >>> what
>> >> >>> we
>> >> >>> expect any legitimate usage to hit just to keep the system as
>> >> >>> flexible
>> >> >>> as
>> >> >>> possible.
>> >> >>>
>> >> >>> Does that sound reasonable to people?
>> >> >>
>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> >>  I'm
>> >> >> hesitant to spec an exact size as a MUST given how technology has a
>> >> >> way
>> >> >> of
>> >> >> changing in unexpected ways that makes old constraints obsolete.
>> >> >>  But
>> >> >> then,
>> >> >> I may just be overly concerned about this too.
>> >> >
>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> >> > develop
>> >> > against one of the implementations that don't place a limit and then
>> >> > their
>> >> > app would break on the others.
>> >> > The reason that I suggested 64K is that it seems outrageously big for
>> >> > the
>> >> > data types that we're looking at.  But it's too small to do much with
>> >> > base64
>> >> > encoding binary blobs into it or anything else like that that I could
>> >> > see
>> >> > becoming rather large.  So it seems like a limit that'd avoid major
>> >> > abuses
>> >> > (where someone is probably approaching the problem wrong) but would
>> >> > not
>> >> > come
>> >> > close to limiting any practical use I can imagine.
>> >> > With our architecture in Chrome, we will probably need to have some
>> >> > limit.
>> >> >  We haven't decided what that is yet, but since I remember others
>> >> > saying
>> >> > similar things when we talked about this at TPAC, it seems like it
>> >> > might
>> >> > be
>> >> > best to standardize it--even though it does feel a bit dirty.
>> >>
>> >> One problem with putting a limit is that it basically forces
>> >> implementations to use a specific encoding, or pay a hefty price. For
>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> >> data? If it is of UTF8 data, and the implementation uses something
>> >> else to store the data, you risk having to convert the data just to
>> >> measure the size. Possibly this would be different if we measured size
>> >> using UTF16 as javascript more or less enforces that the source string
>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>> >> even if the stored data uses a different format.
>> >
>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>> > storage and have non-normative text saying that
>> > most implementations will
>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>> > terms
>> > of a particular character encoding?  (Implementations could translate
>> > this
>> > into the worst case size for their own native encoding and then ensure
>> > their
>> > limit is higher.)
>>
>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>> limit. Like Shawn points out, this API is fairly geared towards
>> JavaScript anyway (and I personally don't think that's a bad thing).
>> One thing that I just thought of is that even if implementations use
>> other encodings, you can in the vast majority of cases do a worst-case
>> estimate and easily see that the key that is used is below 64K.
>>
>> That said, does having a 64K limit really help anyone? In SQLite we
>> can easily store vastly more than that, enough that we don't have to
>> specify a limit. And my understanding is that in the Microsoft
>> implementation, the limits for what they can store without resorting
>> to various tricks, is much lower. So since that implementation will
>> have to implement special handling of long keys anyway, is there a
>> difference between saying a 64K limit vs. saying unlimited?
>
> As I explained earlier: "The reason that I suggested 64K is that it seems
> outrageously big for the data types that we're looking at.  But it's too
> small to do much with base64 encoding binary blobs into it or anything else
> like that that I could see becoming rather large.  So it seems like a limit
> that'd avoid major abuses (where someone is probably approaching the problem
> wrong) but would not come close to limiting any practical use I can
> imagine."
> Since Chrome sandboxes the rendering process, if a web page allocates tons
> of memory and OOMs the process, you just get a sad tab or two.  But since
> IndexedDB is partially in the browser process, I need to make sure a large
> key is not going to OOM that (and thus crash the whole browser....something
> a web page should never be able to do in Chrome).
> Does FF and/or IE have any plans for similar limits?  If so, I really think
> we should coordinate.

We don't have any plans for similar limits right now. Though of course
if it's added to the spec we'd follow that.

I don't really feel strongly on the issue as long as the limit is high
enough (64K seems high enough, 2K does not) that non-malicious sites
generally won't ever see the limit.

I'm fine with imposing a limit, mostly for predictability reasons. In practice I'm not sure this will help implementations a lot (I don't know much about SQLite, but other databases tend to have smaller page sizes and require non-blob data in records to fit in a single page). Even the OOM issue Jeremy was discussing could apply to whole records or to other properties in records rather than just keys, no?

In the end, we could just put a number to encourage keys to be relatively small. The assumption for UTF-16 seems to be safe, and in any case is the safer assumption (i.e. if some implementation used something like UTF-8 then it'll just have margin to spare).
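
To make the size discussion concrete, here is a rough sketch of how a string key could be measured against a limit. It assumes the 64K figure floated in this thread and counts size in UTF-16 bytes; the names MAX_KEY_BYTES, utf16ByteLength, utf8ByteLength and fitsInLimit are made up for illustration and are not part of the IndexedDB spec or of any implementation.

```typescript
// Hypothetical limit: the 64K figure discussed in this thread, not a spec value.
const MAX_KEY_BYTES = 64 * 1024;

// UTF-16 size: JavaScript strings expose their length in UTF-16 code units,
// so the byte size is available without converting anything.
function utf16ByteLength(key: string): number {
  return key.length * 2;
}

// Exact UTF-8 size, computed by walking the UTF-16 code units.
// This is the conversion-style work an implementation would have to do
// if the limit were defined in terms of UTF-8.
function utf8ByteLength(key: string): number {
  let bytes = 0;
  for (let i = 0; i < key.length; i++) {
    const unit = key.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < key.length) {
      const next = key.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point, 4 bytes in UTF-8
        i++;
      } else {
        bytes += 3; // lone surrogate, counted like a replacement character
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Assumed check: a string key is acceptable if its UTF-16 size is within the limit.
function fitsInLimit(key: string): boolean {
  return utf16ByteLength(key) <= MAX_KEY_BYTES;
}
```

With this sketch, an ASCII key such as "abc" measures 6 bytes as UTF-16 and 3 bytes as UTF-8, while characters above U+07FF in the Basic Multilingual Plane take 3 bytes in UTF-8 versus 2 in UTF-16, so how much margin a UTF-8 implementation has depends on the key's contents.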

-pablo
Received on Tuesday, 15 February 2011 07:39:15 UTC