
Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 15 Feb 2011 00:37:27 -0800
Message-ID: <AANLkTik5=zJLXknSHvTYJw7HM6P+5Gd+eLPKzwmjwsFb@mail.gmail.com>
To: Pablo Castro <Pablo.Castro@microsoft.com>
Cc: Jeremy Orlow <jorlow@chromium.org>, Shawn Wilsher <sdwilsh@mozilla.com>, "public-webapps@w3.org" <public-webapps@w3.org>
On Mon, Feb 14, 2011 at 11:38 PM, Pablo Castro
<Pablo.Castro@microsoft.com> wrote:
> (sorry for my random out-of-timing previous email on this thread. please see below for an actually up to date reply)
>
> -----Original Message-----
> From: Jonas Sicking [mailto:jonas@sicking.cc]
> Sent: Monday, February 07, 2011 3:31 PM
>
> On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>>
>>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
>>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>>> >>
>>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow <jorlow@chromium.org>
>>> >> wrote:
>>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher <sdwilsh@mozilla.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>>> >> >>>
>>> >> >>> My current thinking is that we should have some relatively large
>>> >> >>> limit....maybe on the order of 64k?  It seems like it'd be very
>>> >> >>> difficult to hit such a limit with any sort of legitimate use case,
>>> >> >>> and the chances of some subtle data-dependent error would be much
>>> >> >>> less.  But a 1GB key is just not going to work well in any
>>> >> >>> implementation (if it doesn't simply oom the process!).  So despite
>>> >> >>> what I said earlier, I guess I think we should have some limit...but
>>> >> >>> keep it an order of magnitude or two larger than what we expect any
>>> >> >>> legitimate usage to hit just to keep the system as flexible as
>>> >> >>> possible.
>>> >> >>>
>>> >> >>> Does that sound reasonable to people?
>>> >> >>
>>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>>> >> >> I'm hesitant to spec an exact size as a MUST given how technology has
>>> >> >> a way of changing in unexpected ways that makes old constraints
>>> >> >> obsolete.  But then, I may just be overly concerned about this too.
>>> >> >
>>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>>> >> > develop against one of the implementations that don't place a limit
>>> >> > and then their app would break on the others.
>>> >> > The reason that I suggested 64K is that it seems outrageously big for
>>> >> > the data types that we're looking at.  But it's too small to do much
>>> >> > with base64 encoding binary blobs into it or anything else like that
>>> >> > that I could see becoming rather large.  So it seems like a limit
>>> >> > that'd avoid major abuses (where someone is probably approaching the
>>> >> > problem wrong) but would not come close to limiting any practical use
>>> >> > I can imagine.
>>> >> > With our architecture in Chrome, we will probably need to have some
>>> >> > limit.  We haven't decided what that is yet, but since I remember
>>> >> > others saying similar things when we talked about this at TPAC, it
>>> >> > seems like it might be best to standardize it--even though it does
>>> >> > feel a bit dirty.
>>> >>
>>> >> One problem with putting a limit is that it basically forces
>>> >> implementations to use a specific encoding, or pay a hefty price. For
>>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>>> >> data? If it is of UTF8 data, and the implementation uses something
>>> >> else to store the data, you risk having to convert the data just to
>>> >> measure the size. Possibly this would be different if we measured size
>>> >> using UTF16 as javascript more or less enforces that the source string
>>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>>> >> even if the stored data uses a different format.
>>> >
>>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>>> > storage and have non-normative text saying that most implementations
>>> > will likely have some limit?  Maybe we can at least spec a minimum limit
>>> > in terms of a particular character encoding?  (Implementations could
>>> > translate this into the worst case size for their own native encoding
>>> > and then ensure their limit is higher.)
>>>
>>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>>> limit. Like Shawn points out, this API is fairly geared towards
>>> JavaScript anyway (and I personally don't think that's a bad thing).
>>> One thing that I just thought of is that even if implementations use
>>> other encodings, you can in the vast majority of cases do a worst-case
>>> estimate and easily see that the key that is used is below 64K.
>>>
>>> That said, does having a 64K limit really help anyone? In SQLite we
>>> can easily store vastly more than that, enough that we don't have to
>>> specify a limit. And my understanding is that in the Microsoft
>>> implementation, the limit for what they can store without resorting
>>> to various tricks is much lower. So since that implementation will
>>> have to implement special handling of long keys anyway, is there a
>>> difference between saying a 64K limit vs. saying unlimited?
>>
>> As I explained earlier: "The reason that I suggested 64K is that it seems
>> outrageously big for the data types that we're looking at.  But it's too
>> small to do much with base64 encoding binary blobs into it or anything else
>> like that that I could see becoming rather large.  So it seems like a limit
>> that'd avoid major abuses (where someone is probably approaching the problem
>> wrong) but would not come close to limiting any practical use I can
>> imagine."
>> Since Chrome sandboxes the rendering process, if a web page allocates tons
>> of memory and OOMs the process, you just get a sad tab or two.  But since
>> IndexedDB is partially in the browser process, I need to make sure a large
>> key is not going to OOM that (and thus crash the whole browser....something
>> a web page should never be able to do in Chrome).
>> Do FF and/or IE have any plans for similar limits?  If so, I really think
>> we should coordinate.
>
> We don't have any plans for similar limits right now. Though of course
> if it's added to the spec we'd follow that.
>
> I don't really feel strongly on the issue as long as the limit is high
> enough (64K seems high enough, 2K does not) that non-malicious sites
> generally won't ever see the limit.
>
> I'm fine with imposing a limit mostly for predictability reasons. In practice I'm not sure this will help implementations a lot (don't know much about SQLite, but other databases tend to have smaller page sizes and require non-blob data in records to fit in a single page). Even the OOM issue Jeremy was discussing could be applied to whole-records or other properties in records instead of keys, no?
>
> In the end, we could just put a number to encourage keys to be relatively small. The assumption for UTF-16 seems to be safe, and in any case is the safer assumption (i.e. if some implementation used something like UTF-8 then it'll just have margin to spare).

If everyone is fine with it, I'd say let's not put in any limits then.
There will always be limitations, but implementations are responsible
for making sure that they are sufficiently large that authors won't
run into them.

/ Jonas
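
To illustrate the encoding point raised in the thread, here is a minimal
sketch (not taken from any implementation) of why a limit expressed in
UTF-16 is cheap to check while a UTF-8 limit needs either a scan or a
worst-case estimate. The names MAX_KEY_BYTES, utf16Bytes, utf8Bytes,
utf8WorstCase and fitsLimit are hypothetical, and reading "64K" as 65536
bytes is an assumption; the thread never settled on a unit.

```typescript
// Assumption: the "64K" discussed in the thread is read as 65536 bytes.
const MAX_KEY_BYTES = 64 * 1024;

// UTF-16 size: two bytes per code unit. String length is already known,
// so this needs no pass over the data.
function utf16Bytes(key: string): number {
  return key.length * 2;
}

// Exact UTF-8 size: requires walking every code unit of the string.
function utf8Bytes(key: string): number {
  let bytes = 0;
  for (let i = 0; i < key.length; i++) {
    const cu = key.charCodeAt(i);
    if (cu < 0x80) {
      bytes += 1;
    } else if (cu < 0x800) {
      bytes += 2;
    } else if (cu >= 0xd800 && cu <= 0xdbff && i + 1 < key.length &&
               key.charCodeAt(i + 1) >= 0xdc00 && key.charCodeAt(i + 1) <= 0xdfff) {
      bytes += 4; // surrogate pair encodes one code point as 4 UTF-8 bytes
      i++;        // skip the low surrogate
    } else {
      bytes += 3; // rest of the BMP (a lone surrogate is counted as 3 bytes)
    }
  }
  return bytes;
}

// Worst-case UTF-8 estimate without scanning: a single UTF-16 code unit
// never needs more than 3 UTF-8 bytes.
function utf8WorstCase(key: string): number {
  return key.length * 3;
}

// Hypothetical check against a limit measured as UTF-16 data.
function fitsLimit(key: string): boolean {
  return utf16Bytes(key) <= MAX_KEY_BYTES;
}
```

Under these assumptions an implementation that stores UTF-8 could accept
any key whose utf8WorstCase value is under its own limit without
converting it, and only fall back to an exact count for keys near the
boundary.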
Received on Tuesday, 15 February 2011 08:38:31 GMT
