Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking <jonas@sicking.cc> wrote:
> >>
> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow <jorlow@chromium.org> wrote:
> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher <sdwilsh@mozilla.com>
> >> > wrote:
> >> >>
> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
> >> >>>
> >> >>> My current thinking is that we should have some relatively large
> >> >>> limit....maybe on the order of 64k?  It seems like it'd be very
> >> >>> difficult to hit such a limit with any sort of legitimate use case,
> >> >>> and the chances of some subtle data-dependent error would be much
> >> >>> less.  But a 1GB key is just not going to work well in any
> >> >>> implementation (if it doesn't simply oom the process!).  So despite
> >> >>> what I said earlier, I guess I think we should have some limit...but
> >> >>> keep it an order of magnitude or two larger than what we expect any
> >> >>> legitimate usage to hit just to keep the system as flexible as
> >> >>> possible.
> >> >>>
> >> >>> Does that sound reasonable to people?
> >> >>
> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
> >> >> I'm hesitant to spec an exact size as a MUST given how technology has
> >> >> a way of changing in unexpected ways that makes old constraints
> >> >> obsolete.  But then, I may just be overly concerned about this too.
> >> >
> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
> >> > develop against one of the implementations that don't place a limit and
> >> > then their app would break on the others.
> >> > The reason that I suggested 64K is that it seems outrageously big for
> >> > the data types that we're looking at.  But it's too small to do much
> >> > with base64 encoding binary blobs into it or anything else like that
> >> > that I could see becoming rather large.  So it seems like a limit that'd
> >> > avoid major abuses (where someone is probably approaching the problem
> >> > wrong) but would not come close to limiting any practical use I can
> >> > imagine.
> >> > With our architecture in Chrome, we will probably need to have some
> >> > limit.  We haven't decided what that is yet, but since I remember others
> >> > saying similar things when we talked about this at TPAC, it seems like
> >> > it might be best to standardize it--even though it does feel a bit
> >> > dirty.
> >>
> >> One problem with putting a limit is that it basically forces
> >> implementations to use a specific encoding, or pay a hefty price. For
> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
> >> data? If it is of UTF8 data, and the implementation uses something
> >> else to store the data, you risk having to convert the data just to
> >> measure the size. Possibly this would be different if we measured size
> >> using UTF16, as JavaScript more or less enforces that the source string
> >> is UTF16, which means that you can measure UTF16 size on the cheap,
> >> even if the stored data uses a different format.
> >
> > That's a very good point.  What's your suggestion then?  Spec unlimited
> > storage and have non-normative text saying that most implementations will
> > likely have some limit?  Maybe we can at least spec a minimum limit in
> > terms of a particular character encoding?  (Implementations could
> > translate this into the worst case size for their own native encoding and
> > then ensure their limit is higher.)
>
> I'm fine with relying on UTF16 encoding size and specifying a 64K
> limit. Like Shawn points out, this API is fairly geared towards
> JavaScript anyway (and I personally don't think that's a bad thing).
> One thing that I just thought of is that even if implementations use
> other encodings, you can in the vast majority of cases do a worst-case
> estimate and easily see that the key that is used is below 64K.
>
> That said, does having a 64K limit really help anyone? In SQLite we
> can easily store vastly more than that, enough that we don't have to
> specify a limit. And my understanding is that in the Microsoft
> implementation, the limits for what they can store without resorting
> to various tricks, is much lower. So since that implementation will
> have to implement special handling of long keys anyway, is there a
> difference between saying a 64K limit vs. saying unlimited?
>

As I explained earlier: "The reason that I suggested 64K is that it seems
outrageously big for the data types that we're looking at.  But it's too
small to do much with base64 encoding binary blobs into it or anything else
like that that I could see becoming rather large.  So it seems like a limit
that'd avoid major abuses (where someone is probably approaching the problem
wrong) but would not come close to limiting any practical use I can imagine.
"

Since Chrome sandboxes the rendering process, if a web page allocates tons
of memory and OOMs the process, you just get a sad tab or two.  But since
IndexedDB is partially in the browser process, I need to make sure a large
key is not going to OOM that (and thus crash the whole browser... something
a web page should never be able to do in Chrome).
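
To make that concrete, the guard I'm imagining lives in the sandboxed
renderer, before the key is ever serialized and sent over IPC to the browser
process.  Again just a sketch; the names, the limit, and the error thrown are
illustrative, not what we've actually implemented:

const MAX_KEY_SIZE = 64 * 1024; // placeholder limit, in UTF-16 code units

function validateKeyBeforeIpc(key: string): void {
  if (key.length > MAX_KEY_SIZE) {
    // Fail the request inside the page's own process; the browser process
    // never sees (or allocates memory for) the oversized key.
    throw new Error("IndexedDB key exceeds the maximum key size");
  }
}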

Do FF and/or IE have any plans for similar limits?  If so, I really think
we should coordinate.

J

Received on Monday, 7 February 2011 23:08:03 UTC