- From: Joshua Bell <jsbell@chromium.org>
- Date: Fri, 16 Mar 2012 09:19:44 -0700
On Thu, Mar 15, 2012 at 5:20 PM, Glenn Maynard <glenn at zewt.org> wrote:

> On Thu, Mar 15, 2012 at 6:51 PM, Jonas Sicking <jonas at sicking.cc> wrote:
>
>> What's the use-case for the "stringLength" function? You can't decode
>> into an existing datastructure anyway, so you're ultimately forced to
>> call "decode" at which point the "stringLength" function hasn't helped
>> you.
>>
>
> stringLength doesn't return the length of the decoded string. It returns
> the byte offset of the first \0 (or the length of the whole buffer, if
> none), for decoding null-terminated strings. For multibyte encodings (eg.
> everything except UTF-16 and friends), it's just memchr(), so it's much
> faster than actually decoding the string.
>

And just to be clear, the use case is decoding data formats where string
fields are variable length null terminated.

>> Currently the use-case of simply wanting to convert a string to a
>> binary buffer is a bit cumbersome. You first have to call the
>> "encodedLength" function, then allocate a buffer of the right size,
>> then call the "encode" function.
>
>
> I suggested eg.
>
> result = encode("string", "utf-8", null).output;
>
> which would create an ArrayBuffer of the required size. Presumably the
> null ArrayBufferView argument would be optional, so you could just say
> encode("string", "utf-8").
>

I think we want both encoding and destination to be optional. That leads us
to an API like:

out_dict = stringEncoding.encode("string", opt_dict);

.. where both out_dict and opt_dict are WebIDL Dictionaries:

opt_dict keys: view, encoding
out_dict keys: charactersWritten, byteWritten, output

... where output === view if view is supplied, otherwise a new Uint8Array
(or Uint8ClampedArray??)

If this instead is attached to String, it would look like:

out_dict = my_string.encode(opt_dict);

If it were attached to ArrayBufferView, having a right-size buffer allocated
for the caller gets uglier unless we include a static version.
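To make the shape under discussion concrete, here is a rough sketch of the two pieces, not a proposal for the final API. It assumes a runtime that provides TextEncoder with encodeInto, handles only UTF-8, and reuses the dictionary key names from this thread. The free functions encode and stringLength are hypothetical stand-ins, and charactersWritten is counted in UTF-16 code units, which is an assumption the thread has not settled.

```javascript
// Hypothetical sketch of encode(string, opt_dict) -> out_dict.
// Only "utf-8" is handled; TextEncoder stands in for a real implementation.
function encode(string, opts = {}) {
  const { view, encoding = "utf-8" } = opts;
  if (encoding.toLowerCase() !== "utf-8") {
    throw new RangeError("this sketch only handles utf-8");
  }
  const encoder = new TextEncoder();

  if (view === undefined || view === null) {
    // No destination supplied: allocate a right-sized Uint8Array.
    const output = encoder.encode(string);
    return {
      charactersWritten: string.length, // UTF-16 code units (assumption)
      byteWritten: output.byteLength,
      output,
    };
  }

  // Destination supplied: write as much as fits, output === view.
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const { read, written } = encoder.encodeInto(string, bytes);
  return { charactersWritten: read, byteWritten: written, output: view };
}

// Hypothetical stringLength for encodings where a 0 byte can only mean
// U+0000 (so not UTF-16 and friends): the byte offset of the first \0,
// or the length of the whole buffer if there is none.
function stringLength(view) {
  const bytes = new Uint8Array(view.buffer, view.byteOffset, view.byteLength);
  const index = bytes.indexOf(0);
  return index === -1 ? bytes.length : index;
}
```

With this shape, encode("string").output covers the simple convert-to-buffer case in one call, and a caller parsing a null-terminated field can use stringLength to find the end of the field before decoding just that range.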
>> It doesn't seem possible to implement the 'encode' function without
>> doing multiple scans over the string. The implementation seems
>> required both to check that the data can be decoded using the
>> specified encoding, as well as check that the data will fit in the
>> passed in buffer. Only then can the implementation start decoding the
>> data. This seems problematic.
>>
>
> Only if it guarantees that it doesn't write anything to the output buffer
> unless the entire result will fit. I don't think we need to do that; just
> guarantee that it'll be truncated on a whole codepoint.
>

Agreed. Input/output dicts mean the API documentation a caller needs to read
to understand the usage is more complex than a function signature which is
why I resisted them, but it does seem like the best approach. Thanks for
pushing, Glenn!

In the create-a-buffer-on-the-fly case there will be some memory juggling
going on, either by initially over allocating or reallocating/moving.

>> I also don't think it's a good idea to throw an exception for encoding
>> errors. Better to convert characters to the unicode replacement
>> character. I believe we made a similar change to the WebSockets
>> specification recently.
>>
>
> Was that change made? I filed
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=16157, but it still seems
> to be undecided.
>

Settling on an options dict means adding a flag to control this behavior
(throws: true ?) doesn't extend the API surface significantly.
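As an illustration of why a single scan is enough once truncation on a whole code point is allowed, here is a hand-rolled UTF-8 sketch. Everything in it is an assumption for illustration: the function name encodeUtf8Truncating is hypothetical, the destination is assumed to be a Uint8Array, lone surrogates are treated as the only "encoding error", and the throws flag is the tentative option floated above rather than anything agreed.

```javascript
// Hypothetical single-pass encoder: writes whole code points until the
// destination is full, then stops. Nothing is pre-scanned.
function encodeUtf8Truncating(string, view, opts = {}) {
  const throws = opts.throws === true; // tentative flag from this thread
  let charactersWritten = 0; // UTF-16 code units consumed (assumption)
  let byteWritten = 0;

  while (charactersWritten < string.length) {
    let cp = string.codePointAt(charactersWritten);
    const units = cp > 0xffff ? 2 : 1;

    if (cp >= 0xd800 && cp <= 0xdfff) {
      // Lone surrogate: either fail or substitute U+FFFD.
      if (throws) {
        throw new Error("lone surrogate at index " + charactersWritten);
      }
      cp = 0xfffd;
    }

    const size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (byteWritten + size > view.length) break; // truncate on a whole code point

    if (size === 1) {
      view[byteWritten] = cp;
    } else if (size === 2) {
      view[byteWritten] = 0xc0 | (cp >> 6);
      view[byteWritten + 1] = 0x80 | (cp & 0x3f);
    } else if (size === 3) {
      view[byteWritten] = 0xe0 | (cp >> 12);
      view[byteWritten + 1] = 0x80 | ((cp >> 6) & 0x3f);
      view[byteWritten + 2] = 0x80 | (cp & 0x3f);
    } else {
      view[byteWritten] = 0xf0 | (cp >> 18);
      view[byteWritten + 1] = 0x80 | ((cp >> 12) & 0x3f);
      view[byteWritten + 2] = 0x80 | ((cp >> 6) & 0x3f);
      view[byteWritten + 3] = 0x80 | (cp & 0x3f);
    }

    byteWritten += size;
    charactersWritten += units;
  }

  return { charactersWritten, byteWritten, output: view };
}
```

A caller that gets back charactersWritten < string.length can slice the remainder of the string and call again with a fresh buffer, so the partial-write behavior composes without the implementation ever needing a pre-scan.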
Received on Friday, 16 March 2012 09:19:44 UTC