[whatwg] API for encoding/decoding ArrayBuffers into text from Glenn Maynard on 2012-03-27 (public-whatwg-archive@w3.org from March 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Tue, 27 Mar 2012 18:45:57 -0500
Message-ID: <CABirCh_Htpx9hfmkMH+odX0dP1wZwUAX_OKg=tS1qFi=W=tJvw@mail.gmail.com>
On Tue, Mar 27, 2012 at 12:41 AM, Jonas Sicking <jonas at sicking.cc> wrote:

> The memchr is purely overhead, I.e. we are comparing memchr+decoding
> to decoding. So I don't see what's backing up the "probably the
> fastest thing" claim.
>

If you don't do it as an initial pass, then you have to embed null checks
into the inner loop of your decoding algorithm.  For example, an ASCII
decoder may look like:

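// const unsigned char *input = input buffer
// const unsigned char *input_end = one past last byte of input buffer
// wchar_t *output = output buffer
// memchr returns NULL when there is no null byte; keep input_end in that case
const unsigned char *nul = memchr(input, 0, input_end - input);
if(nul != NULL)
    input_end = nul;
while(input < input_end)
{
    if(*input >= 0x80)
        *output++ = 0xFFFD;
    else
        *output++ = *input;
    ++input;
}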

If you don't do the initial search, then it becomes:

while(input < input_end && *input != 0)
{
    if(*input >= 0x80)
        *output++ = 0xFFFD;
    else
        *output++ = *input;
    ++input;
}

which means that you have an additional branch each time through the loop
to check for the null terminator.  That's likely to be slower than just
doing another pass.
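
For anyone who wants to measure this, here is a minimal self-contained sketch of the two variants as functions. The function names are illustrative and not from this thread, and unsigned char is assumed so that the >= 0x80 comparison is meaningful. It only shows that both loops produce the same output; it makes no claim about which is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two-pass variant: locate the terminator with memchr first, then decode
   with no per-byte null check. Returns the number of code units written. */
static size_t decode_ascii_two_pass(const unsigned char *input,
                                    const unsigned char *input_end,
                                    wchar_t *output)
{
    assert(input <= input_end);
    const wchar_t *start = output;
    const unsigned char *nul = memchr(input, 0, (size_t)(input_end - input));
    if (nul != NULL)            /* no null byte: decode the whole buffer */
        input_end = nul;
    while (input < input_end) {
        *output++ = (*input >= 0x80) ? (wchar_t)0xFFFD : (wchar_t)*input;
        ++input;
    }
    return (size_t)(output - start);
}

/* One-pass variant: the null check lives inside the decoding loop. */
static size_t decode_ascii_one_pass(const unsigned char *input,
                                    const unsigned char *input_end,
                                    wchar_t *output)
{
    assert(input <= input_end);
    const wchar_t *start = output;
    while (input < input_end && *input != 0) {
        *output++ = (*input >= 0x80) ? (wchar_t)0xFFFD : (wchar_t)*input;
        ++input;
    }
    return (size_t)(output - start);
}
```

A benchmark would call each function repeatedly over the same large buffer and compare wall-clock times; the output buffer must have room for one code unit per input byte.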

But anyway, please either make a benchmark or two to show the differences
we're talking about, or drop "performance" as an argument.  This is all
just a distraction otherwise.  I don't think the speed of conversion is
even a serious issue, much less the microseconds taken by memchr.

> I admit I missed the previous discussion which led to the agreement to
> keep the length measuring outside, so I don't know what arguments were
> presented. Any pointers would be appreciated.
>

You've already mentioned one of them: being able to tell how many bytes
were consumed.  Having a view.indexOf function is also obviously generally
useful, and it simplifies the API.
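
To illustrate the "bytes consumed" point, here is a rough sketch in C of what keeping the search outside the decoder buys the caller. index_of_null is a hypothetical stand-in for something like view.indexOf(0), and count_fields is only an example caller; neither name comes from any proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper in the role of view.indexOf(0): returns the offset of
   the first null byte in buf[0..len), or len if there is none. */
static size_t index_of_null(const unsigned char *buf, size_t len)
{
    const unsigned char *nul = memchr(buf, 0, len);
    return nul ? (size_t)(nul - buf) : len;
}

/* Example caller: walks a buffer of consecutive null-terminated fields and
   returns how many it found. Because the search happens out here, the
   caller always knows exactly how far to advance. */
static size_t count_fields(const unsigned char *buf, size_t len)
{
    size_t offset = 0, fields = 0;
    while (offset < len) {
        size_t n = index_of_null(buf + offset, len - offset);
        /* a decoder would be handed exactly buf[offset .. offset + n) here */
        ++fields;
        offset += n;            /* bytes of string data */
        if (offset < len)       /* skip the terminator itself, if present */
            ++offset;
    }
    return fields;
}
```

The decoder itself never needs to know about terminators; it just receives a span whose length the caller already computed.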

Beyond that, having a feature--whether a wrapper or a flag to the actual
decoder/encoder--that's just a shortcut for all of four or five lines of
code is just a minor convenience.  I don't think it's something so common
that we need to save people a few lines of trivial wrapper code that they
can write themselves.

> > It doesn't seem materially harder (a little more code, yes, but that's not
> > the same thing), and it's more general-purpose.
>
> I agree it doesn't seem materially harder. I also agree that I don't
> have data to show that it's materially slower. But it sounds like
> we're in agreement that keeping the logic outside is both harder and
> slower which honestly doesn't speak strongly in its favor.
>

Sorry, I'm confused--you're saying that it isn't harder, but we're in
agreement that it's harder.  Please clarify what you mean.

I don't believe it's meaningfully slower or harder.

> I don't understand the argument that the alternative is more
> "general-purpose". The API is already generic in that you can use
> whatever delimiter you want since you pass in a view. The only
> functionality which is not available is finding a null-terminator in
> an arraybuffer which you are arguing below shouldn't be part of the
> decoder (which I agree with).
>

I'm confused.  What are you arguing?  "The alternative"--taking the null
terminator search out of the decoder--you seem to argue against (first
sentence), then to agree with (last sentence).  Can you back up and restate
what you're saying from scratch?

-- 
Glenn Maynard
Received on Tuesday, 27 March 2012 16:45:57 UTC