Re: Request for feedback: Filesystem API from Jonas Sicking on 2013-08-20 (public-script-coord@w3.org from July to September 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 20 Aug 2013 00:48:55 -0700
To: Janusz Majnert <j.majnert@samsung.com>
Cc: public-script-coord@w3.org
Message-ID: <CA+c2ei_6E5+_6-NEV7uxJ6hqm2vR5FwCR2VB98pVksBF3h_vvQ@mail.gmail.com>

On Aug 19, 2013 11:40 PM, "Janusz Majnert" <j.majnert@samsung.com> wrote:
> On 2013-08-19 18:37, Jonas Sicking wrote:
>> We still wouldn't be able to let offsets represent an offset in
>> characters since the only way to know where in the file the 10th
>> character is located is to read the whole file from the start. This
>> performs very poorly once you try to read from the millionth character
>> in a file.
>
> Yes, we would be able to let offset be expressed in characters. Yes, you
> would have to read the file from the beginning to get to the 10th
> character. Would this really perform poorly?

Yes, it would perform very poorly if you set the offset to 1000000 and do a
read-modify-rewind-write.
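
To make the cost concrete, here is a minimal sketch. It assumes the file is UTF-8 and that "character" means a Unicode code point. The helper name byteOffsetOfCodePoint is made up for illustration and is not part of any proposed API.

```typescript
// Hypothetical helper: find the byte offset of the code point at `index`
// in UTF-8 data. Every byte before the target has to be inspected, so the
// cost grows linearly with the offset.
function byteOffsetOfCodePoint(bytes: Uint8Array, index: number): number {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
    const startsCodePoint = (bytes[i] & 0xc0) !== 0x80;
    if (startsCodePoint) {
      if (seen === index) return i;
      seen++;
    }
  }
  return -1; // the data holds fewer than index + 1 code points
}

// "héllo wörld": the two accented letters take two bytes each in UTF-8.
const sample = new TextEncoder().encode("h\u00e9llo w\u00f6rld");
const offset = byteOffsetOfCodePoint(sample, 8); // byte offset of "r"
```

For offset 1000000 the loop has to walk at least a million bytes, and those bytes have to be read from disk first.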

Implementations would be forced to keep complex caches to remember which
text offsets map to which byte offsets, so as not to have to read from the
beginning of the file over and over. They would also need complex logic for
when to invalidate/update those caches as the file is being modified.

All of this would have to happen while the file is open. Caching between
file opens would likely not be doable at all.

> Are you saying that with TextEncoder/TextDecoder you don't have to read
> the file from the beginning, or that it somehow performs better?

The difference is that operations that are expensive should look expensive.
Simply setting the offset to 1000000 and reading one character does not
make it obvious that at least 1MB of IO is happening.

If we force authors to do all of that IO and conversion themselves, it is
clear that it is an expensive operation.

And applications would then be encouraged to do their own offset caching if
needed.
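
As a sketch of what doing the conversion yourself could look like: the in-memory array below stands in for bytes that the application has read explicitly, and decodeInChunks is a made-up helper name. Only TextEncoder and TextDecoder are real APIs here.

```typescript
// Stand-in for file contents; in practice these bytes would come from
// explicit, visible reads at byte offsets.
const fileBytes = new TextEncoder().encode("caf\u00e9 au lait");

// Decode chunk by chunk. `stream: true` lets the decoder hold back a
// multi-byte sequence that is split across two chunks instead of
// emitting U+FFFD for the partial bytes.
function decodeInChunks(bytes: Uint8Array, chunkSize: number): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (let pos = 0; pos < bytes.length; pos += chunkSize) {
    text += decoder.decode(bytes.subarray(pos, pos + chunkSize), { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}

// With a chunk size of 4 the two bytes of "é" land in different chunks.
const decoded = decodeInChunks(fileBytes, 4);
```

Every byte that gets decoded here was read by the author's own code, so the amount of IO is visible at the call site.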

> IMHO, if you want to have a text-mode read() function, you need to accept
> that the performance will be worse than with plain read(), which is not a
> bad thing considering what that function would actually do.

It is OK that text reading is a few percent slower. Or even twice as slow.
It is not OK if it is several orders of magnitude slower because it
requires the whole file to be read.

>> I think my recommendation is to keep text support out of the spec for
>> now and instead rely on TextEncoder/TextDecoder. We can always add
>> text handling later or even in v2.
>
> Fine with me

Cool.

/ Jonas

Received on Tuesday, 20 August 2013 07:49:23 UTC