Re: Request for feedback: Filesystem API from Jonas Sicking on 2013-08-19 (public-script-coord@w3.org from July to September 2013)

From: Jonas Sicking <jonas@sicking.cc>
Date: Mon, 19 Aug 2013 09:37:05 -0700
To: Janusz Majnert <j.majnert@samsung.com>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CA+c2ei-OSxk5yjndoXMEMbKzmEKVYLSYjqh=auiDj_douBg8zA@mail.gmail.com>

On Mon, Aug 19, 2013 at 1:00 AM, Janusz Majnert <j.majnert@samsung.com> wrote:
>
> On 2013-08-16 20:50, Jonas Sicking wrote:
>>
>> On Fri, Aug 16, 2013 at 3:13 AM, Janusz Majnert <j.majnert@samsung.com>
>> wrote:
>
> [cut]
>
>>> b) If I open a text file using some multi-byte encoding and call
>>> readText(2), will that increment the offset attribute by 2 or by the
>>> actual number of bytes read? Note that incrementing by the number of
>>> bytes might not be possible before doing IO.
>>>
>>> c) What if I open a text file using some multi-byte encoding and then mix
>>> calls to read() and readText()? Or if I first set offset to some arbitrary
>>> value that just happens not to be aligned with a code-point boundary and
>>> then call readText()?
>>
>>
>> It's unclear if readText will make it into the first version. We
>> should probably get agreement on binary data handling before adding
>> text data to the mix.
>>
>> That said, my thinking was that readText operates on byte ranges. I.e.
>> the size passed to readText is not how many characters to read, but
>> rather how many bytes to read. That means that .readText(2) always
>> increases .offset by 2, but you won't always get back a string which
>> is 2 characters long.
>>
>> This matches how Blob and FileReader do text handling.
>>
> IMHO I would expect that by calling readText(5) I will read 5 characters...
>
> Have you considered specifying the "Text" mode with openRead/openWrite ?
> For example, you could have:
> Promise<FileHandle> openRead((DOMString or File) path,
>         optional DOMString textEncoding);
>
> Promise<FileHandleWritable> openWrite((DOMString or File) path,
>         OpenWriteOptions options,
>         optional DOMString textEncoding);
>
> Where textEncoding means:
> - undefined - don't open in "text" mode
> - valid and supported encoding name - open in text mode and use this encoding
> - null or unsupported encoding name - open in text mode and autodetect encoding
>
> There would be no need to have readText(), and offset would be expressed not
> in bytes but in actual characters/code-points, i.e.
> handle.read(2) would read 2 characters, regardless of the encoding used.

We still wouldn't be able to let offsets represent an offset in
characters since, in a variable-width encoding such as UTF-8, the only
way to know where in the file the 10th character is located is to read
the file from the start up to that point. This performs very poorly
once you try to read from the millionth character in a file.
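
To illustrate the cost, here is a minimal sketch assuming UTF-8 data. The
helper byteOffsetOfCodePoint is a hypothetical name made up for this example
and is not part of any proposed API. It can only find where the Nth code point
starts by walking every byte before it:

```typescript
// Hypothetical helper (not part of the proposed Filesystem API): find the byte
// offset at which the code point with index `charIndex` starts in UTF-8 data.
function byteOffsetOfCodePoint(bytes: Uint8Array, charIndex: number): number {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    // UTF-8 continuation bytes look like 10xxxxxx; any other byte starts a
    // new code point.
    const startsCodePoint = (bytes[i] & 0xc0) !== 0x80;
    if (startsCodePoint) {
      if (seen === charIndex) return i;
      seen++;
    }
  }
  return -1; // the data holds fewer than charIndex + 1 code points
}

// "a" takes 1 byte, U+00E9 takes 2, U+20AC takes 3, "b" takes 1.
const sample = new TextEncoder().encode("a\u00e9\u20acb");
console.log(byteOffsetOfCodePoint(sample, 3)); // 6
```

The loop is linear in the byte offset being looked up, so seeking to a
character offset deep inside a large file means scanning everything before it.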

And note that this is unaffected by the syntax used for passing
offsets. I.e. even if we removed the .offset property and instead
added an offset argument we still would have the above problem.

So we'd end up using bytes for offsets and characters for lengths,
which seems very confusing.

I think my recommendation is to keep text support out of the spec for
now and instead rely on TextEncoder/TextDecoder. We can always add
text handling later or even in v2.
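
As a rough sketch of what relying on TextEncoder/TextDecoder could look like,
assuming UTF-8 and using an in-memory byte array as a stand-in for data
returned by binary reads (fileBytes and decodeInChunks are names made up for
this example):

```typescript
// Stand-in for file contents; in practice these bytes would come from binary
// reads. U+00E9 encodes to 2 bytes in UTF-8, so the string is 6 bytes long.
const fileBytes = new TextEncoder().encode("h\u00e9llo");

// Decoding a byte range on its own: the first 2 bytes cover "h" plus half of
// U+00E9, so the incomplete sequence comes back as U+FFFD.
const isolated = new TextDecoder("utf-8").decode(fileBytes.subarray(0, 2));

// Decoding chunk by chunk with { stream: true } lets the decoder carry an
// incomplete sequence over to the next chunk.
function decodeInChunks(bytes: Uint8Array, chunkSize: number): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    const chunk = bytes.subarray(offset, offset + chunkSize);
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}

console.log(JSON.stringify(isolated));     // "h\ufffd" (h, then a replacement character)
console.log(decodeInChunks(fileBytes, 2)); // the original 5-character string
```

Offsets and lengths stay in bytes throughout, and the decoder handles
characters that straddle a chunk boundary.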

/ Jonas

Received on Monday, 19 August 2013 16:38:03 UTC