Re: Request for feedback: Filesystem API from Janusz Majnert on 2013-08-20 (public-script-coord@w3.org from July to September 2013)

From: Janusz Majnert <j.majnert@samsung.com>
Date: Tue, 20 Aug 2013 08:39:57 +0200
To: public-script-coord@w3.org
Message-id: <52130F3D.2060609@samsung.com>

On 2013-08-19 18:37, Jonas Sicking wrote:
> On Mon, Aug 19, 2013 at 1:00 AM, Janusz Majnert <j.majnert@samsung.com> wrote:
>>
>> On 2013-08-16 20:50, Jonas Sicking wrote:
>>>
>>> On Fri, Aug 16, 2013 at 3:13 AM, Janusz Majnert <j.majnert@samsung.com> wrote:
>>
>> [cut]
>>
>>>> b) If I open a text file using some multi-byte encoding and call
>>>> readText(2), will that increment the offset attribute by 2 or by the
>>>> actual amount of bytes read? Note that incrementing by amount of bytes
>>>> might not be possible before doing IO.
>>>>
>>>> c) If I open a text file using some multi-byte encoding then mix calls to
>>>> read() and readText()? Or if I first set offset to some arbitrary value,
>>>> that just happens to be not aligned with the code-point boundary and call
>>>> readText()?
>>>
>>>
>>> It's unclear if readText will make it into the first version. We
>>> should probably get agreement on binary data handling before adding
>>> text data to the mix.
>>>
>>> That said, my thinking was that readText operates on byte ranges. I.e.
>>> the size passed to readText is not how many characters to read, but
>>> rather how many bytes to read. That means that .readText(2) always
>>> increases .offset by 2, but you won't always get back a string which
>>> is 2 characters long.
>>>
>>> This matches how Blob and FileReader does text handling.
>>>
>> IMHO I would expect that by calling readText(5) I will read 5 characters...
>>
>> Have you considered specifying the "Text" mode with openRead/openWrite ?
>> For example, you could have:
>> Promise<FileHandle> openRead((DOMString or File) path,
>>          optional DOMString textEncoding);
>>
>> Promise<FileHandleWritable> openWrite((DOMString or File) path,
>>          OpenWriteOptions options,
>>          optional DOMString textEncoding);
>>
>> Where textEncoding means:
>> - undefined - don't open in "text" mode
>> - valid and supported encoding name - open in text mode and use this encoding
>> - null or unsupported encoding name - open in text mode and autodetect encoding
>>
>> There would be no need to have readText(), and offset would be expressed not
>> in bytes but in actual characters/code points, i.e.
>> handle.read(2) would read 2 characters, regardless of the encoding used.
>
> We still wouldn't be able to let offsets represent an offset in
> characters since the only way to know where in the file the 10th
> character is located is to read the whole file from the start. This
> performs very poorly once you try to read from the millionth character
> in a file.

We could still express the offset in characters. Yes, you would have to
read the file from the beginning to get to the 10th character, but would
this really perform poorly? Are you saying that with
TextEncoder/TextDecoder you don't have to read the file from the
beginning, or that it somehow performs better?

IMHO, if you want a text-mode read() function, you need to accept that
its performance will be worse than that of plain read(), which is not a
bad thing considering what that function would actually do.
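
To make the cost concrete, here is a minimal sketch of mapping a
character offset to a byte offset. It assumes the file is well-formed
UTF-8 and is already available as a Uint8Array; byteOffsetOfCodePoint is
a hypothetical helper, not part of the proposed API. The point is only
that the scan has to start at byte 0, whoever performs it.

```typescript
// Hypothetical helper (not part of any proposed API): returns the byte
// offset at which the code point with index `charIndex` starts in a
// well-formed UTF-8 buffer, or -1 if the buffer has too few code points.
function byteOffsetOfCodePoint(bytes: Uint8Array, charIndex: number): number {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    // UTF-8 continuation bytes have the form 10xxxxxx; any other byte
    // starts a new code point.
    const isLeadByte = (bytes[i] & 0xc0) !== 0x80;
    if (isLeadByte) {
      if (seen === charIndex) {
        return i;
      }
      seen++;
    }
  }
  return -1;
}

// "a€b" encoded as UTF-8: 0x61, then 0xE2 0x82 0xAC, then 0x62.
const sampleBytes = new Uint8Array([0x61, 0xe2, 0x82, 0xac, 0x62]);

// The third character ("b", index 2) starts at byte 4, not byte 2.
const offsetOfThirdChar = byteOffsetOfCodePoint(sampleBytes, 2);
```

The work is linear in the byte offset being located, so seeking to the
millionth character means touching every byte before it.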


[cut]
> I think my recommendation is to keep text support out of the spec for
> now and instead rely on TextEncoder/TextDecoder. We can always add
> text handling later or even in v2.

Fine with me.
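
For reference, a rough sketch of what relying on TextDecoder might look
like when reads are expressed in bytes. The chunks stand in for the
results of successive binary reads (how they are obtained is left out,
since that API is still under discussion), and decodeChunks is a
hypothetical helper. It assumes UTF-8.

```typescript
// Hypothetical helper: decodes a sequence of byte chunks as one UTF-8
// text, even when a chunk boundary falls inside a multi-byte character.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let result = "";
  for (const chunk of chunks) {
    // stream: true makes the decoder hold back an incomplete trailing
    // byte sequence and finish it with the next chunk.
    result += decoder.decode(chunk, { stream: true });
  }
  // A final call without input flushes any bytes still held back.
  result += decoder.decode();
  return result;
}

// "a€b" as UTF-8; the euro sign occupies bytes 1 to 3.
const euroBytes = new Uint8Array([0x61, 0xe2, 0x82, 0xac, 0x62]);

// Two byte ranges that split the euro sign in the middle.
const splitChunks = [euroBytes.subarray(0, 2), euroBytes.subarray(2)];

const streamed = decodeChunks(splitChunks);

// Decoding the first chunk on its own replaces the truncated sequence
// with U+FFFD, which is why a byte-based read of 2 bytes need not yield
// 2 usable characters.
const firstChunkAlone = new TextDecoder("utf-8").decode(splitChunks[0]);
```

This only covers sequential reads. A read that starts at an arbitrary
byte offset still has to deal with landing in the middle of a character.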

Regards
-- 
Janusz Majnert
Samsung R&D Institute Poland
Samsung Electronics
Received on Tuesday, 20 August 2013 06:40:39 UTC