- From: Janusz Majnert <j.majnert@samsung.com>
- Date: Tue, 20 Aug 2013 08:39:57 +0200
- To: public-script-coord@w3.org
On 2013-08-19 18:37, Jonas Sicking wrote:
> On Mon, Aug 19, 2013 at 1:00 AM, Janusz Majnert <j.majnert@samsung.com> wrote:
>>
>> On 2013-08-16 20:50, Jonas Sicking wrote:
>>>
>>> On Fri, Aug 16, 2013 at 3:13 AM, Janusz Majnert <j.majnert@samsung.com>
>>> wrote:
>>
>> [cut]
>>
>>>> b) If I open a text file using some multi-byte encoding and call
>>>> readText(2), will that increment the offset attribute by 2 or by the
>>>> actual amount of bytes read? Note that incrementing by amount of bytes
>>>> might not be possible before doing IO.
>>>>
>>>> c) If I open a text file using some multi-byte encoding then mix calls to
>>>> read() and readText()? Or if I first set offset to some arbitrary value,
>>>> that just happens to be not aligned with the code-point boundary and call
>>>> readText()?
>>>
>>>
>>> It's unclear if readText will make it into the first version. We
>>> should probably get agreement on binary data handling before adding
>>> text data to the mix.
>>>
>>> That said, my thinking was that readText operates on byte ranges. I.e.
>>> the size passed to readText is not how many characters to read, but
>>> rather how many bytes to read. That means that .readText(2) always
>>> increases .offset by 2, but you won't always get back a string which
>>> is 2 characters long.
>>>
>>> This matches how Blob and FileReader does text handling.
>>>
>> IMHO I would expect that by calling readText(5) I will read 5 characters...
>>
>> Have you considered specifying the "Text" mode with openRead/openWrite ?
>> For example, you could have:
>> Promise<FileHandle> openRead((DOMString or File) path,
>>                              optional DOMString textEncoding);
>>
>> Promise<FileHandleWritable> openWrite((DOMString or File) path,
>>                                       OpenWriteOptions options,
>>                                       optional DOMString textEncoding);
>>
>> Where textEncoding means:
>> - undefined - don't open in "text" mode
>> - valid and supported encoding name - open in text mode and use this
>>   encoding
>> - null or unsupported encoding name - open in text mode and autodetect
>>   encoding
>>
>> There would be no need to have readText(), and offset would be expressed not
>> in bytes but in actual characters/code-points, ie
>> handle.read(2) would read 2 characters, nevermind the encoding used.
>
> We still wouldn't be able to let offsets represent an offset in
> characters since the only way to know where in the file the 10th
> character is located is to read the whole file from the start. This
> performs very poorly once you try to read from the millionth character
> in a file.

Yes, we would be able to let offset be expressed in characters. Yes, you
would have to read the file from the beginning to get to the 10th
character. Would this really perform poorly? Are you saying that with
TextEncoder/TextDecoder you don't have to read the file from the
beginning, or that it somehow performs better?

IMHO, if you want to have a text-mode read() function, you need to
accept that the performance will be worse than with plain read(), which
is not a bad thing considering what that function would actually do.

[cut]

> I think my recommendation is to keep text support out of the spec for
> now and instead rely on TextEncoder/TextDecoder. We can always add
> text handling later or even in v2.

Fine with me

Regards
-- 
Janusz Majnert
Samsung R&D Institute Poland
Samsung Electronics
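
To make the byte-versus-character point in this thread concrete, here is a minimal sketch using only TextEncoder/TextDecoder. It assumes UTF-8 and an in-memory buffer standing in for the file; byteOffsetOfCodePoint is a hypothetical helper written for illustration and is not part of any proposed FileHandle API. It shows that a 2-byte range need not yield 2 characters, that finding the Nth character means scanning from the start, and that a streaming decoder can cope with a read that ends in the middle of a code point.

```typescript
// Hypothetical helper (not part of any spec): return the byte offset at which
// the code point with the given 0-based index starts in a UTF-8 buffer,
// or -1 if the buffer holds fewer code points. Note the linear scan from byte 0.
function byteOffsetOfCodePoint(bytes: Uint8Array, index: number): number {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    // UTF-8 continuation bytes have the form 10xxxxxx; any other byte
    // begins a new code point.
    const startsCodePoint = (bytes[i] & 0xc0) !== 0x80;
    if (startsCodePoint) {
      if (seen === index) {
        return i;
      }
      seen++;
    }
  }
  return -1;
}

// Stand-in for file contents: three 2-byte characters followed by one 1-byte character.
const fileBytes = new TextEncoder().encode("żółw");

// Byte-range semantics: "read 2 bytes as text" yields a single character here.
const twoBytesAsText = new TextDecoder("utf-8").decode(fileBytes.subarray(0, 2));

// Character semantics: "read 2 characters" first needs the byte offset of
// character index 2, which is only known after scanning the preceding bytes.
const endOfSecondChar = byteOffsetOfCodePoint(fileBytes, 2);
const twoCharsAsText = new TextDecoder("utf-8").decode(
  fileBytes.subarray(0, endOfSecondChar)
);

// A read that stops mid code point: with { stream: true } the decoder keeps
// the incomplete trailing byte and completes it on the next call.
const streaming = new TextDecoder("utf-8");
const firstChunk = streaming.decode(fileBytes.subarray(0, 3), { stream: true });
const secondChunk = streaming.decode(fileBytes.subarray(3), { stream: true });
```

With byte-range semantics the caller pays nothing extra but has to handle short or split results; with character semantics the scan cost is hidden inside the read call, which is the trade-off being debated above.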
Received on Tuesday, 20 August 2013 06:40:39 UTC