- From: Jonas Sicking <jonas@sicking.cc>
- Date: Mon, 19 Aug 2013 09:37:05 -0700
- To: Janusz Majnert <j.majnert@samsung.com>
- Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
On Mon, Aug 19, 2013 at 1:00 AM, Janusz Majnert <j.majnert@samsung.com> wrote:
>
> On 2013-08-16 20:50, Jonas Sicking wrote:
>>
>> On Fri, Aug 16, 2013 at 3:13 AM, Janusz Majnert <j.majnert@samsung.com>
>> wrote:
>
> [cut]
>
>>> b) If I open a text file using some multi-byte encoding and call
>>> readText(2), will that increment the offset attribute by 2 or by the
>>> actual amount of bytes read? Note that incrementing by amount of bytes
>>> might not be possible before doing IO.
>>>
>>> c) If I open a text file using some multi-byte encoding then mix calls to
>>> read() and readText()? Or if I first set offset to some arbitrary value,
>>> that just happens to be not aligned with the code-point boundary and call
>>> readText()?
>>
>>
>> It's unclear if readText will make it into the first version. We
>> should probably get agreement on binary data handling before adding
>> text data to the mix.
>>
>> That said, my thinking was that readText operates on byte ranges. I.e.
>> the size passed to readText is not how many characters to read, but
>> rather how many bytes to read. That means that .readText(2) always
>> increases .offset by 2, but you won't always get back a string which
>> is 2 characters long.
>>
>> This matches how Blob and FileReader does text handling.
>>
> IMHO I would expect that by calling readText(5) I will read 5 characters...
>
> Have you considered specifying the "Text" mode with openRead/openWrite ?
> For example, you could have:
> Promise<FileHandle> openRead((DOMString or File) path,
>                              optional DOMString textEncoding);
>
> Promise<FileHandleWritable> openWrite((DOMString or File) path,
>                                       OpenWriteOptions options,
>                                       optional DOMString textEncoding);
>
> Where textEncoding means:
> - undefined - don't open in "text" mode
> - valid and supported encoding name - open in text mode and use this
>   encoding
> - null or unsupported encoding name - open in text mode and autodetect
>   encoding
>
> There would be no need to have readText(), and offset would be expressed not
> in bytes but in actual characters/code-points, ie
> handle.read(2) would read 2 characters, nevermind the encoding used.

We still wouldn't be able to let offsets represent an offset in
characters, since with a variable-width encoding the only way to know
where in the file the 10th character is located is to read the file
from the start. This performs very poorly once you try to read from
the millionth character in a file.

And note that this is unaffected by the syntax used for passing
offsets. I.e. even if we removed the .offset property and instead
added an offset argument, we would still have the above problem.

So we'd end up using bytes for offsets and characters for lengths,
which seems very confusing.

I think my recommendation is to keep text support out of the spec for
now and instead rely on TextEncoder/TextDecoder. We can always add
text handling later or even in v2.

/ Jonas
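To make the byte-versus-character point concrete, here is a rough sketch using TextEncoder/TextDecoder on an in-memory byte array standing in for a file. The helper names (readTextRange, readInChunks, byteOffsetOfChar) are made up for illustration and are not part of any proposal in this thread; the sketch also assumes UTF-8 throughout.

```typescript
// An in-memory stand-in for a UTF-8 file: 5 characters, 8 bytes.
// "ż", "ó" and "ł" take 2 bytes each; "w" and "!" take 1 byte each.
const file: Uint8Array = new TextEncoder().encode("żółw!");

// Hypothetical byte-range read: `size` counts bytes, not characters.
function readTextRange(bytes: Uint8Array, offset: number, size: number): string {
  const decoder = new TextDecoder("utf-8");
  return decoder.decode(bytes.subarray(offset, offset + size));
}

// Reading fixed-size byte chunks through one streaming decoder. The decoder
// buffers an incomplete trailing sequence until the next chunk arrives.
function readInChunks(bytes: Uint8Array, chunkSize: number): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    const chunk = bytes.subarray(offset, offset + chunkSize);
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush anything still buffered
}

// Finding the byte offset of the n-th character means scanning from the
// start: every byte before it has to be inspected (UTF-8 only).
function byteOffsetOfChar(bytes: Uint8Array, charIndex: number): number {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    const isLeadByte = (bytes[i] & 0xc0) !== 0x80; // not a continuation byte
    if (isLeadByte) {
      if (seen === charIndex) return i;
      seen++;
    }
  }
  return bytes.length; // charIndex is past the end
}

console.log(readTextRange(file, 0, 2).length); // 2 bytes read, 1 character back
console.log(readTextRange(file, 0, 3));        // range ends mid-character
console.log(readInChunks(file, 3));            // streaming decode repairs the splits
console.log(byteOffsetOfChar(file, 3));        // byte position of "w"
```

A 2-byte read returns a single character here, and a 3-byte read ends in the middle of "ó", so a non-streaming decode yields a replacement character (U+FFFD) at the end. Callers who want text can keep offsets in bytes and push the chunks through a streaming decoder, which is roughly what relying on TextEncoder/TextDecoder would look like in practice.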
Received on Monday, 19 August 2013 16:38:03 UTC