- From: Jonas Sicking <jonas@sicking.cc>
- Date: Thu, 7 Jul 2011 12:26:08 -0700
- To: arun@mozilla.com
- Cc: "Gregg Tavares (wrk)" <gman@google.com>, Web Applications Working Group WG <public-webapps@w3.org>
On Tue, Jun 21, 2011 at 10:17 AM, Arun Ranganathan <arun@mozilla.com> wrote:
> Sorry if these have all been discussed before. I just read the File API for
> the first time and 2 random questions popped in my head.
> 1) If I'm using readAsText with a particular encoding and the data in the
> file is not actually in that encoding such that code points in the file can
> not be mapped to valid code points what happens? Is that implementation
> specific or is it specified? I can imagine at least 3 different behaviors.
>
> This should be specified better and isn't. I'm inclined to then return the
> file in the encoding it is in rather than force an encoding (in other words,
> ignore the encoding parameter if it is determined that code points can't be
> mapped to valid code points in the encoding... also note that we say to
> "Replace bytes or sequences of bytes that are not valid according to
> the charset with a single U+FFFD character [Unicode]"). Right now, the spec
> isn't specific to this scenario ("... if the user agent cannot decode blob
> using encoding, then let charset be null" before the algorithmic steps,
> which essentially forces UTF-8).

I definitely don't think we should use some type of autodetecting of
charset if people explicitly define one. That is likely to create more
confusion and bugs than it'll solve problems.

I don't fully understand what's undefined if we say that any invalid
character should be replaced by U+FFFD? I.e. why isn't that enough? I'm
not at all doubting that it isn't enough, but I'd like to understand
how it's not enough in order to fix it.

> 2) If I'm reading using readAsText a multibyte encoding (utf-8, shift-jis,
> etc..) is it implementation dependent whether or not it can return partial
> characters when returning partial results during reading? In other words,
> Let's say the next character in a file is a 3 byte code point but the
> reader has only read 2 of those 3 bytes so far. Is implementation dependent
> whether result includes those 2 bytes before reading the 3rd byte or not?
>
> Yes, partial results are currently implementation dependent; the spec. only
> says they SHOULD be returned. There was reluctance to have MUST condition
> on partial file reads. I'm open to revisiting this decision if the
> justification is a really good one.

I absolutely don't think we should return partial results. From the
page author's point of view .result should "stream" in. Once a
character has been appended to it, it should never change.

/ Jonas
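
To make the two behaviours discussed above concrete, here is a minimal sketch in TypeScript. It is an illustration only, not the File API algorithm: it uses the standard TextDecoder interface (which postdates this thread) as a stand-in for a reader's decoding step, and the helper name appendOnlyDecode and the sample bytes are assumptions made for the example. It shows a result that only ever grows, holds back an incomplete multibyte sequence until its remaining bytes arrive, and maps invalid bytes to U+FFFD.

```typescript
// Hypothetical helper, not part of the File API: decodes byte chunks in order
// and records the accumulated text after each chunk.
function appendOnlyDecode(chunks: Uint8Array[], encoding = "utf-8"): string[] {
  // fatal: false (the default) replaces invalid byte sequences with U+FFFD.
  const decoder = new TextDecoder(encoding, { fatal: false });
  const snapshots: string[] = [];
  let result = "";
  for (const chunk of chunks) {
    // stream: true buffers an incomplete trailing sequence instead of emitting it.
    result += decoder.decode(chunk, { stream: true });
    snapshots.push(result);
  }
  // Final flush: any bytes still buffered at end of input become U+FFFD.
  result += decoder.decode();
  snapshots.push(result);
  return snapshots;
}

// U+20AC (euro sign) is encoded in UTF-8 as the three bytes E2 82 AC.
const snapshots = appendOnlyDecode([
  new Uint8Array([0x61, 0xe2, 0x82]), // "a" plus 2 of the 3 bytes
  new Uint8Array([0xac, 0x62]),       // the third byte, then "b"
]);

// 0xFF can never appear in valid UTF-8, so it is replaced rather than dropped.
const replaced = new TextDecoder("utf-8").decode(new Uint8Array([0x61, 0xff, 0x62]));
```

In this sketch the first snapshot contains only "a", because the two buffered bytes are not surfaced, and every later snapshot starts with the previous one, so text that has been appended never changes.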
Received on Thursday, 7 July 2011 19:27:13 UTC