Re: [FileAPI] Updates to FileAPI Editor's Draft

From: Arun Ranganathan <arun@mozilla.com>
Date: Tue, 21 Jun 2011 13:17:26 -0400
Message-ID: <4E00D226.6040003@mozilla.com>
To: "Gregg Tavares (wrk)" <gman@google.com>
CC: Web Applications Working Group WG <public-webapps@w3.org>
> Sorry if these have all been discussed before. I just read the File 
> API for the first time and 2 random questions popped in my head.
>
> 1) If I'm using readAsText with a particular encoding and the data in 
> the file is not actually in that encoding such that code points in the 
> file can not be mapped to valid code points what happens? Is that 
> implementation specific or is it specified? I can imagine at least 3 
> different behaviors.

This should be specified better, and it isn't.  I'm inclined to return 
the file in the encoding it is actually in rather than force an encoding 
(in other words, ignore the encoding parameter if it is determined that 
code points can't be mapped to valid code points in that encoding).  Also 
note that we say to "Replace bytes or sequences of bytes that are not 
valid according to the charset with a single U+FFFD character [Unicode 
<http://dev.w3.org/2006/webapi/FileAPI/#Unicode>]".  Right now, the 
spec isn't specific about this scenario: it says "... if the user agent 
cannot decode blob using encoding, then let charset be null" before the 
algorithmic steps, which essentially forces UTF-8.
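
To make the behaviors under discussion concrete, here is a minimal sketch 
in Python (not the File API itself).  The helper name read_as_text and the 
fallback rule for an unknown encoding label are assumptions for 
illustration only; they are one possible reading of "let charset be null", 
not what the spec or any user agent is known to do.

```python
import codecs


def read_as_text(raw, encoding=None):
    """Hypothetical stand-in for readAsText: decode raw bytes to a string.

    Assumption: an unknown or missing encoding label falls back to UTF-8,
    and invalid byte sequences become U+FFFD instead of raising an error.
    """
    try:
        charset = codecs.lookup(encoding).name if encoding else "utf-8"
    except LookupError:
        # Unknown label: treat charset as "null" and default to UTF-8.
        charset = "utf-8"
    return raw.decode(charset, errors="replace")


# Two kanji encoded as Shift_JIS; these bytes are not valid UTF-8.
sjis_bytes = "\u65e5\u672c".encode("shift_jis")

correct = read_as_text(sjis_bytes, "shift_jis")  # decodes cleanly
mismatched = read_as_text(sjis_bytes, "utf-8")   # contains U+FFFD
```

With a mismatched encoding the caller gets a string containing U+FFFD 
rather than an error, which is the "substitute and default" behavior 
asked about in this thread.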

Can we list your three behaviors here, just so we get them on record? 
Which behavior do you think is ideal?  More importantly, is 
substituting U+FFFD and "defaulting" to UTF-8 good enough for your 
scenario above?

>
> 2) If I'm reading using readAsText a multibyte encoding (utf-8, 
> shift-jis, etc..) is it implementation dependent whether or not it can 
> return partial characters when returning partial results during 
> reading? In other words, let's say the next character in a file is a 
> 3 byte code point but the reader has only read 2 of those 3 bytes so 
> far. Is it implementation dependent whether the result includes those 
> 2 bytes before reading the 3rd byte or not?
>

Yes, partial results are currently implementation dependent; the spec 
only says they SHOULD be returned.  There was reluctance to put a MUST 
condition on partial file reads.  I'm open to revisiting this decision 
if the justification is a really good one.
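
As an illustration of the partial-character question, the sketch below 
(Python again, not the File API) compares two ways an implementation 
might build partial results from byte chunks.  The function name 
partial_results and the chunk boundaries are hypothetical; the point is 
only that a stateful decoder can hold back an incomplete multibyte 
sequence, while decoding each chunk independently cannot.

```python
import codecs


def partial_results(chunks, charset="utf-8"):
    """Return the accumulated text after each chunk, then after end of input.

    An incremental decoder buffers trailing bytes that do not yet form a
    complete character, so no snapshot contains half a character.
    """
    decoder = codecs.getincrementaldecoder(charset)(errors="replace")
    result = ""
    snapshots = []
    for chunk in chunks:
        result += decoder.decode(chunk)
        snapshots.append(result)
    # Flush: leftover incomplete bytes at end of input become U+FFFD.
    result += decoder.decode(b"", final=True)
    snapshots.append(result)
    return snapshots


euro = "\u20ac".encode("utf-8")  # a 3-byte UTF-8 sequence
chunks = [b"a" + euro[:2], euro[2:] + b"b"]  # split after 2 of 3 bytes

snapshots = partial_results(chunks)

# Naive alternative: decode every chunk on its own, with no carried state.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
```

Here the first snapshot is just "a", and the 3-byte character only 
appears once its last byte has arrived; the naive variant instead 
produces U+FFFD characters at the chunk boundary.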

-- A*
Received on Tuesday, 21 June 2011 17:18:12 GMT