Re: File API: File's name property from Arun Ranganathan on 2013-09-06 (public-webapps@w3.org from July to September 2013)

From: Arun Ranganathan <arun@mozilla.com>
Date: Fri, 6 Sep 2013 14:04:31 -0400
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Glenn Maynard <glenn@zewt.org>, WebApps WG <public-webapps@w3.org>
Message-Id: <03178691-233E-4A12-8827-3A7486CE4540@mozilla.com>

On Sep 6, 2013, at 11:42 AM, Anne van Kesteren wrote:

> On Wed, Sep 4, 2013 at 11:45 PM, Glenn Maynard <glenn@zewt.org> wrote:
>> On Tue, Sep 3, 2013 at 12:04 PM, Anne van Kesteren <annevk@annevk.nl> wrote:
>>> The problem is that once you put it through the URL parser it'll
>>> become "/". And I suspect given directory APIs and such it'll go
>>> through that layer at some point.
>> 
>> I don't follow.  Backslashes in filenames are escaped in URLs
>> (http://zewt.org/~glenn/test%5Cfile), like all the other things that require
>> escaping.
> 
> If the raw input to the URL parser includes a backslash, it'll be
> treated as a forward slash. I am not really expecting people to use
> encodeURI or such utilities.

I think it may be ok to restrict "/" and "\".  I don't think we lose too much here by not allowing historically "directory delimiting" characters in file names.

The question is what to do when a name contains a "/" or a "\". I'm inclined to say UAs should replace each of those with U+FFFD (REPLACEMENT CHARACTER).
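
A minimal sketch of that replacement, assuming it is applied to the name string before it is exposed to script. The helper name sanitizeFileName is hypothetical and does not appear in the File API draft:

```typescript
// Hypothetical helper, for illustration only; not part of any spec.
const REPLACEMENT_CHARACTER = "\uFFFD";

function sanitizeFileName(name: string): string {
  // Replace every "/" (U+002F) and "\" (U+005C) with U+FFFD.
  // All other characters pass through unchanged.
  return name.replace(/[\/\\]/g, REPLACEMENT_CHARACTER);
}

// Example: Glenn's "test\file" would no longer contain a backslash.
console.log(sanitizeFileName("test\\file"));
```

Note that this is lossy: once both separators map to U+FFFD, the original name cannot be recovered from the result.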

> 
>>> Well, my suggestion was rawName and name (which would have loss of
>>> information), per the current zip archive API design.
>> 
>> Having a separate field is fine.  This is specific to ZIPs, so it feels like
>> it belongs in a ZipFile subclass, not File itself.
> 
> Is it? There's no other file systems where the file names are
> effectively byte sequences? If that's the case, maybe that's fine.

Well…

Some file systems don't store names as unrestricted byte sequences (older Windows), but GNU systems usually do.  Some byte sequences are not valid names. Conversely, the name of an existing file may not map to a single byte sequence: sometimes there are two representations. For example, Amèlie.txt will use either U+00E8 or U+0065 U+0300 for the è; the two forms are canonically equivalent in Unicode, but they are different byte sequences. Some file systems perform Unicode canonicalization on file names, which is more or less what I think the Web should do.
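
To illustrate the two representations, here is a small sketch that uses U+00E8 as the precomposed form of è and U+0065 U+0300 as the decomposed form. It relies only on the standard String.prototype.normalize method. Choosing NFC as the canonical form is an assumption made for the example, and the helper name sameCanonicalName is hypothetical:

```typescript
// The same visible name, spelled two ways.
const composed = "Am\u00E8lie.txt";    // è as a single code point, U+00E8
const decomposed = "Ame\u0300lie.txt"; // e (U+0065) followed by combining grave (U+0300)

// Hypothetical helper: compare names after canonicalizing both to NFC.
function sameCanonicalName(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(composed === decomposed);                 // false: different code unit sequences
console.log(composed.length, decomposed.length);      // 10 11
console.log(sameCanonicalName(composed, decomposed)); // true
```

A plain string comparison treats these as two different names, which is why canonicalizing at one well-defined point matters.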

I think we run only a small risk of information loss, but I DO think that File name should be an [EnforceUTF16] DOMString.  That way, we have the best shot at byte sequences based on the underlying characterization.

Summary: I'll punt on File.rawName till a rainier day than today, but I will restrict "/" and "\" since they are historically directory separators.  I know that there are OTHER characters that we can also restrict, but these two are the big ones and get us some 80-20 sanitization :)

Glenn said:

>> It might be better to wait until we have a filesystem API, then piggyback on
>> that...

+1.

-- A*

Received on Friday, 6 September 2013 18:05:02 UTC