Re: File API: File's name property from Glenn Maynard on 2013-09-06 (public-webapps@w3.org from July to September 2013)

From: Glenn Maynard <glenn@zewt.org>
Date: Fri, 6 Sep 2013 18:39:43 -0500
To: Anne van Kesteren <annevk@annevk.nl>
Cc: Arun Ranganathan <arun@mozilla.com>, WebApps WG <public-webapps@w3.org>
Message-ID: <CABirCh_LL_1LqV8Z-902nR-Hf3oN5z68dyriUddxgKyf=X=OqA@mail.gmail.com>
On Fri, Sep 6, 2013 at 10:42 AM, Anne van Kesteren <annevk@annevk.nl> wrote:

> If the raw input to the URL parser includes a backslash, it'll be
> treated as a forward slash. I am not really expecting people to use
> encodeURI or such utilities.
>

People who don't escape will have a bug, but all this does is add the bug
preemptively rather than prevent it, and force it on unrelated features
(HTMLInputElement.files).  Don't the ZIP URL proposals require some
characters or other to be escaped anyway (at least the ones that support
navigation)?

It's far too late to try to keep people from having to escape things in
URLs.
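To make the backslash point concrete, here is a small sketch using the WHATWG `URL` class, which is a global in browsers and Node.js. The archive location `http://example.com/archive.zip/` is a hypothetical stand-in. The http: scheme matters because the URL parser treats "\" as "/" only for special schemes, so a dedicated ZIP URL scheme might behave differently. A member literally named `dir\a.txt` becomes a path-segment boundary unless the name is escaped first.

```typescript
// Hypothetical base URL for an archive. It uses http: because the WHATWG URL
// parser treats "\" as "/" only for special schemes such as http:.
const base = "http://example.com/archive.zip/";

// A legal filename (e.g. on Linux) that happens to contain a backslash.
const memberName = "dir\\a.txt";

// Unescaped: the backslash is parsed as a path separator.
const unescapedPath = new URL(memberName, base).pathname;

// Escaped: encodeURIComponent turns "\" into "%5C", so it stays in the name.
const escapedPath = new URL(encodeURIComponent(memberName), base).pathname;

console.log(unescapedPath); // "/archive.zip/dir/a.txt"
console.log(escapedPath);   // "/archive.zip/dir%5Ca.txt"
```

A developer who skips the escaping step gets the first result. That is the bug in question, whether or not File.name forbids backslashes.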

> > Having a separate field is fine.  This is specific to ZIPs, so it feels
> > like it belongs in a ZipFile subclass, not File itself.
>
> Is it? There's no other file systems where the file names are
> effectively byte sequences? If that's the case, maybe that's fine.
>

There are lots of them.  I meant that wanting to expose raw bytes seems
specific to ZIPs.  I hope we wouldn't expose the user's local filesystem
locale to the Web.  Dependence on the user's locale causes some of the more
obnoxious bugs the platform has; we should be fighting to kill it, not
adding more of it.


> > We definitely wouldn't want raw bytes from filenames being filled in
> > from user filesystems (e.g. Shift-JIS filenames in Linux),
>
> The question is whether you can have something random without
> associated encoding. If there's an encoding it's easy to put lipstick
> on a pig.
>

You can have filenames in Linux that are in a different encoding than
expected.  I don't know why you'd want to expose that to the web, though.


> >> There's an API too.
> >
> > It might be better to wait until we have a filesystem API, then
> > piggyback on that...
>
> Yeah, I wondered about that. It depends on whether we want to expose
> directories or just treat a zip archive as an ordered map of
> path/resource pairs.
>

I've found being able to work with a directory or a ZIP in the same way to
be useful in the past, too.


On Fri, Sep 6, 2013 at 12:08 PM, Anne van Kesteren <annevk@annevk.nl> wrote:

> Actually, given that zip paths are byte sequences, that would not work
> anyway. The alternative might be to always map it to code points
> somehow via requiring an encoding to be specified and just deal with
> the losses, but that doesn't seem general purpose enough.
>

Take an arbitrary use case: showing the user a list of files inside a
ZIP and letting him pick one to be extracted.  Exposing raw filenames is
one way to make this work: you iterate over the Files in the ZIP, pull out
File.name for display to the user, and stash File.rawName so you can look
the File up later.  Once the user picks a file from the list, you call
zip.getFileByRawName(stashedRawName) to retrieve the selected file.

But that doesn't "just work".  I assume the API will have a
"getFileByName(DOMString filename)"-like method as well as a
getFileByRawName method, and people will be much more likely to ignore
getFileByRawName and use only getFileByName.  The developer has to be
careful to store the rawName and look files up only by raw name if he
wants broken filenames to work.
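The trap can be sketched in a few lines of TypeScript. Everything here is hypothetical. `HypotheticalZip`, `getFileByName`, `getFileByRawName` and `rawName` borrow the names used in this message and do not describe any shipped API. `decodeForDisplay` is a deliberately crude stand-in for decoding filename bytes with the wrong encoding: non-ASCII bytes become U+FFFD. Two distinct raw names collapse to the same display string, so only the raw-name lookup finds the entry the user actually picked.

```typescript
// Hypothetical archive entry: rawName is the byte sequence stored in the ZIP,
// name is a lossy decoding used for display only.
interface ZipEntry {
  readonly rawName: Uint8Array;
  readonly name: string;
  readonly contents: string;
}

// Crude stand-in for a mismatched-encoding decode: ASCII passes through,
// every other byte becomes U+FFFD REPLACEMENT CHARACTER.
function decodeForDisplay(raw: Uint8Array): string {
  let out = "";
  for (let i = 0; i < raw.length; i++) {
    out += raw[i] < 0x80 ? String.fromCharCode(raw[i]) : "\uFFFD";
  }
  return out;
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

class HypotheticalZip {
  readonly entries: ZipEntry[];

  constructor(files: Array<[number[], string]>) {
    this.entries = files.map(([bytes, contents]) => {
      const rawName = new Uint8Array(bytes);
      return { rawName, name: decodeForDisplay(rawName), contents };
    });
  }

  // String lookup: returns the first entry whose decoded name matches.
  getFileByName(name: string): ZipEntry | undefined {
    return this.entries.find((e) => e.name === name);
  }

  // Byte lookup: exact match on the stored filename bytes.
  getFileByRawName(raw: Uint8Array): ZipEntry | undefined {
    return this.entries.find((e) => bytesEqual(e.rawName, raw));
  }
}

const dotTxt = [0x2e, 0x74, 0x78, 0x74]; // ".txt"
const zip = new HypotheticalZip([
  [[0x82, 0xa0, ...dotTxt], "first"],
  [[0x82, 0xa2, ...dotTxt], "second"],
]);

// The user picks the second entry from the displayed list.
const picked = zip.entries[1];
const byName = zip.getFileByName(picked.name);      // finds "first": wrong file
const byRaw = zip.getFileByRawName(picked.rawName); // finds "second"
```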

An alternative solution: as you iterate over Files to create a list to
display to the user, stash the File as well (instead of the rawName),
associated with each list entry.  When the user selects a file, you just
use the File you already have, and never pass the filename back to the
API.  This would also take special effort by developers, but no more than
the rawName solution, and it avoids exposing raw filenames entirely.

For ZIP URLs, it seems like linking inside a legacy ZIP (rather than a ZIP
of icons or whatever that you just created to link to) would be uncommon.
(Also, if you think people won't escape backslashes, they definitely won't
escape garbage filenames with a special byte-escape mechanism...)  Are
there likely use cases here?
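The escaping point can be made concrete. No ZIP URL proposal is assumed to use this exact scheme. `percentEncodeBytes` is a hypothetical byte-level escape that percent-encodes every byte outside the RFC 3986 unreserved set. The familiar tool, `encodeURIComponent`, takes a string and encodes its UTF-8. It cannot express a lone 0x82 byte, so applied to the lossy display name it produces the escape of U+FFFD instead.

```typescript
// Hypothetical raw filename bytes (not valid UTF-8): 0x82 0xA0 ".txt".
const rawNameBytes = new Uint8Array([0x82, 0xa0, 0x2e, 0x74, 0x78, 0x74]);

// Hypothetical byte-level escape: keep RFC 3986 unreserved characters
// (ALPHA / DIGIT / "-" / "." / "_" / "~"), percent-encode every other byte.
function percentEncodeBytes(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    const c = String.fromCharCode(b);
    if (/[A-Za-z0-9\-._~]/.test(c)) {
      out += c;
    } else {
      out += "%" + (b < 0x10 ? "0" : "") + b.toString(16).toUpperCase();
    }
  }
  return out;
}

const byteEscaped = percentEncodeBytes(rawNameBytes); // "%82%A0.txt"

// encodeURIComponent only ever sees the lossy display string, so it escapes
// the UTF-8 encoding of U+FFFD (EF BF BD) rather than the original bytes.
const stringEscaped = encodeURIComponent("\uFFFD\uFFFD.txt"); // "%EF%BF%BD%EF%BF%BD.txt"
```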


On Fri, Sep 6, 2013 at 1:04 PM, Arun Ranganathan <arun@mozilla.com> wrote:

> I think it may be ok to restrict "/" and "\".  I don't think we lose too
> much here by not allowing historically "directory delimiting" characters in
> file names.
>

"\" is a valid character in real filenames.  Restricting it would break
selecting files whose names contain backslashes through HTMLInputElement,
which works fine today.

-- 
Glenn Maynard
Received on Friday, 6 September 2013 23:40:11 UTC