Re: [whatwg] Drag-and-drop folders/files support with directory structure using DirectoryEntry from Kinuko Yasuda on 2012-09-24 (public-whatwg-archive@w3.org from September 2012)

From: Kinuko Yasuda <kinuko@chromium.org>
Date: Mon, 24 Sep 2012 16:12:20 +0900
To: Ian Hickson <ian@hixie.ch>
Cc: whatwg@whatwg.org, simonp@opera.com
Message-ID: <CAMWgRNZc43GxyUCHS4i5DCWZ0dYAxpZqZZBethS+n1coTYj-PA@mail.gmail.com>
Thanks for the feedback.

On Fri, Sep 14, 2012 at 6:58 AM, Ian Hickson <ian@hixie.ch> wrote:

> On Tue, 15 Nov 2011, Kinuko Yasuda wrote:
> >
> > Many sites have an 'upload your files' feature, e.g. for photos.
> > HTML5 allows you to do this via <input type="file" multiple> or
> > drag-and-drop, but the current solution does not cleanly handle
> > folders, mixes of files and folders, or folders with subfolders.
> >
> > For context, we previously proposed (and implemented) a 'directory'
> > attribute for <input type=file> specifically to upload a directory, but
> > that approach does not tell webapps which file comes from which folder,
> > nor does it let apps control how and when directories are enumerated
> > (e.g. an app cannot show a progress meter even when enumeration takes
> > a long time).
> >
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-April/025764.html
>
> This isn't really about directories, it's a problem with file I/O in
> general, made worse when there are large numbers of files -- it's just
> that when you have directories you're more likely to have many files.
> Other situations also make this difficult, e.g. if the files are on a
> network drive with high latency, or a removable drive such as a DVD or
> tape drive.
>

This seems true. For the record, when this proposal was made there wasn't
clear agreement that file metadata needs to be retrieved when the File
object is created; the file name was the only information clearly
necessary to create a File.

> Fundamentally the problem is that the objects in drag-and-drop and in
> <input type=file> synchronously expose all the files, and we just don't
> necessarily have the time to get all the files' sizes before that starts
> to be noticeably slow. We could have the UA show progress UI, but while
> that could work for <input type=file>, it would be quite jarring for drag
> and drop.
>
> There are various ways we could fix this if we were starting afresh, but
> if we're trying to keep backwards compatibility there's basically no
> solution: the spec already requires this sync API, and pages might depend
> on it.
>
> So we have a problem: do we not fix the problem, do we break all pages
> always, break all pages but only when the user drags in a lot of files (so
> authors might not notice), break all pages whenever there's more than one
> file (so authors will notice but pages still support one file at a time),
> break pages only when the user drags in one or more directories?
>

My proposal is to add an alternative asynchronous API and encourage app
authors to use it when they expect a large number of files/directories.
This does not solve the existing problem, but it offers a better
alternative approach.
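To make the proposal concrete, here is a minimal TypeScript sketch of what an asynchronous API lets a page do: walk the dropped tree without blocking and update a progress meter as files are found. It assumes entry objects shaped like the FileSystem API's FileEntry/DirectoryEntry (isFile, isDirectory, name, fullPath, and createReader().readEntries(), which may deliver a directory's children over several calls and signals the end with an empty batch). The EntryLike types, collectFiles(), the onProgress callback and the in-memory mocks are illustrative assumptions, not part of any spec.

```typescript
// Minimal shapes modelled on the FileSystem API entry interfaces
// (assumption: only the members used below are declared).
interface EntryLike {
  isFile: boolean;
  isDirectory: boolean;
  name: string;
  fullPath: string;
}

interface DirectoryReaderLike {
  readEntries(success: (batch: EntryLike[]) => void, error?: (err: unknown) => void): void;
}

interface DirectoryEntryLike extends EntryLike {
  createReader(): DirectoryReaderLike;
}

// Wrap one readEntries() call in a Promise.
function readBatch(reader: DirectoryReaderLike): Promise<EntryLike[]> {
  return new Promise<EntryLike[]>((resolve, reject) => reader.readEntries(resolve, reject));
}

// Hypothetical helper: walk the dropped entries asynchronously, reporting
// progress as each file is found so the page can show a progress meter.
async function collectFiles(
  roots: EntryLike[],
  onProgress: (filesSoFar: number) => void = () => {},
): Promise<string[]> {
  const paths: string[] = [];
  const pending: EntryLike[] = [...roots];
  while (pending.length > 0) {
    const entry = pending.pop()!;
    if (entry.isFile) {
      paths.push(entry.fullPath);
      onProgress(paths.length);
    } else if (entry.isDirectory) {
      const reader = (entry as DirectoryEntryLike).createReader();
      // Keep reading until an empty batch says the directory is exhausted.
      for (let batch = await readBatch(reader); batch.length > 0; batch = await readBatch(reader)) {
        pending.push(...batch);
      }
    }
  }
  return paths.sort();
}

// In-memory stand-ins for dropped entries, so the walker runs outside a browser.
function mockFile(fullPath: string): EntryLike {
  return { isFile: true, isDirectory: false, name: fullPath.split("/").pop()!, fullPath };
}

function mockDir(fullPath: string, children: EntryLike[], batchSize = 2): DirectoryEntryLike {
  return {
    isFile: false,
    isDirectory: true,
    name: fullPath.split("/").pop()!,
    fullPath,
    createReader(): DirectoryReaderLike {
      let offset = 0;
      return {
        readEntries(success: (batch: EntryLike[]) => void) {
          // Hand out a few children per call, then an empty batch.
          const batch = children.slice(offset, offset + batchSize);
          offset += batch.length;
          void Promise.resolve().then(() => success(batch));
        },
      };
    },
  };
}
```

The progress callback is the piece the synchronous FileList cannot offer: the page learns about files as the crawl proceeds rather than after it completes.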

> There are various ways we could fix the problem, if we're ok with breaking
> things. We could expose all the files in a flat list, incrementally. We
> could expose the directory hierarchy, with asynchronous access. If we do
> incremental access, there are various ways to do that: event-based
> notification that there's more data; an enumerator / callback mechanism; a
> lazy array where reading the number of files, or reading the nth file, is
> asynchronous... We can extend FileList and DataTransferItemList to support
> this, or we can add a new object that they point to, or we can just update
> FileList and make DataTransferItemList support the new object...
>
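The "lazy array" option in the list above could look roughly like this. The LazyFileList interface and lazyFileList() are hypothetical (nothing like them is specified); the sketch only shows that both the length and the nth item are asynchronous, and that the underlying source, e.g. a slow network drive, is read only as far as the caller needs.

```typescript
// Hypothetical shape: reading the length or item n may have to wait
// for the crawl to advance.
interface LazyFileList {
  length(): Promise<number>;
  item(index: number): Promise<string | undefined>;
}

// Back the list with an async source that yields file names one at a time.
function lazyFileList(source: AsyncIterator<string>): LazyFileList {
  const seen: string[] = []; // names fetched so far, in order
  let done = false;

  // Pull from the source until `want` names are cached or it runs out.
  async function fill(want: number): Promise<void> {
    while (!done && seen.length < want) {
      const next = await source.next();
      if (next.done) {
        done = true;
      } else {
        seen.push(next.value);
      }
    }
  }

  return {
    async length() {
      await fill(Infinity); // the total is only known once the crawl ends
      return seen.length;
    },
    async item(index: number) {
      await fill(index + 1); // fetch just far enough to answer
      return seen[index];
    },
  };
}
```

Note the asymmetry: item(n) can answer early, while length() forces a full crawl, which is why an enumerator or event-based design may suit very large trees better.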
> In many cases, exposing the actual hierarchy can reduce the total amount
> of work that's needed, because many use cases don't actually need to crawl
> everything. For example, people gave examples of just wanting Subversion's
> internal .svn directories in a big tree, not the actual data; or indeed in
> other cases vice-versa.
>
> However, both exposing the hierarchy and flattening it have all kinds of
> risks. It's possible for the user to accidentally expose his entire
> computer's hard drive without realising it.


This seems to be possible regardless of whether we expose files in a
hierarchy or in a flattened list.

> On some systems (including at
> least modern Mac OS and Linux OSes, not sure about Windows), it's possible
> to have hard-link loops.


Newer versions of Mac OS X allow hard links to directories, but not in a
way that could create loops. On most other OSes, I believe, hard links to
directories are still disallowed.

> On some systems, it's possible to drag special
> directories like "..", and it's not clear what that would mean. When the
> user drags files from multiple parts of the file system (e.g. from a
> Windows virtual folder), it's not clear what parts of the path we should
> expose -- even exposing just the common parts can expose sensitive
> information like the profile path if one file is in the user's profile and
> another is not.
>

The proposed spec doesn't say anything about this, but Chrome's basic
stance is that we should not expose any information outside the dropped
files/folders, even when some of the dragged paths share common
ancestors. In the current implementation, entries obtained through the
API expose only 'virtual paths', i.e. paths relative to the dragged root
(treating the dropped items as disjoint nodes).
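A sketch of the 'virtual path' rule just described, assuming POSIX-style absolute paths on the UA side. virtualPath() is a hypothetical helper, not Chrome's implementation: it keeps the dropped item's own name and everything below it and hides every ancestor, so two items dropped from different folders come out as disjoint top-level nodes and their common parent is never revealed.

```typescript
// Hypothetical helper: map an OS path at or below a dropped item to the path
// a page would see, relative to the dropped item's parent.
function virtualPath(droppedRoot: string, absolutePath: string): string {
  const rootParts = droppedRoot.split("/").filter((p) => p.length > 0);
  const parts = absolutePath.split("/").filter((p) => p.length > 0);
  if (rootParts.length === 0) {
    // Dropping the file-system root itself (cf. dragging C:\) is left open here.
    throw new Error("dropping the root directory is not handled in this sketch");
  }
  // Compare whole path components, so "/a/Photos" does not match "/a/PhotosOld".
  const inside = parts.length >= rootParts.length && rootParts.every((p, i) => parts[i] === p);
  if (!inside) throw new Error(`${absolutePath} is not inside ${droppedRoot}`);
  // Keep the dropped item's own name and everything below it; drop its ancestors.
  return "/" + parts.slice(rootParts.length - 1).join("/");
}

// Two items dropped from different folders become disjoint top-level nodes:
//   virtualPath("/Users/alice/Photos", "/Users/alice/Photos/2012/c.jpg")  -> "/Photos/2012/c.jpg"
//   virtualPath("/Users/alice/Docs/a.txt", "/Users/alice/Docs/a.txt")     -> "/a.txt"
```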


> Also, none of these solutions helps with DataTransfer.types or exposing
> the types in DataTransfer.items while the drag is occurring, if the goal
> is to expose a deep crawl there. If we limit ourselves to just exposing
> the files that were dragged, then I think the OS will give us the list of
> files, so the problem is only statting them to get the sizes when you drop.
>
> On Tue, 15 Nov 2011, Glenn Maynard wrote:
> >
> > Entry (and subclasses) should also be supported by structured clone.
> > That would allow a DirectoryEntry received from a file input to be
> > passed to a worker. This is something for later, of course, but
> > combined with an API to convert between Entry and EntrySync (and
> > DE/DESync), this would allow using the much more convenient sync API in
> > a worker, even if the only way to retrieve the Entry in the first place
> > is in the UI thread.
>
> Any spec can define how they work with the structured clone algorithm.
> I'll let the Filesystem API editors consider this.
>
>
> On Thu, 5 Apr 2012, Kinuko Yasuda wrote:
> >
> > Based on the feedback we got on this list, we've implemented the
> > following API to do experiments in Chrome:
> >    DataTransferItem.getAsEntry(in EntryCallback callback)
> > which takes a callback that receives a FileEntry or DirectoryEntry if
> > it's for a drop event and the item's kind is 'file'.
> > [later changed to be synchronous]
> >
> > We use kind=='file' in a broader sense here (i.e. a file path that can
> > be either a regular file or a directory) and didn't add a specific kind
> > for directories.
> >
> > (Btw we've also implemented DataTransferItem.getAsFile(), so apps can
> > call either getAsFile or webkitGetAsEntry for a kind=='file' item)
>
> This doesn't seem to solve the problems. It mitigates the problem of
> having to do a deep crawl, but it risks exposing file system loops and the
> other issues listed above.
>
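For reference, a sketch of how a page would consume the API described in the quoted message, using the synchronous, prefixed webkitGetAsEntry() form. DataTransferItemLike and EntryLike are minimal stand-in types so the example runs outside a browser, and droppedEntries() is a hypothetical helper. In a real page the items would come from event.dataTransfer.items and should be read synchronously inside the drop handler, since the drag data is generally not available once the event has finished.

```typescript
// Minimal stand-ins (assumption): only the members used here are declared.
interface EntryLike {
  isFile: boolean;
  isDirectory: boolean;
  name: string;
  fullPath: string;
}

interface DataTransferItemLike {
  kind: string; // "file" for files and directories, "string" for text data
  webkitGetAsEntry(): EntryLike | null;
}

// Collect the top-level entries of a drop, skipping non-file items.
function droppedEntries(items: ArrayLike<DataTransferItemLike>): EntryLike[] {
  const entries: EntryLike[] = [];
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (item.kind !== "file") continue;
    const entry = item.webkitGetAsEntry();
    if (entry) entries.push(entry);
  }
  return entries;
}

// Usage in a page (browser only, shown as a comment):
// target.addEventListener("drop", (e) => {
//   e.preventDefault();
//   const roots = droppedEntries(e.dataTransfer!.items);
//   // ...then enumerate each DirectoryEntry asynchronously.
// });
```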


> In any case, Opera and Mozilla have both indicated they are not interested
> in using the Filesystem API here, so I haven't added this to the spec.


It looks like there's still some interest in having an async API that
returns a list of dropped files.


> It's not clear to me how to move forward on this.
>
> My intuition is that we should assume that dragging in lots of files will
> not hurt because the statted files will have been recently cached, and then
> expose the tree via objects, not via flattening. I don't see how to avoid
> exposing undetectable loops if we do this. Things like the meaning of ".."
> would be left to the UA, but ".." wouldn't ever be exposed as a folder
> name, certainly. Disjoint nodes would be treated as separate nodes in the
> drag, so there's no problem with exposing common paths with sensitive
> data, except if the user drags a sensitive path's parent (e.g. C:\). Not
> sure what to do with that, though.
>

This all sounds reasonable to me.
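Since a page cannot detect a loop it is never told about, one defensive pattern is to bound the crawl instead. A minimal sketch under stated assumptions: TreeNode is a hypothetical stand-in for whatever asynchronous directory object a UA might expose, and boundedCrawl() caps both depth and total entries, so that even a directory that contains itself terminates and reports that the result was truncated.

```typescript
// Hypothetical node shape (not a real API): directories list children
// asynchronously; files return an empty list.
interface TreeNode {
  name: string;
  isDirectory: boolean;
  children(): Promise<TreeNode[]>;
}

interface CrawlLimits {
  maxDepth: number;   // directories at this depth are listed but not opened
  maxEntries: number; // stop after this many entries overall
}

// Breadth-first crawl that stays bounded even on a cyclic tree.
async function boundedCrawl(
  roots: TreeNode[],
  limits: CrawlLimits,
): Promise<{ paths: string[]; truncated: boolean }> {
  const paths: string[] = [];
  let truncated = false;
  let queue = roots.map((node) => ({ node, path: "/" + node.name, depth: 0 }));
  while (queue.length > 0) {
    const next: typeof queue = [];
    for (const { node, path, depth } of queue) {
      if (paths.length >= limits.maxEntries) return { paths, truncated: true };
      paths.push(path);
      if (!node.isDirectory) continue;
      if (depth >= limits.maxDepth) {
        truncated = true; // a deeper level exists but is not explored
        continue;
      }
      for (const child of await node.children()) {
        next.push({ node: child, path: `${path}/${child.name}`, depth: depth + 1 });
      }
    }
    queue = next;
  }
  return { paths, truncated };
}
```

The limits trade completeness for safety; a page using this would typically tell the user the listing was cut short when truncated is true.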


> Concretely, the least invasive way to do this is probably to piggy-back on
> the FileList and getAsFile solutions, and make a Directory object that
> parallels File and provides a list of files in the directory, with either
> getAsDirectory() being async or, more likely, the Directory object being
> enumerable in an async manner to get all the files.
>

I'd like to see how the ongoing FileSystem API discussion on the other
thread goes.

I think what we're really interested in (and what we hear strong demand
for) is a reasonable API to enumerate dropped files/directories
asynchronously, and it seemed best to build on the existing public spec
proposal (i.e. the FileSystem API) rather than starting over from
scratch.
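Ian's Directory-object alternative, quoted above, could be consumed with async iteration roughly as below. Directory, FileLike, entries(), allFiles() and mockDirectory() are hypothetical names for illustration only; the sketch shows that a page can enumerate lazily and stop early (say, once it has found the .svn directories it cares about) without the UA having crawled everything up front.

```typescript
// Hypothetical: a Directory object that parallels File.
interface FileLike {
  name: string;
  size: number;
}

interface Directory {
  name: string;
  entries(): AsyncIterable<FileLike | Directory>; // immediate children, one at a time
}

function isDirectory(x: FileLike | Directory): x is Directory {
  return typeof (x as Directory).entries === "function";
}

// Yield every file path below `dir`; the caller may `break` at any point,
// which stops the enumeration.
async function* allFiles(dir: Directory, prefix = dir.name): AsyncGenerator<string> {
  for await (const child of dir.entries()) {
    const path = `${prefix}/${child.name}`;
    if (isDirectory(child)) {
      yield* allFiles(child, path);
    } else {
      yield path;
    }
  }
}

// In-memory stand-in for a dropped directory.
function mockDirectory(name: string, children: (FileLike | Directory)[]): Directory {
  return {
    name,
    async *entries(): AsyncGenerator<FileLike | Directory> {
      for (const c of children) yield c;
    },
  };
}
```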

> For UAs that implement the FileSystem API, I would then recommend that the
> FileSystem API provide ways to get from File and Directory objects to
> FileEntry and DirectoryEntry objects.
>
> I haven't added any of this to the spec, mostly because it's not clear to
> me that there is consensus amongst browser vendors that this is a problem
> they want to solve, let alone how to solve it.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
Received on Monday, 24 September 2012 07:13:15 UTC