- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 13 Sep 2012 21:58:42 +0000 (UTC)
- To: whatwg@whatwg.org
On Tue, 15 Nov 2011, Kinuko Yasuda wrote: > > Many sites have 'upload your files' feature, like for your photo images. > HTML5 allows you to do this via <input type="file" multiple> or > drag-and-drop feature, but the current solution does not provide clean > solution for cases with folders, files/folder mixed cases, or folders > with subfolders cases. > > For context, back then we have proposed (and implemented) 'directory' > attribute for <input type=file> specifically to upload a directory, but > the approach does not provide useful information to webapps about which > file comes from which folder, neither does it allow apps to control how > and when to enumerate directories (e.g. app cannot show progress meter > etc even the enumerating part takes long time). > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-April/025764.html This isn't really about directories, it's a problem with file I/O in general, made worse when there are large numbers of files -- it's just that when you have directories you're more likely to have many files. Other situations also make this difficult, e.g. if the files are on a network drive with high latency, or a removable drive such as a DVD or tape drive. Fundamentally the problem is that the objects in drag-and-drop and in <input type=file> synchronously expose all the files, and we just don't necessarily have the time to get all the files' sizes before that starts to be noticably slow. We could have the UA show progress UI, but while that could work for <input type=file>, it would be quite jarring for drag and drop. There are various ways we could fix this if we were starting afresh, but if we're trying to keep backwards compatibility there's basically no solution: the spec already requires this sync API, and pages might depend on it. So we have a problem: do we not fix the problem, do we break all pages always, break all pages but only when the user drags in a lot of files (so authors might not notice), break all pages whenever there's more than one file (so authors will notice but pages still support one file at a time), break pages only when the user drags in one or more directories? There's various ways we could fix the problem, if we're ok with breaking things. We could expose all the files in a flat list, incrementally. We could expose the directory hiearchy, with asynchronous access. If we do incremental access, there's various ways to do that: event-based notification that there's more data; an enumerator / callback mechanism; a lazy array where reading the number of files, or reading the nth file, is asynchronous... We can extend FileList and DataTransferItemList to support this, or we can add a new object that they point to, or we can just update FileList and make DataTransferItemList support the new object... In many cases, exposing the actual hierarchy can reduce the total amount of work that's needed, because many use cases don't actually need to crawl everything. For example, people gave examples of just wanting Subversion's internal .svn directories in a big tree, not the actual data; or indeed in other cases vice-versa. However, both exposing the hiearchy and flattening it have all kinds of risks. It's possible for the user to accidentally expose his entire computer's hard drive without realising it. On some systems (including at least modern Mac OS and Linux OSes, not sure about Windows), it's possible to have hard-link loops. On some systems, it's possible to drag special directories like "..", and it's not clear what that would mean. When the user drags files from multiple parts of the file system (e.g. from a Windows virtual folder), it's not clear what parts of the path we should expose -- even exposing just the common parts can expose sensitive information like the profile path if one file is in the user's profile and another is not. Also, none of these solutions helps with DataTransfer.types or exposing the types in DataTransfer.items while the drag is occurring, if the goal is to expose a deep crawl there. If we limit ourselves to just exposing the files that were dragged, then I think the OS will give us the list of files, so the problem is only statting them to get the sizes when you drop. On Tue, 15 Nov 2011, Glenn Maynard wrote: > > Entry (and subclasses) should also be supported by structured clone. > That would allow passing a DirectoryEntry received from file inputs to > be passed to a worker. This is something for later, of course, but > combined with an API to convert between Entry and EntrySync (and > DE/DESync), this would allow using the much more convenient sync API in > a worker, even if the only way to retrieve the Entry in the first place > is in the UI thread. Any spec can define how they work with the structured clone algorithm. I'll let the Filesystem API editors consider this. On Thu, 5 Apr 2012, Kinuko Yasuda wrote: > > Based on the feedbacks we got on this list we've implemented the following > API to do experiments in Chrome: > DataTransferItem.getAsEntry(in EntryCallback callback) > which takes a callback that returns FileEntry or DirectoryEntry if it's for > drop event and the item's kind is 'file'. > [later changed to be synchronous] > > We use kind=='file' in a broader definition here (i.e. a file path which > can be either regular file or directory file) and didn't add a specific > kind for directories. > > (Btw we've also implemented DataTransferItem.getAsFile(), so apps can call > either getAsFile or webkitGetAsEntry for kind=='file' item) This doesn't seem to solve the problems. It mitigates the problem of having to do a deep crawl, but it risks exposing file system loops and the other issues listed above. In any case, Opera and Mozilla have both indicated they are not interested in using the Filesystem API here, so I haven't added this to the spec. It's not clear to me how to move forward on this. My intuition is that we should assume that dragging in lots of files will not hurt due to the statted filed having been recently cached, and then expose the tree via objects, not via flattening. I don't see how to avoid exposing undetectable loops if we do this. Things like the meaning of ".." would be left to the UA, but ".." wouldn't ever be exposed as a folder name, certainly. Disjoint nodes would be treated as separate nodes in the drag, so there's no problem with exposing common paths with sensitive data, except if the user drags a sensitive path's parent (e.g. C:\). Not sure what to do with that, though. Concretely, the least invasive way to do this is probably to piggy-back on the FileList and getAsFile solutions, and make a Directory object that parallels File and provides a list of files in the directory, with either getAsDirectory() being async or, more likely, the Directory object being enumerable in an async manner to get all the files. For UAs that implement the FileSystem API, I would then recommend that the FlieSystem API provide ways to get from File and Directory objects to FileEntry and DirectoryEntry objects. I haven't added any of this to the spec, mostly because it's not clear to me that there is consensus amongst browser vendors that this is a problem they want to solve, let alone how to solve it. -- Ian Hickson U+1047E )\._.,--....,'``. fL http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 13 September 2012 21:59:13 UTC