- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 22 Jul 2011 06:58:44 +0000 (UTC)
On Sat, 9 Apr 2011, Glenn Maynard wrote:
> >
> > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-July/027455.html
>
> A big +1 to the proposal in this thread, to allow specifying
> Content-Disposition behavior in anchors.

I believe I last responded to feedback on this topic in August last year:

   http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-August/028148.html

...with a minor addition in December:

   http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-December/029350.html

However, feedback since has introduced a new wrinkle that I do not believe
was thoroughly examined in previous threads on the topic: the issue of how
to specify an explicit filename, which is especially relevant for a number
of use cases mentioned below.

> <a download=filename.txt> would have the effect of adding (or
> overriding) the header "Content-Disposition: attachment;
> filename=filename.txt".
>
> It would mean I'd no longer need to use server-side hacks to cause
> Content-Disposition to be sent for download links, eg. where
> "?download=1" adds the C-D header.
>
> I also just now had to implement a server-side script that receives
> base64 file data and a filename in parameters, and responds by echoing
> it back. That's an ugly hack to allow client-side data to be saved to
> disk, and doesn't work with serverless web apps. This would be fixed,
> allowing both data: URLs and File API object URLs as download links.

This download="" attribute seems like a reasonable idea.

On Sun, 10 Apr 2011, Bjartur Thorlacius wrote:
>
> Right. As an end-user I ask: Does a web developer publishing links to
> resources have a say as to whether I render aforementioned resource
> immediately, write it to disk or both?

It should always be up to the user to have the final say, but the use cases
presented here suggest that it makes sense to at least give authors the
opportunity to provide a hint as to what the default behaviour should be
for links.

> Better yet, File API could have an API for writing blobs to files.

That would help for some of these cases, but not most. It would also mean
that what could be expressible purely declaratively requires script, which
is something we generally try to avoid. Declarative semantics are easier to
process using static analysis tools, for instance.

On Sun, 10 Apr 2011, Glenn Maynard wrote:
>
> (Browsers generally don't have a "show this file in the browser, even
> though it's C-D: attachment" option on that dialog--they should, but
> that's a separate issue.)

Indeed, that should probably be addressed as well.

On Thu, 26 May 2011, Dennis Joachimsthaler wrote:
>
> The filename is only necessary when you feed the file from a dynamic
> page, like directly from the PHP processor. And in this case you can
> directly use the content-disposition HTTP header.

That prevents a single resource from being offered either for view or
download, unfortunately (users should be able to use context menus, but in
practice few users do).

> You have files in a folder that are numbered in one continuous numbering
> scheme. The files are heavily downloaded so server side scripting falls
> out of the question because sending files through this is, to say the
> least, slow, unless you use some special tricks.
>
> Instead of giving the user a link to the file called "A342378437.pdf"
> you can use the disposition attribute to
>
> a) Let him directly download it. He doesn't have to go the long way
> around by right clicking this way.
>
> b) Give it a meaningful name that the user will appreciate

Indeed.

On Fri, 3 Jun 2011, Bjartur Thorlacius wrote:
>
> Use the last non-empty path component for a short name prone to
> accidental clashes, or the title for a verbose, unportable and
> descriptive name. It's purely a hint for user convenience (so they don't
> have to invent their own names or retype the title). What a file is
> named on a client's machine is purely the client's matter.

Using the title="" doesn't really work because what is appropriate for a
tooltip and what is appropriate for a filename might not match.

On Thu, 14 Jul 2011, Ian Fette wrote:
>
> Many websites wish to offer a file for download, even though it could
> potentially be viewed inline (take images, PDFs, or word documents as an
> example). Traditionally the only way to achieve this is to set a
> content-disposition header. *However, sometimes it is not possible for
> the page author to have control over the response headers sent by the
> server.* (A related example is offline apps, which may wish to provide
> the user with a way to "download" a file stored locally using the
> filesystem API but again can't set any headers.) It would be nice to
> provide the page author with a client side mechanism to trigger a
> download.
>
> After mulling this over with some application developers who are trying
> to use this functionality, it seems like adding a "rel" attribute to the
> <a> tag would be a straightforward, minimally invasive way to address
> this use case. <a rel=attachment href=blah.pdf> would indicate that the
> browser should treat this link as if the response came with a
> content-disposition: attachment header, and offer to download/save the
> file for the user.

On Thu, 14 Jul 2011, Tantek Çelik wrote:
>
> rel="enclosure" is sufficient for today's use cases because authors
> simply name the file accordingly on their server and then
> implementations simply use the last segment of the URL as the filename -
> presto 80/20 case solved (and solved 6 years ago with no modification
> needed to HTML for it to be valid).
>
> Having to specify a "download" attribute that reflects a filename
> different from the last segment of the URL is the minority case, but
> still sufficient to justify addition of the attribute.

We can just say that if the attribute has no value it is indicating that
the author recommends downloading the file. No need for redundancy.

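A minimal sketch of that reading, where a bare download attribute only
recommends downloading and a valued one also suggests a filename. The
helper name interpret_download_attribute and its return shape are
hypothetical and not taken from any spec text; the fallback to the last
URL path segment follows the behaviour Tantek describes for
rel="enclosure".

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, unquote


def interpret_download_attribute(
    href: str, download: Optional[str]
) -> Tuple[bool, Optional[str]]:
    """Return (should_download, suggested_filename) for a link.

    `download` is None when the attribute is absent, "" when it is
    present without a value, and a non-empty string when the author
    supplied a filename. (Hypothetical helper, for illustration only.)
    """
    if download is None:
        # No attribute: an ordinary navigation, no filename hint.
        return (False, None)
    if download:
        # Attribute with a value: download, using the author's name.
        return (True, download)
    # Bare attribute: download, falling back to the last non-empty
    # segment of the URL path, if there is one.
    segments = [s for s in urlsplit(href).path.split("/") if s]
    return (True, unquote(segments[-1]) if segments else None)
```

In every case the result is only a default; the user still has the final
say over whether the resource is saved and under what name.
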
On Fri, 15 Jul 2011, Alexey Proskuryakov wrote:
>
> What meaning will this attribute have on a platform that simply doesn't
> expose the notion of a file?

None, presumably the same as "Content-Disposition: attachment" in the same
case.

> I think that this attribute could be quite confusing, and it will likely
> become more confusing with time, as more platforms arise that have
> creative ways of presenting data to users.

Could you elaborate on what confusion you are expecting here?

> It also doesn't naturally help understanding that it's just poor man's
> Content-Disposition:attachment. From this point of view, I like Ian's
> original proposal (rel=attachment) more.

Unfortunately, not being able to provide a file name makes it inadequate
for a number of use cases people have raised.

On Thu, 14 Jul 2011, Glenn Maynard wrote:
>
> That reminds me of something download=filename can't do: assign a
> filename while leaving it inline, so "save as" and other operations can
> have a specified filename. That would require two separate properties.
> One case I've come across is <img>, where I want to display an image,
> but provide a different filename for save-as. Separating the filename
> would allow this to be applied generically both links and inline
> resources: <img src=f1d2d2f924e986ac86fdf7b36c94bcdf32beec15.jpg
> filename=picture.jpg>.

I haven't addressed this use case here, but it's definitely something we
can investigate in the future if download="" proves successful. (Note that
for same-origin cases, you can use Content-Disposition for this already.
This would only really help with embedding images from other sites that
don't support giving a filename, e.g. Flickr.)

On Fri, 15 Jul 2011, Glenn Maynard wrote:
>
> Bear in mind that "optimize for" doesn't mean "support at all"; if
> download=filename is used, it seems unlikely that there will ever be
> *any* client-side way to supply the filename without implying
> attachment, which is a very different thing than "not optimizing for
> it".
>
> I don't feel strongly enough about this to press it further, but <a
> href=ugly download filename=pretty> also seems fairly clean, and avoids
> combining parameters that really are orthogonal to one another.

On Sat, 16 Jul 2011, Tantek Çelik wrote:
>
> Agreed with Glenn, narrowing the semantic solves this problem neatly:
>
> * filename="" attribute - what to name the file if saved by the user (by whatever means)
> * existing rel="enclosure" spec - download the link when clicked/activated.
>
> So the author can choose to do one, or the other, or both. Clean,
> simple, orthogonal.

On Fri, 15 Jul 2011, Ian Fette wrote:
>
> I really don't see the importance of the "name the thing that isn't
> going to be downloaded" usecase; there are countless edge cases that we
> could concern ourselves with in HTML but that few users will ever hit,
> this is one. (I also suspect a user sophisticated enough to actually
> save something, e.g. right click save as, is sophisticated enough to be
> able to type their own filename.) I think it's better overall to keep the
> semantics as clean and simple as possible. I suggest we move forward
> with <a href=blah download=filename> with the origin considerations
> mentioned in the previous email and move on.

On Sun, 17 Jul 2011, Glenn Maynard wrote:
>
> A common case is generated PDFs, which are regularly both saved to disk
> and viewed in-browser (eg. tax forms which are viewed to print and then
> saved for records).

This is easily solved by providing two links, or an iframe and a link. This
is probably more usable than one link and expecting the user to
right-click, too.
As Ian says above, if the user is savvy enough to right-click, the user is
likely not going to find it difficult to give the file a name either.

On Thu, 14 Jul 2011, Karl Dubost wrote:
>
> A random thought just occurred to me (maybe dumb)
> But is it a relation qualifier or in fact a target?
>
> [...] what about adding
>
> <a href="foo.pdf" target="_download">Save a Tree, Eat a beaver</a>

On Thu, 14 Jul 2011, Bjartur Thorlacius wrote:
>
> This seems like the best solution to me. A filename hint has two use
> cases: a suggestion for a local identifier, and providing a filename
> extension for systems that use them to identify file types with
> incomplete or nonexistent /etc/mime.type media type mappings. I'll only
> name so many pictures "pic.jpg", so I suggest using the descriptive (and
> thus verbose) value of the title attribute. The worst problem will be
> encoding the name on filesystems such as FAT.
>
> <a href="//samplecdn.example/pix/2011/7/14/party/cake"
> title="Súkkulaðikaka með ís" target="_download">Afmæliskaka
> mín</a>

On Fri, 15 Jul 2011, Jonas Sicking wrote:
> 3) The target=_download idea is interesting, but I'm not sure we can
> safely introduce new target values, and this also suffers from not
> providing a way to specify the downloaded filename.

Indeed.

On Sat, 30 Apr 2011, Michal Zalewski wrote:
>
> Downloading files in general is a very problematic area, because there's
> a very fragile transition between HTTP MIME type and filesystem
> extension or other OS-level content determination mechanism. Many
> browsers either don't try to do anything useful to prevent weird
> "promotions" from safe to unsafe document types; or enforce decidedly
> imperfect logic. Allowing attackers to further control this process has
> some risks.
>
> [ This is further compounded by the fact that in many cases, it is safer
> for users to open certain document types, HTML included, from http: URLs
> than from file:.
> ]

On Sat, 30 Apr 2011, Michal Zalewski wrote:
>
> My concern is a bit more straightforward. To use a practical example:
> just because a social networking site allows nearly arbitrary JPEG files
> to be uploaded and served as profile pictures (Content-Type: image/jpeg)
> does not mean that the application wants users to be offered that
> content as a download named Security_Update.exe, supposedly coming from
> that trusted site.

Well again, making sure you don't put an extension on a file that does not
correspond to the type of the resource as reported by the third-party site
seems like an elementary precaution.

On Thu, 26 May 2011, Boris Zbarsky wrote:
> >
> > So what does Firefox do in this case?
>
> I believe it forces the extension to match the MIME type; if the type is
> text/plain the saved filename will be "Important_Security_Update.exe.txt".

That seems like a rather desirable property.

On Thu, 26 May 2011, Boris Zbarsky wrote:
> On 5/26/11 3:12 PM, Dennis Joachimsthaler wrote:
> > Oh I see the problem... Is it the bang? #!/bin/perl #!/bin/python
> > #!/bin/bash could very well result in the text file being executed in
> > one of those interpreters, right?
>
> Yes, but even worse on some systems a .pl file will just be handed over to
> the registered handler for those (often a Perl interpreter) if you try
> to "open" it (which is a different operation from "execute" and can be
> done even on files that are not executable; think double-clicking the
> file in a file manager).

The disparity of behaviours amongst file managers on Unix systems is indeed
problematic. I'm not sure what to suggest there.

On Fri, 3 Jun 2011, Eduard Pascual wrote:
>
> My post was entirely about the precedence between the two sources of the
> header, when they conflict. I think it is obvious enough that the provider
> of a resource should be given more weight than a third party referencing
> to it.
> Either of the sides can still leave things to whatever default
> could apply to each case if they don't care; but if both care, and they
> conflict, the provider of the resource should have the final say over
> whatever the third party may be requesting.

I don't think it's obvious. In particular, there is a big difference
between how a link with a download="" attribute could be processed compared
to how a regular hyperlink could be processed: in the latter case, the UA
only knows it's going to be a download some way into processing the
navigation request (and after some irreversible steps like aborting other
downloads); in the former case, we can bypass all that and just do a direct
download without navigating to the resource.

A number of people raised issues relating to download="" confusing the
issue of which origin to trust:

On Sat, 30 Apr 2011, Michal Zalewski wrote:
>
> Note that somewhat counterintuitively, there would be some security
> concerns with markup-level content disposition controls (or any JS
> equivalent). For example, consider evil.com doing this:
>
> <a href='http://example.com/user_content/harmless_text_file.txt'
> disposition='attachment; filename="Important_Security_Update.exe"'>

On Sat, 30 Apr 2011, Glenn Maynard wrote:
>
> To do some contriving, in trying to follow the example: if example.com
> is a site trusted by the user or administrator, it may be flagged in the
> browser as "always allow saving sensitive file types from this site".
> If you can override the C-D header remotely, and if there exists (for
> example) a text file whose contents happen to alias to a dangerous
> executable, then you could cause a dangerous executable to be saved to
> disk. Browsers might need a mechanism to remember whether the effective
> Content-Disposition header is "trusted" (received from the response, or
> overridden from the same origin) or not, which is sort of annoying.

On Thu, 2 Jun 2011, Glenn Maynard wrote:
>
> I don't think the issue raised was about getting people to save files,
> though. If you can get someone to click a link, you can already point
> them at something that sets the HTTP C-D header.
>
> As I recall, the concern was about getting people to do this on files
> that appear to be from a trusted domain. That is, evil.com linking to a
> perl script on trusted.com (or, say, a dual-mode image/ELF file),
> setting C-D in the link to get it to save-as, perhaps hoping that people
> will see "from: http://trusted.com" in the save-as dialog. (I doubt
> that most users look at that at all; Chrome doesn't even seem to bother
> displaying it.)
>
> At worst, it just seems like a minor UI design issue.

On Thu, 2 Jun 2011, Michal Zalewski wrote:
>
> The origin of a download is one of the best / most important indicators
> people have right now (which, by itself, is a bit of a shame). I just
> think it would be a substantial regression to make it possible for
> microsoft.com or google.com to unwittingly serve .exe / .jar / .zip /
> .rar files based on third-party markup.
>
> Firefox and MSIE display the origin fairly prominently, IIRC; Chrome
> displays it in some views. But deficiencies of current UIs are probably
> a separate problem.

On Thu, 2 Jun 2011, Glenn Maynard wrote:
>
> Firefox displays it in a small, unimportant-looking piece of text inside
> a busy dialog; I never even consciously noticed it until I looked for
> it. For me, Chrome doesn't say anything; when I click an .EXE it saves
> it to disk without asking (maybe I changed a preference somewhere--that
> seems like an unlikely default).
>
> When I download a file, I decide whether to trust "dangerous" file types
> based on who's telling me to download it--that is, based on the site
> linking the file, not the site hosting it. I'd strongly suspect that
> more people look at who's linking the file (eg.
> where they were when
> they clicked the link), and that very few people examine the "from:"
> text in the save-as dialog.
>
> Either way, again this is something that can be dealt with in UI, for
> example by displaying the source URL as the source of the download
> rather than or in addition to the domain hosting the file when this
> attribute is used. It's a weak argument against this feature.

On Fri, 15 Jul 2011, Jonas Sicking wrote:
>
> One concern which was brought up was the ability to cause the user to
> download a file from a third party site. I.e. this would allow evil.com
> to trick the user into downloading an email from the users webmail, or
> download a page from their bank which contains all their banking
> information. It might be easier to then trick the user into re-uploading
> the saved file to evil.com since from a user's perspective, it looked
> like the file came from evil.com
>
> Another possible attack goes something like:
> 1. evil.com tricks the user into downloading sensitive data from bank.com
> 2. evil.com then asks the user to download a html from evil.com and
> open the newly downloaded file
> 3. the html file contains script which reads the contents from the
> file downloaded from bank.com and sends it back to evil.com
>
> Step 1 and 2 require the user to answer "yes" to a dialog displayed by
> the browser. However it's well known that users very often hit
> whichever button they suspect will make the dialog go away, rather
> than actually read the contents of the dialog.
> Step 3 again requires the user to answer "yes" to a dialog displayed
> by the browser in at least some browsers. Same caveat applies though.
>
> One very simple remedy to this would be to require CORS opt-in for
> cross-site downloads. For same-site downloads no special opt-in would
> be required of course.
>
> It's also possible that it would be ok to do this without any opt-ins
> since there are a good number of actions that the user has to take in
> all these scenarios. Definitely something that I'd be ok with
> discussing with our security team.
>
> Tentatively I would feel safer with the CORS option though. And again,
> for same-site downloads this isn't a problem at all, but I suspect
> that in many cases the file to be downloaded is hosted on a separate
> server.

On Sun, 17 Jul 2011, Adam Barth wrote:
> 2011/7/15 Jonas Sicking <jonas at sicking.cc>:
> >
> > One concern which was brought up was the ability to cause the user to
> > download a file from a third party site. I.e. this would allow
> > evil.com to trick the user into downloading an email from the users
> > webmail, or download a page from their bank which contains all their
> > banking information. It might be easier to then trick the user into
> > re-uploading the saved file to evil.com since from a user's
> > perspective, it looked like the file came from evil.com
>
> It seems like the solution to that problem is to be clear about where
> the download is coming from. Being clear about where downloads come
> from is important in many scenarios, beyond just this setting.
>
> > Another possible attack goes something like:
> > 1. evil.com tricks the user into downloading sensitive data from bank.com
> > 2. evil.com then asks the user to download a html from evil.com and
> > open the newly downloaded file
>
> Most browsers treat downloaded HTML files as "dangerous downloads,"
> which means they get similar UI treatment to executable downloads. For
> example, on Mac OS X, HTML downloads get the same "you're about to open
> a dangerous file" warning from the operating system as executable
> downloads. If the attacker can convince the user to click past
> these dialogs, the attacker can convince the user to run arbitrary code
> anyway, so there's nothing we can do to provide security in this
> setting.
>
> > 3. the html file contains script which reads the contents from the
> > file downloaded from bank.com and sends it back to evil.com
>
> This sounds like a security vulnerability in the browser. A better
> security posture is to not allow downloaded content from one web site
> to read downloaded content from another web site, regardless of how the
> content was downloaded. For example, that's the current behavior of
> Chrome and Internet Explorer. Safari takes a different approach and
> allows downloaded HTML content to access any file and any web site,
> which means the attacker doesn't need to go through the elaborate
> process you've outlined. Merely performing step (3) is sufficient to
> steal all the user's banking details today, which tells me that either
> Safari is already vulnerable to this attack without this new feature or
> that this threat isn't actually much of a risk.
>
> (I know that Firefox has a policy that's somewhere in between the strong
> local-file security policy used by Chrome and the weak policy used by
> Safari, but Firefox's policy is too complex for me to understand. I
> know that protection of downloaded files was one of the considerations
> that fed into the design of Firefox's policy, but I'll leave it to
> others to comment on whether it provided effective protection in this
> scenario.)
>
> > Step 1 and 2 require the user to answer "yes" to a dialog displayed by
> > the browser. However it's well known that users very often hit
> > whichever button they suspect will make the dialog go away, rather
> > than actually read the contents of the dialog.
>
> In that model, the attacker can just run arbitrary code. It's not
> possible to provide security in that model, regardless of this feature.
>
> > Step 3 again requires the user to answer "yes" to a dialog displayed
> > by the browser in at least some browsers. Same caveat applies though.
>
> In this case, the caveats are important.
> As described above, Chrome is
> not vulnerable to this attack and Safari is vulnerable to this attack
> even without this feature.
>
> > One very simple remedy to this would be to require CORS opt-in for
> > cross-site downloads. For same-site downloads no special opt-in would
> > be required of course.
>
> I'm not convinced there's a problem to solve here. Wiring CORS into the
> download system, by contrast, adds a significant amount of complexity to
> the implementation, which is costly.
>
> > It's also possible that it would be ok to do this without any opt-ins
> > since there are a good number of actions that the user has to take in
> > all these scenarios. Definitely something that I'd be ok with
> > discussing with our security team.
>
> I'm happy to talk it over with your security folks if they disagree with
> the contents of this email.

(Jonas questioned some of the premises above, but not Adam's basic points
which IMHO make the presented risk somewhat irrelevant as the same
assumptions would lead to far bigger problems already.)

> > Tentatively I would feel safer with the CORS option though. And again,
> > for same-site downloads this isn't a problem at all, but I suspect
> > that in many cases the file to be downloaded is hosted on a separate
> > server.
>
> It's important to think these scenarios through carefully, but in this
> case I think we're fine without CORS.

In fact, using CORS here seems like it would _add_ a vulnerability: it
would mean that any site that opts in to letting another site offer its
files for download would also be letting that other site simply read those
files! Given the huge amount of concern we have shown over the possibility
that authors will accidentally set CORS headers on too many pages, this
sounds like a very dangerous thing to be encouraging.

Adam Barth also wrote:
>
> There's also complexity for authors, who need to set the appropriate
> headers and to worry about authenticated versus unauthenticated
> requests.
> Worse, in some deployment scenarios, requiring CORS to use
> this feature actually exposes the sites to greater risk because they
> might set ACLs on these resources that are broader than needed. Here's
> an example:
>
> 1) A webmail service wants to offer attachments for download.
> 2) The webmail service is security conscious, so it renders email
> messages in a sandboxed iframe in case its email rendering contains
> an XSS hole (which is quite possible, especially with rich-text, HTML
> email).
> 3) The webmail service wants to include links to download email
> attachments using this feature.
>
> In order for the links to work (assuming the links appear in context in
> the email, as they do in Outlook, for example), the webmail provider
> needs to set CORS headers that grant the sandboxed iframe access to the
> attachments. Because the sandboxed iframe runs in a unique origin (for
> maximum security), the only CORS ACL that works is
> "Access-Control-Allow-Origin: *". However, that ACL allows any other
> web site to read the contents of the user's attachments! Now, maybe you
> could work around that issue using unguessable URLs for the attachments,
> but you'd still be exposing the contents of the attachments if there's
> an XSS in the mail rendering code, a vulnerability that would not be
> present if the downloads feature wasn't coupled to CORS.
>
> In summary, using CORS for this purpose is costly (both to implementors
> and to authors), and I don't think it solves a real security problem.

Indeed.

On Fri, 15 Jul 2011, Ian Fette wrote:
>
> So, in the interest of making progress, what if we tried...
>
> download=filename
>
> for same origin it's always downloaded (includes filesystem api from
> that origin) for cross-origin it's downloaded if we get a positive CORS
> response and/or we get a content-disposition attachment for cross-origin
> if we don't get positive CORS response OR content-disposition:attachment
> we don't download
>
> We can always start conservative and broaden out.

On Fri, 15 Jul 2011, Jonas Sicking wrote:
>
> I know that I would personally feel a lot more comfortable if the site
> opted in to allowing downloads of the resource in question. But it's
> quite possible that I'm overly paranoid.
>
> Though one thing to keep in mind is sites that explicitly state that a
> resource should *not* reach the user's disk. This is today often done
> using "Cache-Control: no-store". Seems scary to allow such content to be
> saved based on a cross-site request.

On Mon, 18 Jul 2011, Jonas Sicking wrote:
>
> Any site that wants to allow downloads without risking sharing data can
> simply add "content-disposition: attachment" headers. So no risk of
> leaking too much data required.

Is there any reason to use CORS here at all? It seems like the simpler
solution would be the following:

 - If download="" is set, then by default trigger a download rather than a
   navigation action. (User can override via context menu.)

 - Pick a filename for the download as follows:

    - if the received resource has a Content-Disposition: attachment header
      that specifies a filename, use that.

    - otherwise, if the received resource has a Content-Disposition header
      that specifies a filename, and the resource is same-origin, use that
      filename.

    - otherwise, if the received resource is same-origin and the
      download="" attribute specifies a filename, use that.

    - otherwise, if the received resource has a Content-Disposition:
      attachment header and the download="" attribute specifies a filename,
      use the filename from the attribute.

    - otherwise, if the received resource is same-origin then derive a
      filename from the resource.

    - otherwise, either abort or alert the user that a file is being
      downloaded from a different origin and prompt for a filename.

 - If a mapping from the MIME type to an extension is known, but the
   filename doesn't have that extension, add it.

This is what I've used for now (modulo some allowances for user
interfaces), but I welcome suggestions for changing this.

One thing this doesn't handle is the case of a resource on the server not
having a known type, and the download="" filename (or indeed the URL
itself) having a potentially dangerous extension (like .exe). I figure this
is no more dangerous than the server having the right type in this case, if
the right type is a dangerous type.

On Mon, 18 Jul 2011, Glenn Maynard wrote:
>
> If I link directly to the file to download, users should trust the file
> as much as they trust *my* site, rather than Google itself, since the
> download is, from their perspective, coming from me and not them.

It's not quite that simple. If your site is an HTTPS site, and you link to
an HTTP site, then the downloaded resource mustn't be trusted as much as
your site, since it could have been tampered with in-flight.

> So, if a hosting service doesn't want to allow executable files, it
> won't show files as executable from their own download pages, which is
> what should matter as far as that site's trust is concerned. People
> using this mechanism to serve executable files from external links may
> be annoying, but it shouldn't cause trust issues.

This implies the hosting service has to prevent pages from including
download="" attributes pointing to itself, of course.

On Tue, 19 Jul 2011, Alexey Proskuryakov wrote:
>
> The fact that hosting implies a certain degree of trust is also built
> into client software.
> For example, if you download an executable file on
> Mac OS X, then the system warns you about it on first launch, and tells
> you where it was downloaded from, not where a link to the download was.

Which URL is given there of course depends on what URL the browser decides
to put there. It could be the page's URL.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 21 July 2011 23:58:44 UTC
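
The filename-selection steps listed in the message ("Pick a filename for
the download as follows") can be sketched roughly as below. This is only
an illustration under stated assumptions, not the spec text: the Resource
class, the pick_filename name, the small KNOWN_EXTENSIONS table, and the
"download" fallback name are all hypothetical, and the Content-Disposition
header is assumed to have been parsed already.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, unquote

# Illustrative MIME-type-to-extension table; a real UA would use its own.
KNOWN_EXTENSIONS = {
    "text/plain": ".txt",
    "image/jpeg": ".jpg",
    "application/pdf": ".pdf",
}


@dataclass
class Resource:
    """Already-parsed facts about the fetched resource (hypothetical shape)."""
    url: str
    same_origin: bool
    mime_type: Optional[str] = None
    disposition: Optional[str] = None      # "attachment", "inline", or None
    header_filename: Optional[str] = None  # filename from Content-Disposition


def pick_filename(res: Resource, download_attr: Optional[str]) -> Optional[str]:
    """Return a filename, or None meaning "abort or prompt the user"."""
    attr_name = download_attr or None  # treat "" as "no filename given"
    if res.disposition == "attachment" and res.header_filename:
        name = res.header_filename
    elif res.header_filename and res.same_origin:
        name = res.header_filename
    elif res.same_origin and attr_name:
        name = attr_name
    elif res.disposition == "attachment" and attr_name:
        name = attr_name
    elif res.same_origin:
        # "Derive a filename from the resource": here, the last path
        # segment, with an assumed placeholder if the path is empty.
        segments = [s for s in urlsplit(res.url).path.split("/") if s]
        name = unquote(segments[-1]) if segments else "download"
    else:
        return None
    # Final step: add the extension for the MIME type if it is known
    # and the chosen name does not already end with it.
    ext = KNOWN_EXTENSIONS.get(res.mime_type or "")
    if ext and not name.lower().endswith(ext):
        name += ext
    return name
```

With this sketch, a cross-origin text/plain resource served with
Content-Disposition: attachment and linked with
download="Important_Security_Update.exe" is saved as
"Important_Security_Update.exe.txt", which is the Firefox behaviour Boris
describes; the same link to a resource with no Content-Disposition header
yields None, i.e. the abort-or-prompt case.
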