Re: [whatwg] Priority between <a download> and content-disposition

On 2013-03-18 13:50, Bjoern Hoehrmann wrote:
> * Jonas Sicking wrote:
>> It's currently unclear what to do if a page contains markup like <a
>> href="page.txt" download="A.txt"> if the resource at audio.wav
>> responds with either
>>
>> 1) Content-Disposition: inline
>> 2) Content-Disposition: inline; filename="B.txt"
>> 3) Content-Disposition: attachment; filename="B.txt"
>>
>> People generally seem to have a harder time with getting header data
>> right, than getting markup right, and so I think that in all cases we
>> should display the "save as" dialog (or display equivalent download
>> UI) and suggest the filename "A.txt".
> You mention `audio.wav` but that is not part of your example. Also note
> that there are all manners of other things web browsers need to take in-
> to account when deciding on download file names, you might not want to
> e.g. suggest using "desktop.ini", "autorun.inf" or "prn" to the user.
>
> That aside, it seems clear to me that when the linking context says to
> download, then that is what a browser should do, much as it would when
> the user manually selects a "download" context menu option. In contrast,
> when the server says filename="example.xpi" then the browser should pick
> that name instead of allowing overrides like
>
>    <a href='example.xpi' download='example.zip' ...>...
>
> which would cause a lot of headache, especially from third parties. And
> allowing such overrides in "same-origin" scenarios seems useless and is
> asking for trouble ("download filenames broken after moving to CDN").

The expected behavior of <a href='example.xpi' download='example.zip'
...> is that it is a download "hint".
A UI of some sort should appear where the user has the option to
download (for example a requester with Run Now and Save As or Print or
Share or Email and similar).
The download="" attribute is just a browser hint; a user (and thus the
browser) can (and should be able to) override this behavior if desired
(in options somewhere, maybe under an Applications tab?).

If the server-provided file type matches that of the href (i.e. they are
both .xpi), or the two are identical, then the download attribute
filename "hint" should be the default.

If the server provides a file type that conflicts with the href then the
browser needs to use some logic to figure out which of the three to
display.
If the server-provided filename is different from the href then the
browser needs to use some logic to figure out which of the three to
display.
If the download attribute contains a full or relative URL then the href
(or server) should be used instead.
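
A minimal Python sketch of the rules above, as one possible reading
rather than a spec. The names default_filename, extension and
looks_like_path are made up for illustration, and preferring the
server's name in the conflict case is an assumption of the sketch, since
the tie-break is left open here.

```python
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlsplit


def extension(name: str) -> str:
    """Lower-cased extension of a filename or URL path, e.g. '.xpi'."""
    return posixpath.splitext(urlsplit(name).path)[1].lower()


def looks_like_path(hint: str) -> bool:
    """True if the hint holds a full or relative URL rather than a bare name."""
    return "/" in hint or "\\" in hint


def default_filename(href: str, download: Optional[str],
                     server_name: Optional[str]) -> Tuple[str, bool]:
    """Return (filename to suggest, whether the sources conflict)."""
    href_name = posixpath.basename(urlsplit(href).path)
    # No usable hint, or the hint is a URL/path: use the server name or href.
    if not download or looks_like_path(download):
        return (server_name or href_name, False)
    # Server is silent or agrees with the href on file type: trust the hint.
    if not server_name or extension(server_name) == extension(href):
        return (download, False)
    # Conflict: assumption of this sketch, prefer the server and flag it.
    return (server_name, True)
```

The boolean lets the UI decide whether to show an extra notice instead
of silently picking one of the three names.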

What is the best logic to use?
Both href and download are put there by either the author of the page or 
some automated system (forum/blog software/CDN/who knows...)
href and download in that respect should be equally trusted (or is it 
distrusted?)
What the server says always trumps href and download, and href (or 
server) always trumps download if href and server match in file-type.
The only exception is situations where the content is generated in some way.
<a href="example.php?type=csv" download="report1.csv">Download Report 1 
as CSV</a>
<a href="example.php?type=xml" download="report1.xml">Download Report 1 
as XML</a>

Now the server might categorize it as "text/html". I've seen this happen
by mistake on servers before (not properly configured), or because the
script did not set the proper content type when creating the headers.
So in this case the download hint is very helpful.

How many web script "extensions" are there out there? .php .asp .cgi .py
.???
What about this then?
<a href="example.com/reports/1/?type=xml" 
download="report1.xml">Download Report 1 as XML</a>
and with the server type "text/html" by mistake, how to handle that 
then? Whom to "trust"?
The server may (or may not) redirect to a url of 
example.com/reports/1/index.php?type=xml or 
example.com/reports.php?id=1&type=xml
it may also simply remain example.com/reports/1/?type=xml
Or what if it is <a href="example.com/reports/1/xml/" 
download="report1.xml">Download Report 1 as XML</a>

A URL is simply a way to point to some content, what to do with it is up 
to the browser and the user.
One would hope the server serves it as the right type but this is not 
always true.
The page author may not even have control or the ability to add 
filetypes to the server configuration. (webhotels for example)
The download attribute indicates the author's desired behavior for
clicking the link.

So let's break it down (from a more or less browser's point of view):

1. The user clicks the link, there is a download attribute so we will 
show a dialog with a Save As (and possibly other alternatives, dependent 
on browser and OS features and user options).
2. If there is no download attribute/no filename hint, then use href and 
try to make a user friendly filename out of that.
3. Listen to what the server says (in the HTTP header): does it say it
is a .xml? If yes then that is good, if not then treat it as if it was
binary for the moment.
4. Make sure the text displayed is along the line of: Download 
"https://example.com/reports/1/xml/" as "report1.xml" ?
5. When the download has started do a quick "magic id" check: is it an
exe, is it something else, is it an xml as we expected?
6. If it seems to be what it is supposed to be, start downloading to 
whatever location (or alternative behavior) that the user chose.
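
Steps 1, 2 and 4 could be sketched in Python roughly like this. It is
only an illustration: name_from_href, suggested_name and prompt_text are
hypothetical helpers, and the "file" fallback plus the replacement of
path separators are assumptions of the sketch.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

DEFAULT_NAME = "file"  # assumption: generic fallback when the URL has no usable segment


def name_from_href(href: str) -> str:
    """Step 2: build a filename from the last non-empty path segment."""
    segments = [s for s in urlsplit(href).path.split("/") if s]
    if not segments:
        return DEFAULT_NAME
    # Decode %xx escapes, then neutralise any separators they introduced.
    return unquote(segments[-1]).replace("/", "_").replace("\\", "_")


def suggested_name(href: str, download: Optional[str] = None) -> str:
    """Steps 1 and 2: use the download hint if present, else the href."""
    return download if download else name_from_href(href)


def prompt_text(url: str, filename: str) -> str:
    """Step 4: the text shown in the download dialog."""
    return f'Download "{url}" as "{filename}" ?'
```

For a URL like https://example.com/reports/1/xml/ without a hint this
only yields "xml", which shows why the download attribute helps for
generated content.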

Please note steps 3 and 4: the header (whether it was a GET, POST or
HEAD) must be parsed before the "download" dialog is shown, but
downloading the rest of the file must wait (for the user's choice).
If the header, the href attribute and the download attribute conflict,
or maybe just for safety reasons, the start of the file may be
downloaded. Back when I maintained Marshal (FileID software/lib on the
Amiga, way back when) I found that with few exceptions the first 4KB is
enough to scan for magic markers. And for executables it's just a few
hundred bytes on pretty much any platform (if even that much).
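
A rough Python sketch of such a "magic id" check on the first 4KB. The
function name sniff_kind and its return labels are made up for
illustration, only a handful of well-known signatures are covered, and
the text/binary split is a crude NUL-byte heuristic, not a real file
identifier.

```python
SCAN_LIMIT = 4096  # the first 4KB, as suggested above


def sniff_kind(head: bytes) -> str:
    """Guess the kind of file from its leading bytes."""
    head = head[:SCAN_LIMIT]
    if not head:
        return "unknown"
    if head.startswith(b"MZ"):          # DOS/Windows executable
        return "exe"
    if head.startswith(b"\x7fELF"):     # ELF executable (Linux etc.)
        return "elf"
    if head.startswith(b"PK\x03\x04"):  # zip local file header
        return "zip"
    # Skip a UTF-8 byte order mark and leading whitespace before the XML test.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    # Crude heuristic: NUL bytes almost never appear in plain text.
    return "binary" if b"\x00" in head else "text"
```

Only the start of the response body is needed, so the rest of the
download can still wait for the user's choice.
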
What should be shown if there is an issue/conflict?

Maybe:
Download "https://example.com/reports/1/xml/" as "report1.xml" ?
WARNING! File identified as actually being an executable! (*.exe)

Or:
Download "https://example.com/reports/1/xml/" as "report1.xml" ?
NOTE! File identified as not being an xml, appears to be text. (*.txt)


In addition maybe a "Help" where the browser provides more info on what 
to do in this situation.
The second example, for instance, is pretty harmless; most likely the
server provided a plain text type when it should have provided xml.
The first example is scarier, as an xml (as far as I know) would never
ever start with something resembling a Windows executable,
so either this is corrupted data, or it is actually an exe pretending to
be an xml.

Here's another example (no download attr, and the server misreported the
file type):
Download "https://example.com/reports/1/download/" as "file.zip" ?
NOTE! File identified as being a zip archive. (*.zip)

Please note that file.zip is a "default" (or generated) filename used in
this case since a proper one is missing,
but also note that in the case of the exe example it remained with
the "safer" extension (instead of offering to save it as report1.exe).
Also, in the case of an exe warning like that (please note that various
scripts like .bat or .js could also be an issue)
it may be smart to have the browser pass such an exe along to a virus
checker (if supported) for an extra thorough scan.
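
The three dialog texts above could be produced by something like the
following Python sketch. The function download_prompt, the RISKY set and
the DEFAULT_STEM fallback are hypothetical, and dropping the extension
when a nameless file turns out to be risky is an assumption the text
above does not cover.

```python
import posixpath
from typing import List, Optional

RISKY = {"exe", "bat", "js"}  # executables and scripts mentioned above
DEFAULT_STEM = "file"         # generated name when no proper one exists


def download_prompt(url: str, suggested: Optional[str],
                    detected_ext: str, detected_desc: str) -> List[str]:
    """Build the dialog lines; the suggested extension is never swapped for a risky one."""
    expected_ext = ""
    if suggested:
        expected_ext = posixpath.splitext(suggested)[1].lstrip(".").lower()
    if suggested:
        filename = suggested  # keep the "safer" extension from the hint
    elif detected_ext in RISKY:
        filename = DEFAULT_STEM  # assumption: do not generate e.g. file.exe
    else:
        filename = f"{DEFAULT_STEM}.{detected_ext}"
    lines = [f'Download "{url}" as "{filename}" ?']
    if detected_ext in RISKY and detected_ext != expected_ext:
        lines.append(f"WARNING! File identified as actually being "
                     f"{detected_desc}! (*.{detected_ext})")
    elif not suggested:
        lines.append(f"NOTE! File identified as being "
                     f"{detected_desc}. (*.{detected_ext})")
    elif detected_ext != expected_ext:
        lines.append(f"NOTE! File identified as not being {expected_ext}, "
                     f"appears to be {detected_desc}. (*.{detected_ext})")
    return lines
```

When the detected type matches the suggested extension, only the plain
"Download ... as ..." line is returned.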


I probably missed other scenarios here, or should have explained this
better, but I'm just brainstorming as I'm writing this, so I apologize
for that.

The key though is showing: Download "url" as "file.ext" ?
And in cases where a quick file header scan reveals a possible issue (or 
simply wrong fileformat extension) either a notice or warning text in 
addition.
But this is only if the user actually chose "Save As" in the download
dialog; they might have chosen "Share on Facebook" or "Print" or "Email
to..." or even "Open", and
a similar but different dialog would obviously be needed in that case.

I can't recall hearing of a "Printer virus" yet, but still,
who knows...*shrug* (that was a semi-serious joke)




-- 
Roger "Rescator" Hågensen.
Freelancer - http://www.EmSai.net/

Received on Tuesday, 19 March 2013 13:31:43 UTC