Re: data URIs - filename and content-disposition from Michael A. Puls II on 2010-03-04 (uri@w3.org from March 2010)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Thu, 04 Mar 2010 18:19:49 -0500
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: uri@w3.org
Message-ID: <op.u82ge8ny1ejg13@sandra-svwliu01>
On Fri, 26 Feb 2010 08:20:39 -0500, Julian Reschke <julian.reschke@gmx.de>  
wrote:

> On 25.02.2010 17:06, Michael A. Puls II wrote:
>> ...
>> What about this?
>>
>> Say you have this file:
>>
>> "with spaces.txt"
>> ---------
>> √
>> ---------
>>
>> and want that as data URI that's treated as an attachment with a
>> filename hint of "with spaces.txt".
>>
>> Well, you might want headers like this:
>>
>> Content-Type: text/plain; charset=utf-8
>> Content-Disposition: attachment; filename="with spaces.txt"
>> Content-Language: en
>>
>> So, how bout doing it like the following?:
>>
>> data:text/plain;charset=utf-8;headers=Content-Disposition%3A%20attachment%3B%20filename%3D%22with%20spaces.txt%22%0D%0AContent-Language%3A%20en,%E2%88%9A
>>
>>
>> That way, 'text/plain;charset=utf-8' would be the full Content-Type
>> header and the rest of the headers can be specified as \r\n-separated
>> lines like in HTTP. It's tagged with "headers=" so it can be found in
>> the string easily, and the value is percent-encoded so it doesn't
>> interfere with existing UA handling of data URIs.
>>
>> The restriction would be that the headers value can't contain a
>> Content-Type header (since it's already implied. And, perhaps, it should
>> be specified exactly what headers are allowed in the headers value.
>>
>> It even works (as in, doesn't cause a problem) with ;base64 at the end
>> like this (in Opera at least):
>> ...
>
> The advantage of this is that it's flexible.
>
> The disadvantage is that it may be too flexible :-). For instance, you'd  
> either need to restrict the micro syntax for embedded headers, or  
> recipients will need to run this through a full-blown HTTP header parser  
> (which may be hard to do).
>
> So I have a slight preference to keep things simple, and to focus on the  
> specific use case.

Judging by Opera and Safari's handling of data URIs, the part between  
"data:" and "," is considered a single, percent-encoded value that is  
percent-decoded before being parsed.

For example: <data:text%2Fplain%3Bcharset%3Dutf-8%3Bbase64,4oia>
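
As a rough illustration of that decode-then-parse order, here is a Python sketch. It is not the actual Opera or Safari code, and the function name split_data_uri is made up for this example:

```python
import base64
from urllib.parse import unquote

def split_data_uri(uri):
    """Return (percent-decoded metadata, raw payload) for a data: URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything up to the first "," is metadata; the rest is the payload.
    meta, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing ',' separator")
    # Percent-decode the whole metadata part first, and only then parse it.
    return unquote(meta), payload

meta, payload = split_data_uri(
    "data:text%2Fplain%3Bcharset%3Dutf-8%3Bbase64,4oia")
params = [p.strip() for p in meta.split(";")]
text = base64.b64decode(payload).decode("utf-8")
print(params, text)
```

Here the decoded metadata splits into text/plain, charset=utf-8 and base64, and the payload decodes to the single character from the example file.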

What this means is that if you have:

-----------------
Content-Type: text/plain; charset=utf-8; name*=UTF-8''%E2%88%9A.txt
Content-Disposition: attachment; filename*=UTF-8''%E2%88%9A.txt
Content-Transfer-Encoding: base64

4oia
-----------------

then you could represent the headers as a single header named "data" like so:

data: text/plain; charset=utf-8; base64; filename*=UTF-8''%E2%88%9A.txt;  
content-disposition=attachment

Then, to make a URI out of it, you percent-encode the whole header value  
like this:

<data:%20text%2Fplain%3B%20charset%3Dutf-8%3B%20base64%3B%20filename*%3DUTF-8''%25E2%2588%259A.txt%3B%20content-disposition%3Dattachment,4oia>

This works in Opera, and will work in Safari *if* they fix
the code to trim whitespace around "base64". In fact, here's a version
with the space before "base64" already removed, for Safari:

<data:%20text%2Fplain%3B%20charset%3Dutf-8%3Bbase64%3B%20filename*%3DUTF-8''%25E2%2588%259A.txt%3B%20content-disposition%3Dattachment,4oia>
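
For anyone who wants to reproduce the encoding step, here is a rough Python sketch. The set of characters left unescaped (letters, digits, "-", "_", ".", "~", "*" and "'") is an assumption chosen so that the output matches the first (untrimmed) URI above; it is not taken from any spec:

```python
from urllib.parse import quote, unquote

# The "data" header value from above, unfolded onto one line, including
# the leading space that follows "data:".
header_value = (" text/plain; charset=utf-8; base64;"
                " filename*=UTF-8''%E2%88%9A.txt;"
                " content-disposition=attachment")

# quote() always leaves letters, digits and "_.-~" alone. "*" and "'" are
# added as safe characters (an assumption, to match the example URI).
# "/" is not listed as safe, so it becomes %2F.
encoded = quote(header_value, safe="*'")
uri = "data:" + encoded + ",4oia"
print(uri)

# Percent-decoding the part before "," gives the header value back.
assert unquote(encoded) == header_value
```

Note that the "%" signs already present in the filename* value get encoded a second time (%E2 becomes %25E2), so one round of percent-decoding restores the RFC 2231 style value rather than the raw bytes.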

So, if we go by how Opera and Safari do it, it almost looks like they
already use a MIME header parser (after percent-decoding the whole value
between "data:" and ","). I'm not saying they do, but it kind of looks
like it.

With that said, even if requiring a full-blown header parser would
suck for some, it'd work out great for browsers, as they already have code
to parse MIME headers. And it would allow specifying non-ASCII filenames
according to an existing spec.
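
To illustrate the point about existing MIME parsers, here is a rough sketch that uses the header parsing in Python's standard email package as a stand-in for a browser's parser. Treating the decoded value as a Content-Type header is an assumption made for the example:

```python
from email.message import Message
from email.utils import collapse_rfc2231_value
from urllib.parse import unquote

# The part between "data:" and "," from the trimmed example URI.
meta = ("%20text%2Fplain%3B%20charset%3Dutf-8%3Bbase64%3B%20filename*"
        "%3DUTF-8''%25E2%2588%259A.txt%3B%20content-disposition%3Dattachment")

# Percent-decode once, then hand the result to an ordinary MIME parser.
msg = Message()
msg["Content-Type"] = unquote(meta).strip()

media_type = msg.get_content_type()
charset = msg.get_param("charset")
disposition = msg.get_param("content-disposition")
# A bare "base64" token shows up as a parameter with an empty value.
is_base64 = msg.get_param("base64") is not None
# filename* comes back as a (charset, language, value) tuple; collapse it
# into a normal string.
filename = collapse_rfc2231_value(msg.get_param("filename"))
print(media_type, charset, disposition, is_base64, filename)
```

The non-ASCII filename comes out of the stock parser with no data-URI-specific code beyond the initial percent-decoding.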

The only thing to really specify then would be what param name to use for  
filename and content-disposition. Since Content-Type already has 'name',  
maybe it should be preferred over 'filename'. As for content-disposition,  
that could be as-is or just 'disposition' (as a new param for Content-Type  
even).

As for handling duplicate names/values, that could be handled as the MIME
specs say.

Given this new information, what do you think? And, what do others think?

-- 
Michael
Received on Thursday, 4 March 2010 23:20:27 UTC