Re: data URIs - filename and content-disposition from Michael A. Puls II on 2012-06-15 (uri@w3.org from June 2012)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Fri, 15 Jun 2012 02:28:52 -0400
To: uri@w3.org
Message-ID: <4FDAD624.9080706@gmail.com>

On 3/1/2012 12:48 PM, Alessandro Angeli wrote:
> However, as far as I can tell, it is possible to achieve an even more
> generic and flexible result than what would be accomplished by the
> HEADERS param in a completely standard-compliant way by using the
> message/* MEDIATYPE, so that the payload (DATA part) of the "data:" URI
> would be a complete message/*, including its header fields.
>
> For example, using message/http, one would have (all in one line):
>
> {data:message/http,HTTP 200 OK|Content-Type:text/plain;charset=utf-8
> |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
> WORLD}
>
> I used {} to delimit the URI and I used spaces and | for readability,
> but they are supposed to be escaped as %20 and %0D%0A (that is, I used |
> to represent a new line). I also used unescaped reserved chars because
> of the consideration at the end of this message.
>
> Using message/rfc822:
>
> {data:message/rfc822,Content-Type:text/plain;charset=utf-8
> |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO
> WORLD}

Both of those sound cool. I like the latter better, as there's no need 
to specify HTTP-specific parts such as the response line. The 
message/rfc822 format itself is fine too.

Using "message/rfc822" as the MIME type in the data URI is not fine, 
though. I might want to create a data URI for a real message/rfc822 
file, and I wouldn't want that to be interpreted as something with an 
embedded attachment that the browser needs to extract.

For example, Opera can already render the content of 
<data:message/rfc822,Content-Type%3A%20text%2Fplain%3B%20charset%3D%22utf-8%22%3B%20name%3D%22test.txt%22%0D%0AContent-Disposition%3A%20attachment%3B%20filename%3D%22test.txt%22%0D%0AContent-Transfer-Encoding%3A%208bit%0D%0A%0D%0Atest%0D%0A> 
as an email message in a browser tab.
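
As an illustration, a URI like the one above can be generated by 
percent-encoding a small RFC 822-style message. This is only a sketch: 
the function name build_message_data_uri is made up for this example, 
and escaping every reserved character (safe="") is an assumption chosen 
to match the fully escaped form shown above.

```python
from urllib.parse import quote


def build_message_data_uri(filename: str, body: str) -> str:
    """Percent-encode an RFC 822-style message and wrap it in a data URI.

    Hypothetical helper for illustration; not part of any proposal or spec.
    """
    message = (
        f'Content-Type: text/plain; charset="utf-8"; name="{filename}"\r\n'
        f'Content-Disposition: attachment; filename="{filename}"\r\n'
        "Content-Transfer-Encoding: 8bit\r\n"
        "\r\n"
        f"{body}\r\n"
    )
    # safe="" escapes every reserved character, including "/", ":" and ";".
    return "data:message/rfc822," + quote(message, safe="")


uri = build_message_data_uri("test.txt", "test")
print(uri)
```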

So, the MIME type needs to be something not currently in use: 
message/http or whatever.

> The benefits over the HEADERS param would be:
>
> 1) no need to define a new param
>
> 2) more flexible (you can even specify the HTTP response line)

I personally can't think of a use for that, and I don't need it.

> 3) to implement it, I believe it should be possible to simply unescape
> the whole payload and pipe the result as an octet-stream into the
> browser's HTTP response handler (if using message/http; if using
> message/rfc822, a fake response line could be prefixed to the payload to
> turn it into a message/http)

Indeed. That sounds like it would be the case.
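
A rough sketch of that "unescape and hand off" step follows. It assumes 
Python's standard email parser as a stand-in for the browser's response 
handler, which is not what a browser would actually use, and 
parse_message_data_uri is a hypothetical name.

```python
from email import message_from_bytes
from urllib.parse import unquote_to_bytes


def parse_message_data_uri(uri: str):
    """Return (filename, content_type, body_bytes) from a message/* data URI.

    Hypothetical helper: the email parser stands in for a browser's
    HTTP response handler.
    """
    header, _, payload = uri.partition(",")
    if not header.startswith("data:message/"):
        raise ValueError("not a message/* data URI")
    # Unescape the whole payload into an octet stream, then parse the
    # header fields and body out of it.
    raw = unquote_to_bytes(payload)
    msg = message_from_bytes(raw)
    return msg.get_filename(), msg.get_content_type(), msg.get_payload(decode=True)


sample = (
    "data:message/rfc822,Content-Type%3A%20text%2Fplain%0D%0A"
    "Content-Disposition%3A%20attachment%3B%20filename%3D%22test.txt%22"
    "%0D%0A%0D%0Atest%0D%0A"
)
filename, content_type, body = parse_message_data_uri(sample)
print(filename, content_type, body)
```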

> 4) base64 encoding can be specified for the whole payload or only for
> the message/* body, using the usual Content-Transfer-Encoding header
> field

Yeah, if the data URI content is base64-encoded in the URI, the 
person/thing creating the attachment's content in the message/ format 
might want to use a Content-Transfer-Encoding of 8bit (for example) 
instead of base64 to save space, so that the file data isn't 
base64-encoded twice.
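
To put a number on the double-encoding cost, here is a small sketch 
using arbitrary sample bytes (the payload is made up for the example):

```python
import base64

# 3072 bytes of arbitrary sample data standing in for a binary attachment.
payload = bytes(range(256)) * 12

once = base64.b64encode(payload)   # body left as 8bit, whole URI payload base64
twice = base64.b64encode(once)     # body base64, then the URI payload base64 again

print(len(payload), len(once), len(twice))
```

Each base64 pass grows the data by roughly a third, so encoding twice 
costs noticeably more than encoding once.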

Might even be cool to support Content-Transfer-Encoding of "Binary" for 
the format too for binary attachments. Opera can already handle that. 
Load <http://shadow2531.com/opera/testcases/mht/000.mht> in Opera. See 
the source of it (wget it for example) to see the binary png data for 
the attachment (it breaks Opera's view source often).

> 5) it is possible to use quoted-printable, which may be more compact
> (after all, "=" does not need to be URI-escaped)

Indeed.
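
A small sketch of the quoted-printable case, assuming "=" is left 
unescaped in the URI as suggested above. The sample text is made up, and 
whether the result is actually shorter than base64 depends on how much 
of the content is plain ASCII.

```python
import quopri
from urllib.parse import quote, unquote_to_bytes

# Sample text with a few non-ASCII characters (made up for illustration).
text = "Grüße, hello world".encode("utf-8")

qp = quopri.encodestring(text)   # non-ASCII bytes become =XX sequences
escaped = quote(qp, safe="=")    # keep "=" as-is, escape everything else reserved

print(escaped)

# Round trip: unescape the URI payload, then undo quoted-printable.
restored = quopri.decodestring(unquote_to_bytes(escaped))
```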

> 6) it is even possible to use gzip compression, which may mitigate the
> bloating caused by base64

Indeed. I think browsers already support all kinds of encodings.
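
As a rough illustration of how gzip could offset the base64 overhead, 
here is a sketch with deliberately repetitive sample data; real savings 
depend entirely on how compressible the attachment is.

```python
import base64
import gzip

# Highly repetitive sample body (made up); real content compresses less well.
body = b"HELLO WORLD\n" * 200

plain_b64 = base64.b64encode(body)
gzip_b64 = base64.b64encode(gzip.compress(body))

print(len(body), len(plain_b64), len(gzip_b64))

# Round trip: base64-decode, then decompress.
restored = gzip.decompress(base64.b64decode(gzip_b64))
```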

One thing about the header format, though. Although you can specify 
multiple attachments in the message/rfc822 (or eml-like) format (with 
multipart/mixed and then a section for each attachment), the format for 
use with data URIs should be limited to a single attachment.

-- 
Michael
Received on Friday, 15 June 2012 06:29:21 UTC