- From: Alessandro Angeli <uri.w3.org@riseoftheants.com>
- Date: Thu, 01 Mar 2012 12:48:07 -0500
- To: <uri@w3.org>
I will revive this old thread with a proposal, in case this is ever going to be implemented.

The original proposal is to add support for the FILENAME and CONTENT-DISPOSITION params in the MEDIATYPE part of a "data:" URI. It evolved into more generic support for a HEADERS param. The former has the benefit of simplicity, the latter of flexibility. But, at least judging from the discussion of the related proposal in the Firefox bug-tracking system (https://bugzilla.mozilla.org/show_bug.cgi?id=532230), neither is easily implemented because of both parsing and handling limitations. Moreover, both proposals require the definition of a new param (either CONTENT-DISPOSITION or HEADERS) that can be applied to all MEDIATYPEs, and the repurposing of the FILENAME param, which originally applies only to the Content-Disposition header field and not to the MEDIATYPE.

However, as far as I can tell, it is possible to achieve an even more generic and flexible result than what would be accomplished by the HEADERS param, in a completely standard-compliant way, by using the message/* MEDIATYPE, so that the payload (DATA part) of the "data:" URI would be a complete message/*, including its header fields.

For example, using message/http, one would have (all in one line):

  {data:message/http,HTTP 200 OK|Content-Type:text/plain;charset=utf-8
  |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO WORLD}

I used {} to delimit the URI and I used spaces and | for readability, but they are supposed to be escaped as %20 and %0D%0A (that is, I used | to represent a new line). I also used unescaped reserved chars because of the consideration at the end of this message.
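To make the readable notation above concrete, here is a minimal Python sketch of how it could be turned into the actual one-line URI and read back. It is only an illustration under my own assumptions: the helper names (to_data_uri, payload_of, split_message) and the SAFE set are hypothetical and not part of any proposal or existing API, and the status line is kept exactly as written in the example.

```python
from email import message_from_string
from urllib.parse import quote, unquote

# Readable form from the example: "|" stands for a line break (CRLF).
READABLE = (
    "HTTP 200 OK"
    "|Content-Type:text/plain;charset=utf-8"
    "|Content-Disposition:attachment;filename=%22hello world.txt%22"
    "||HELLO WORLD"
)

# Assumption: leave the reserved chars and marks unescaped, and keep "%"
# so that escapes already present (such as %22) are not escaped twice.
SAFE = ";/?:@&=+$,!*'()%"


def to_data_uri(readable, mediatype="message/http"):
    """Replace "|" with CRLF, then escape spaces and control chars."""
    payload = readable.replace("|", "\r\n")
    return "data:" + mediatype + "," + quote(payload, safe=SAFE)


def payload_of(uri):
    """Return the unescaped DATA part (everything after the first comma)."""
    _, _, data = uri.partition(",")
    return unquote(data)


def split_message(payload):
    """Drop the response line and parse the rest as header fields plus body."""
    _status, _, rest = payload.partition("\r\n")
    return message_from_string(rest)


uri = to_data_uri(READABLE)
msg = split_message(payload_of(uri))
print(uri)
print(msg.get_content_type(), msg.get_filename(), msg.get_payload())
```

The standard email parser is used here only as a stand-in for whatever header handling a real consumer would apply to the unescaped payload.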
Using message/rfc822:

  {data:message/rfc822,Content-Type:text/plain;charset=utf-8
  |Content-Disposition:attachment;filename=%22hello world.txt%22||HELLO WORLD}

The benefits over the HEADERS param would be:

1) no need to define a new param

2) more flexible (you can even specify the HTTP response line)

3) to implement it, I believe it should be possible to simply unescape the whole payload and pipe the result as an octet-stream into the browser's HTTP response handler (if using message/http; if using message/rfc822, a fake response line could be prefixed to the payload to turn it into a message/http)

4) base64 encoding can be specified for the whole payload or only for the message/* body, using the usual Content-Transfer-Encoding header field

5) it is possible to use quoted-printable, which may be more compact (after all, "=" does not need to be URI-escaped)

6) it is even possible to use gzip compression, which may mitigate the bloating caused by base64

The implementation suggested in 3) would be the full embodiment of the stated purpose of the "data:" URI, which is an inline representation of an external resource: the header metadata of an HTTP resource is part of the resource, but the current widespread usage of the "data:" URL can only represent a subset of the Content-Type header field. Its performance should also be no worse than fetching the resource externally (assuming that unescaping the payload is not slower than transferring it over a network).

About the unescaped chars, RFC2397:3 claims that URLCHAR is imported from RFC2396. However, RFC2396 does not have a definition for URLCHAR. Instead, it defines the following 3 char classes (the definitions are equivalent to the ones in RFC2396:A, but rewritten in a more human-understandable way):

  pchar         = escaped | alphanum | mark
                | ":" | "@" | "&" | "=" | "+" | "$" | ","
  uric_no_slash = pchar | ";" | "?"
  uric          = pchar | ";" | "?" | "/"

They are used in the following URI parts (again, partially rewritten and keeping only the ABSOLUTEURI form of the URI-REFERENCE):

  URI-reference = scheme ":" (opaque_part | hier_part) ["#" fragment]
  opaque_part   = uric_no_slash *uric
  hier_part     = ( ["//" authority] [abs_path] ) ["?" query]
  abs_path      = "/" segment *( "/" segment )
  segment       = *pchar *( ";" *pchar )
  query         = *uric
  fragment      = *uric

I would think that a "data:" URI uses the OPAQUE_PART syntax, in which case the unescaped chars are allowed. But they would also be allowed if using the HIER_PART one (except maybe in some parts of the AUTHORITY, which is not used in "data:" anyway).

--
Alessandro
Received on Thursday, 1 March 2012 17:48:20 UTC