- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 28 Sep 2006 19:39:06 +0200
- To: www-archive@w3.org
Hi,

  The following is a list of issues I considered when writing draft-hoehrmann-urlencoded-00.txt and how I resolved them; the list is meant to aid review of the document. Let me know if you have any opinion on these issues. The current draft is available at [1].

  * Why not standardize application/x-www-form-urlencoded instead?

    There is no single such format: specifications and implementations vary in how they handle character encodings and escaped characters, which characters are used as separators (some technologies allow choosing virtually any character), whether the media type can have a 'charset' parameter, and how they handle encoded data sets that the RFC 1866 algorithm would never produce (e.g., foo&bar, foo=bar=baz). Further, the media type application/x-www-form-urlencoded cannot be registered under the rules of RFC 4288, and updating RFC 4288 to make an exception for this type would likely be difficult and would set a bad precedent. Given these problems, there is not much that could reasonably be standardized without contradicting deployed infrastructure.

  * Okay, but the new format is not substantially better, so it won't be implemented anyway.

    The Introduction of the document lists several key benefits of the format. For example, common characters like ":" and "/" are not escaped, so you get "url=http://example.org/" instead of the much less readable "url=http%3A%2F%2Fexample.org%2F" that legacy implementations would produce. Likewise, if most of the data is, e.g., Japanese text POSTed to some web service, then each Japanese character would be encoded as 3 bytes, not 9 bytes, which reduces, e.g., transport packet fragmentation in Ajax applications that constantly transfer only small amounts of data.

    Also note that, unlike application/x-www-form-urlencoded, the new format supports the notion of undefined values, which allows for a shorter and more natural representation of certain data sets; the draft gives the following example:

      For instance, a data set used to control columns in product lists could look as follows. A more conventional way to encode the same information would be, e.g., "c1=img&c2=avail&c3=name&c4=price"; with undefined values this could be written as "img;avail;name;price".

  * Should ' be in the set of escaped characters?

    This is relevant to copy and paste operations and, in some environments, URL extraction. For example, given

      <a href='...'>...</a>

    and

      http://example.org/search.p6?q=Teal'c

    the resource identifier cannot simply be copied and pasted into the attribute value; the ' character has to be escaped. Likewise, if the URL occurs in some unstructured text, like an IRC chat, a text/plain mail, or similar, some tools might consider the URL to end at the '. In these cases it would make sense to escape the "'". On the other hand, writing

      http://example.org/search.p6?q=Teal%27c

    would make the data set less readable. The current draft does not escape it; comments on this issue are very welcome.

  * Shouldn't there be different encoding algorithms for query strings and normal POSTed data?

    This is relevant, e.g., when SPARQL queries are transmitted using POSTed application/www-form-urlencoded data sets. For example, the query might be

      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      SELECT ?book ?who
      WHERE { ?book dc:creator ?who }

    and in the SPARQL protocol this would map to

      query=...query...

    The decoding algorithm defined in the specification could handle this case just fine if it were

      Content-Type: application/www-form-urlencoded

      query=PREFIX dc: <http://purl.org/dc/elements/1.1/>
            SELECT ?book ?who
            WHERE { ?book dc:creator ?who }

    but the encoding algorithm would encode it as

      Content-Type: application/www-form-urlencoded

      query=PREFIX+dc:+%3Chttp://purl.org/dc/elements/1.1/%3E+%0A+++
      +++SELECT+?book+?who+%0A++++++WHERE+%7B+?book+dc:creator+?who+%7D

    and as such, implementations would be non-compliant if they produced the former. Should the specification allow construction of the former when creating stand-alone entities? The current draft does not.

  * Should it be possible to encode (("", undefined))?

    The draft does not allow encoding of a data set with a single item where the name is the empty string and the value is undefined. It would encode to the empty string, which already represents a data set with no value. It would be possible to reserve a character or string to represent this data set, which would then have to be escaped when it is used as data.

  * I'm using XForms, so I can't use this format anyway.

    Instead of internet media types, XForms uses QNames to specify the serialization format, and the XForms specification currently lacks such an identifier for application/www-form-urlencoded. It would be possible to define XForms extensions to identify this format, and XForms implementations are free to do so. The draft does not define such a QName, as it is expected that future versions of the XForms specification will specify, for example, ietf-urlencoded-post and ietf-urlencoded-get, in which case such a definition in the application/www-form-urlencoded specification would be redundant.

  * The Compatibility considerations are terrible!

    I know! Please propose something better, or suggest how to improve the current text.

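  The escaping trade-offs discussed in the list above can be illustrated with a minimal Python sketch. It is not the algorithm from the draft: `legacy_escape` only approximates what legacy implementations produce, and `sketch_escape` together with the `STILL_ESCAPED` set are hypothetical names whose behaviour is inferred from the examples in this message (space becomes "+"; "<", ">", "{", "}" and line breaks are escaped; ":", "/", "?" and "'" are not). The draft itself defines the authoritative set.

```python
from urllib.parse import quote_plus

# Assumption: an illustrative subset of characters that remain escaped,
# inferred from the examples in this message. The draft defines the real set.
STILL_ESCAPED = set('%&;=+#<>{}"\r\n')


def legacy_escape(text: str) -> str:
    """Approximate legacy behaviour: percent-encode all but unreserved chars."""
    return quote_plus(text, safe="")


def sketch_escape(text: str) -> str:
    """Hypothetical readable escaping: space becomes '+', a small set of
    characters is percent-encoded, everything else passes through as-is."""
    parts = []
    for ch in text:
        if ch == " ":
            parts.append("+")
        elif ch in STILL_ESCAPED:
            # Percent-encode each UTF-8 byte of the character.
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)


if __name__ == "__main__":
    # Compare both escapings and their size on the wire (UTF-8 bytes).
    for sample in ("http://example.org/", "Teal'c", "日"):
        old, new = legacy_escape(sample), sketch_escape(sample)
        print(f"{sample!r}: legacy={old!r} ({len(old.encode('utf-8'))} bytes), "
              f"sketch={new!r} ({len(new.encode('utf-8'))} bytes)")
```

  Running the script prints both forms side by side with their byte counts, which shows the readability and size difference for URLs, the "'" case, and non-ASCII text.
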
[1] http://ietfreport.isoc.org/idref/draft-hoehrmann-urlencoded/

regards,
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Thursday, 28 September 2006 17:39:28 UTC