- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 28 Sep 2006 19:39:06 +0200
- To: www-archive@w3.org
Hi,
The following is a list of issues I considered when writing draft-
hoehrmann-urlencoded-00.txt and how I resolved them; the list is meant
to aid review of the document. Let me know if you have any opinion on
these issues. The current draft is available at [1].
* Why not standardize application/x-www-form-urlencoded instead?
There is no single such format; specifications and implementations
vary in how they handle character encodings, escaped characters,
which characters are used as separators (some technologies allow
choosing virtually any character), whether the media type can
have a 'charset' parameter, and how they handle encoded data sets
that the RFC 1866 algorithm would never produce (e.g., foo&bar,
foo=bar=baz).
Further, the media type application/x-www-form-urlencoded cannot
be registered under the rules of RFC 4288, and updating RFC 4288
to make an exception for this type would likely be difficult and
would set a bad precedent. Given these problems, there is not much
that could reasonably be standardized without contradicting
deployed infrastructure.
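As a rough illustration of this variation, the sketch below feeds the
two edge cases mentioned above to Python's standard
urllib.parse.parse_qsl. That parser is chosen here only as one example
of a deployed implementation; it is not referenced by the draft. Even
within this single library, the resulting data set depends on the flags
passed:

```python
from urllib.parse import parse_qsl

# "foo&bar": fields without "=" are silently dropped by default ...
default_pairs = parse_qsl("foo&bar")

# ... kept with empty-string values when blank values are requested ...
blank_pairs = parse_qsl("foo&bar", keep_blank_values=True)

# ... and rejected outright in strict mode.
try:
    parse_qsl("foo&bar", strict_parsing=True)
    strict_rejects = False
except ValueError:
    strict_rejects = True

# "foo=bar=baz": this parser splits only on the first "=".
double_eq = parse_qsl("foo=bar=baz")

print(default_pairs, blank_pairs, strict_rejects, double_eq)
```

Other parsers may make different choices for each of these cases, which
is the point of the paragraph above.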
* Okay, but the new format is not substantially better, so it won't
be implemented anyway.
The Introduction of the document lists several key benefits of the
format. For example, common characters like ":" and "/" are not
escaped, so you get "url=http://example.org/" instead of the much
less readable "url=http%3A%2F%2Fexample.org%2F" that legacy
implementations would produce. Likewise, if most of the data is,
e.g., Japanese text POSTed to some web service, each Japanese
character would be encoded as 3 bytes, not 9, which reduces, e.g.,
transport packet fragmentation in Ajax applications that constantly
transfer only small amounts of data.
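The two comparisons above can be reproduced with a short sketch. It
uses Python's urllib.parse.quote merely to imitate the two escaping
styles; it is not the draft's encoding algorithm, and the sample
character (U+65E5) and the assumption that the new format sends its
UTF-8 bytes unescaped are illustrative choices:

```python
from urllib.parse import quote

url = "http://example.org/"

# Legacy-style escaping: ":" and "/" are percent-encoded.
legacy = "url=" + quote(url, safe="")

# Leaving ":" and "/" unescaped, as described for the new format.
readable = "url=" + quote(url, safe=":/")

# A character that occupies 3 bytes in UTF-8 (illustrative choice).
ch = "\u65e5"
raw_len = len(ch.encode("utf-8"))      # size when sent unescaped
escaped_len = len(quote(ch, safe=""))  # size when sent as %XX%XX%XX

print(legacy)
print(readable)
print(raw_len, escaped_len)
```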
Also note that, unlike application/x-www-form-urlencoded, the new
format supports the notion of undefined values, which allows for
shorter and more natural representations of certain data sets. The
draft gives the example of a data set used to control columns in
product lists: a more conventional way to encode the information
would be, e.g., "c1=img&c2=avail&c3=name&c4=price", while with
undefined values this could be written as "img;avail;name;price".
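A minimal sketch of the idea follows. The encode and decode helpers are
hypothetical and only inferred from the example strings quoted above: a
name without "=" is taken to mean an undefined value (None in Python),
";" is taken as the separator, and escaping is left out entirely. The
draft's actual algorithms are more involved.

```python
from typing import List, Optional, Tuple

Item = Tuple[str, Optional[str]]  # None stands for an undefined value


def encode(items: List[Item], sep: str = ";") -> str:
    """Hypothetical encoder: no escaping; name only if value is undefined."""
    return sep.join(n if v is None else f"{n}={v}" for n, v in items)


def decode(data: str, sep: str = ";") -> List[Item]:
    """Hypothetical decoder matching encode(); "" is the empty data set."""
    if data == "":
        return []
    items: List[Item] = []
    for field in data.split(sep):
        name, has_eq, value = field.partition("=")
        items.append((name, value if has_eq else None))
    return items


columns = [("img", None), ("avail", None), ("name", None), ("price", None)]
compact = encode(columns)

numbered = [("c1", "img"), ("c2", "avail"), ("c3", "name"), ("c4", "price")]
conventional = encode(numbered, sep="&")

print(compact)
print(conventional)
```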
* Should ' be in the set of escaped characters?
This is relevant to copy and paste operations and, in some
environments, URL extraction. For example, given <a href='...'>...</a>
and http://example.org/search.p6?q=Teal'c, the resource identifier
cannot simply be copied and pasted into the attribute value; the '
character has to be escaped. Likewise, if the URL occurs in some
unstructured text, like an IRC chat, a text/plain mail, or similar,
some tools might consider the URL to end with the '. In these cases
it would make sense to escape the "'". On the other hand, writing
http://example.org/search.p6?q=Teal%27c would make the data set
less readable. The current draft does not escape it; comments on
this issue are very welcome.
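To make the trade-off concrete, the sketch below shows the two ways of
getting the example URL into a single-quoted attribute, using Python's
standard html.escape and urllib.parse.quote. Neither call is part of
the draft; they only stand in for "escape at the HTML level" versus
"escape in the data set itself":

```python
from html import escape
from urllib.parse import quote

url = "http://example.org/search.p6?q=Teal'c"

# Option 1: leave the URL alone and escape it for the HTML attribute.
attr_escaped = "<a href='" + escape(url, quote=True) + "'>...</a>"

# Option 2: percent-encode the apostrophe in the query value itself,
# so the URL can be pasted into the attribute unchanged.
percent_encoded = "http://example.org/search.p6?q=" + quote("Teal'c")

print(attr_escaped)
print(percent_encoded)
```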
* Shouldn't there be different encoding algorithms for query strings
and normal POSTed data?
This is relevant e.g. when SPARQL queries are transmitted using
POSTed application/www-form-urlencoded data sets. For example, the
query might be
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  SELECT ?book ?who
  WHERE { ?book dc:creator ?who }
and in the SPARQL protocol this would map to query=...query...
The decoding algorithm defined in the specification could handle
this case just fine if it were
  Content-Type: application/www-form-urlencoded

  query=PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?book ?who
        WHERE { ?book dc:creator ?who }
but the encoding algorithm would encode it as
  Content-Type: application/www-form-urlencoded

  query=PREFIX+dc:+%3Chttp://purl.org/dc/elements/1.1/%3E+%0A+++
  +++SELECT+?book+?who+%0A++++++WHERE+%7B+?book+dc:creator+?who+%7D
and as such, implementations would be non-compliant if they produced
the former. Should the specification allow construction of the
former when creating stand-alone entities? The current draft does
not.
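The encoded form shown above can be approximated with Python's
urllib.parse.quote_plus. This is only a stand-in for the draft's
encoding algorithm: the set of characters left unescaped (":", "/" and
"?") is read off the example, and the trailing spaces and six-space
indentation in the query string are assumptions taken from the "+" and
"%0A" sequences in the encoded text.

```python
from urllib.parse import quote_plus, unquote_plus

# Query text as implied by the encoded example (whitespace is assumed).
query = (
    "PREFIX dc: <http://purl.org/dc/elements/1.1/> \n"
    "      SELECT ?book ?who \n"
    "      WHERE { ?book dc:creator ?who }"
)

# Keep ":", "/" and "?" literal, turn spaces into "+",
# and percent-encode everything else that is not unreserved.
body = "query=" + quote_plus(query, safe=":/?")

# Decoding the value again recovers the original query text.
roundtrip = unquote_plus(body[len("query="):])

print(body)
```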
* Should it be possible to encode (("", undefined))?
The draft does not allow encoding of a data set with a single item
where the name is the empty string and the value is undefined. It
would encode to the empty string, which already represents a data
set with no value. It would be possible to reserve a character or
string to represent this data set, which would then have to be
escaped when it is used as data.
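The collision can be shown with a small sketch. The encode helper is
hypothetical and deliberately simplified (";" as separator, a bare name
for an undefined value, no escaping); it is not the draft's algorithm,
and the output it gives for an empty name with an empty-string value is
a property of the sketch only.

```python
from typing import List, Optional, Tuple

Item = Tuple[str, Optional[str]]  # None stands for an undefined value


def encode(items: List[Item]) -> str:
    """Hypothetical encoder (no escaping): a bare name means 'undefined'."""
    return ";".join(n if v is None else f"{n}={v}" for n, v in items)


empty_set = encode([])                      # no items at all
single_undefined = encode([("", None)])     # (("", undefined))
single_empty_value = encode([("", "")])     # (("", "")), for comparison

# The first two data sets produce the same output and cannot be
# told apart by a decoder.
collision = empty_set == single_undefined
print(repr(empty_set), repr(single_undefined), repr(single_empty_value))
```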
* I'm using XForms, so I can't use this format anyway.
Instead of Internet media types, XForms uses QNames to specify the
serialization format, and the XForms specification currently lacks
such an identifier for application/www-form-urlencoded; it would
be possible to define XForms extensions to identify this format,
and XForms implementations are free to do so. The draft does not
define such a QName, as it is expected that future versions of the
XForms specification will specify, for example, ietf-urlencoded-post
and ietf-urlencoded-get, in which case such a definition in the
application/www-form-urlencoded specification would be redundant.
* The Compatibility considerations are terrible!
I know! Please propose something better, or suggest how to improve
the current text.
[1] http://ietfreport.isoc.org/idref/draft-hoehrmann-urlencoded/
regards,
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Thursday, 28 September 2006 17:39:28 UTC