Re: Good/Bad - URI encoding in HTML editor

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 25 May 2002 00:44:27 +0200
To: Karl Dubost <karl@w3.org>
Cc: uri@w3.org
Message-ID: <uueteuogcp859t4bc2l4jq8pqk5dv4n6so@4ax.com>
* Karl Dubost wrote:
>The question is that BBedit has a mechanism to automatically 
>translate the URIs in a document when it's inside an href.
>+ For example when you have typed
>	<a href="http://www.example.org/foo?toto=3&tata=4">A request</a>
>BBedit will convert it to
>	<a href="http://www.example.org/foo?toto=3[&amp;]tata=4">A request</a>

BBedit corrects the HTML representation of the URI, but does not
translate the URI itself.
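
A minimal sketch of that distinction, assuming Python and its standard
html module (my choice for illustration, not anything BBedit itself
uses): escaping the ampersand changes only how the URI is written
inside the attribute, and unescaping it, as an HTML parser would,
gives back the identical URI.

```python
import html

# The URI as the author means it.
uri = "http://www.example.org/foo?toto=3&tata=4"

# How it has to be written inside an HTML attribute value.
attr_value = html.escape(uri, quote=True)

# What an HTML parser hands to the URI layer after unescaping.
recovered = html.unescape(attr_value)

print(attr_value)
print(recovered == uri)
```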

>+ But if you have typed
>	<a 
>BBedit is complaining with the message:
>Value of attribute "href" for element "<a>" is invalid; URL path 
>needs encoding ("/foo?http: 

I tend to disagree, see section 2.2 of RFC 2396:

  If the data for a URI component would conflict with the reserved
  purpose, then the conflicting data must be escaped before forming
  the URI.

Let's take some example URIs:

  [1] http://www.example.org/?foo=bar&baz=&
  [2] http://www.example.org/?foo=bar&baz=%26
  [3] http://www.example.org/?foo=bar;baz=&
  [4] http://www.example.org/?foo=bar;baz=%26

The query consists of key/value pairs.

  [1] foo = <bar> | baz = <>  | <> = <>
  [2] foo = <bar> | baz = <&> |
  [3] foo = <bar> | baz = <&> |
  [4] foo = <bar> | baz = <&> |

In [1] the ampersand separates pairs, so there are three pairs. In [2]
there are only two pairs; the ampersand is now recognized as data, not
as a separator. In [3] and [4] the semicolon separates pairs, so it does
not matter whether the ampersand is escaped or not.
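
The four cases can be reproduced with a small sketch, assuming Python.
The helper name split_pairs is hypothetical, and the choice of
separator is passed in explicitly because RFC 2396 does not fix it.
The point is the order of operations: split on the separator first,
unescape afterwards, so that an escaped %26 can never act as a
separator.

```python
from urllib.parse import urlsplit, unquote

def split_pairs(uri, sep):
    """Split the query of `uri` into (key, value) pairs on `sep`.

    Hypothetical helper for illustration: splitting happens on the raw
    query, and only then is each key and value percent-decoded.
    """
    query = urlsplit(uri).query
    pairs = []
    for part in query.split(sep):
        key, _, value = part.partition("=")
        pairs.append((unquote(key), unquote(value)))
    return pairs

examples = [
    ("http://www.example.org/?foo=bar&baz=&", "&"),    # [1]
    ("http://www.example.org/?foo=bar&baz=%26", "&"),  # [2]
    ("http://www.example.org/?foo=bar;baz=&", ";"),    # [3]
    ("http://www.example.org/?foo=bar;baz=%26", ";"),  # [4]
]

for uri, sep in examples:
    print(split_pairs(uri, sep))
```

Case [1] yields three pairs, the last one empty; cases [2], [3] and [4]
all yield foo = bar and baz = &.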

Your example


is a syntactically valid URI, since it matches the production rules of
RFC 2396 (and RFC 2616 defining the http: URI scheme). The RFC 2396
point is,

  http://www.example.org/%66%6F%6F is equivalent to

since [fo] is not in the set of unsafe and reserved characters, but


is not equivalent to


since '?' is in one of the mentioned sets. The matter is IMO ambiguity,


clearly indicates [:/] are data and have no special meaning, while


does not; maybe ':' is a separator here or something. The latter needs
additional interpretation, not outlined in RFC 2396, in order to claim
equivalence to the former. Both URIs are syntactically valid, however.
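
A rough sketch of the equivalence point, assuming Python's
urllib.parse. The URIs with foo%3Fbar and foo?bar are hypothetical
stand-ins chosen for illustration, not the examples from the message.
Unescaping unreserved characters leaves the component structure
alone, while an escaped and a literal '?' parse into different
components.

```python
from urllib.parse import urlsplit, unquote

def components(uri):
    """Return the raw (path, query) of `uri` without unescaping."""
    parts = urlsplit(uri)
    return parts.path, parts.query

# Escaped unreserved characters: the path decodes to plain "/foo".
path, query = components("http://www.example.org/%66%6F%6F")
decoded_path = unquote(path)

# Hypothetical pair: an escaped '?' is path data, a literal '?'
# starts the query component.
escaped_q = components("http://www.example.org/foo%3Fbar")
literal_q = components("http://www.example.org/foo?bar")

print(decoded_path, repr(query))
print(escaped_q)
print(literal_q)
```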
Received on Friday, 24 May 2002 18:45:12 UTC
