Re: Good/Bad - URI encoding in HTML editor from Bjoern Hoehrmann on 2002-05-24 (uri@w3.org from May 2002)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 25 May 2002 00:44:27 +0200
To: Karl Dubost <karl@w3.org>
Cc: uri@w3.org
Message-ID: <uueteuogcp859t4bc2l4jq8pqk5dv4n6so@4ax.com>

* Karl Dubost wrote:
>The question is that BBedit has a mechanism to automatically 
>translate the URIs in a document when it's inside an href.
>
>+ For example when you have typed
>	<a href="http://www.example.org/foo?toto=3&tata=4">A request</a>
>
>BBedit will convert it to
>	<a href="http://www.example.org/foo?toto=3[&amp;]tata=4">A request</a>

BBedit corrects the HTML representation of the URI, but does not
translate the URI itself.

>+ But if you have typed
>	<a href="http://www.example.org/foo?http://www.example.net/path/index.html">A request</a>
>
>BBedit is complaining with the message:
>Value of attribute "href" for element "<a>" is invalid; URL path 
>needs encoding ("/foo?http: 
>%2F%2Fwww.example.net%2Fpath%2Findex.html").

I tend to disagree; see section 2.2 of RFC 2396:

[...]
  If the data for a URI component would conflict with the reserved
  purpose, then the conflicting data must be escaped before forming
  the URI.
[...] 
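Here is a rough Python sketch of that rule: the data is escaped first, and only then is the URI formed. The helper names add_query and build_pairs are hypothetical. urllib.parse.quote with safe="" stands in for whatever escaping an editor or script would actually perform.

```python
from urllib.parse import quote


def add_query(base, data):
    # safe="" escapes every reserved character, '/' included, so nothing
    # inside `data` can be read as URI syntax once it is in the query.
    return base + "?" + quote(data, safe="")


def build_pairs(pairs, sep="&"):
    # Escape each key and value *before* joining, so a literal '&' in a
    # value becomes %26 and cannot be confused with the pair separator.
    return sep.join(quote(k, safe="") + "=" + quote(v, safe="") for k, v in pairs)


print(add_query("http://www.example.org/foo", "http://www.example.net/"))
# http://www.example.org/foo?http%3A%2F%2Fwww.example.net%2F
print("http://www.example.org/?" + build_pairs([("foo", "bar"), ("baz", "&")]))
# http://www.example.org/?foo=bar&baz=%26
```

Without safe="", quote() leaves '/' unescaped by default, which is usually what you want for a path but not for data embedded in a query.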

Let's take some example URIs:

  [1] http://www.example.org/?foo=bar&baz=&
  [2] http://www.example.org/?foo=bar&baz=%26
  [3] http://www.example.org/?foo=bar;baz=&
  [4] http://www.example.org/?foo=bar;baz=%26

Each query consists of key/value pairs:

  [1] foo = <bar> | baz = <>  | <> = <>
  [2] foo = <bar> | baz = <&> |
  [3] foo = <bar> | baz = <&> |
  [4] foo = <bar> | baz = <&> |

In [1] the ampersand separates pairs, so there are three pairs. In [2]
there are only two pairs: the escaped ampersand is recognized as data,
not as a separator. In [3] and [4] the semicolon separates pairs, so it
does not matter whether the ampersand is escaped or not.
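A minimal sketch of that parsing in Python. It assumes the separator is known in advance; the function parse_pairs and its sep parameter are hypothetical, and RFC 2396 itself does not define key/value pairs. The key point is that the split happens on the raw query, before any unescaping.

```python
from urllib.parse import unquote, urlsplit


def parse_pairs(uri, sep):
    # Hypothetical helper: split on the *raw* query so an escaped
    # separator such as %26 is never treated as one, then unescape
    # each key and value afterwards.
    query = urlsplit(uri).query
    pairs = []
    for item in query.split(sep):
        key, _, value = item.partition("=")
        pairs.append((unquote(key), unquote(value)))
    return pairs


for uri, sep in [
    ("http://www.example.org/?foo=bar&baz=&", "&"),    # [1]
    ("http://www.example.org/?foo=bar&baz=%26", "&"),  # [2]
    ("http://www.example.org/?foo=bar;baz=&", ";"),    # [3]
    ("http://www.example.org/?foo=bar;baz=%26", ";"),  # [4]
]:
    print(parse_pairs(uri, sep))
```

Unescaping before splitting would turn [2] into the same three pairs as [1], which is exactly the confusion the escape is meant to prevent.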

Your example

  http://www.example.org/foo?http://www.example.net/path/index.html

is a syntactically valid URI, since it matches the production rules of
RFC 2396 (and of RFC 2616, which defines the http: URI scheme). The point
RFC 2396 makes is that

  http://www.example.org/%66%6F%6F is equivalent to
  http://www.example.org/foo

since neither character in [fo] is reserved or unsafe, but

  http://www.example.org/%3Ffoo

is not equivalent to

  http://www.example.org/?foo

since '?' is a reserved character. The issue, IMO, is ambiguity:
using

  http://www.example.org/foo?http%3A%2F%2Fwww.example.net%2F

clearly indicates that [:/] are data with no special meaning, while

  http://www.example.org/foo?http://www.example.net/

does not; ':' might be a separator here, for instance. Claiming that
the latter is equivalent to the former requires additional
interpretation not outlined in RFC 2396. Syntactically, however, both
URIs are valid.
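A simplified Python sketch of this distinction follows; all names are hypothetical. It checks a query against the RFC 2396 production query = *uric (uric = reserved | unreserved | escaped). It then compares URIs after decoding only escapes of unreserved characters. It deliberately ignores other normalizations, such as case-folding the scheme and host.

```python
import re

# RFC 2396, section 2.3: unreserved = alphanum | mark
UNRESERVED = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    "-_.!~*'()"
)
# RFC 2396, section 2.2: reserved characters
RESERVED = frozenset(";/?:@&=+$,")
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def query_is_valid(query):
    # query = *uric: every character must be reserved, unreserved,
    # or part of a well-formed %XX escape.
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "%":
            if not ESCAPE.match(query, i):
                return False
            i += 3
        elif ch in RESERVED or ch in UNRESERVED:
            i += 1
        else:
            return False
    return True


def normalize_escapes(uri):
    # Decode escapes of unreserved characters (harmless); keep escapes of
    # reserved or unsafe characters, upper-casing their hex digits.
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else "%" + match.group(1).upper()
    return ESCAPE.sub(fix, uri)


def equivalent(a, b):
    return normalize_escapes(a) == normalize_escapes(b)


raw = "http://www.example.org/foo?http://www.example.net/"
esc = "http://www.example.org/foo?http%3A%2F%2Fwww.example.net%2F"
print(query_is_valid(raw.partition("?")[2]), query_is_valid(esc.partition("?")[2]))  # True True
print(equivalent(raw, esc))  # False
print(equivalent("http://www.example.org/%66%6F%6F", "http://www.example.org/foo"))  # True
```

Under these rules both forms pass the syntax check, but they are not equivalent. Treating them as the same needs knowledge of how the server interprets the query, which RFC 2396 does not supply.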

Received on Friday, 24 May 2002 18:45:12 UTC