- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 04 Feb 2003 18:20:08 -0500
- To: "Williams, Stuart" <skw@hplb.hpl.hp.com>
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org, www-tag@w3.org
Hello Stuart,
At 12:19 03/02/04 +0000, Williams, Stuart wrote:
>Hi Martin,
>
>In the 2nd comparison, if the fully escaped sequences are for comparison
>only, I'm not sure why you protected these 14 characters from being %
>escaped. Is there a reason why excluding them from the expansion is
>necessary?
Yes, there is a very clear reason. These characters are reserved.
RFC 2396, in "2.2. Reserved Characters", lists the following as
reserved:
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
To this, we have to add [ and ] for IPv6 literals, and # and
% which are in effect reserved, but treated differently in the
syntax. Escaping them would lead to strange results:
http://www.example.org/path/file
is definitely NOT the same as
http://www.example.org/path%2Ffile .
Please note that under certain interpretations of RFC 2396, or in
certain cases, this may lead to 'false negatives'. For example,
my understanding is that
http://www.example.org?text=/
and
http://www.example.org?text=%2F
are supposed to behave equivalently, because '/' is not reserved
in a query part. But we can't match all such cases in a general
algorithm that has to be scheme-independent.
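To make the effect concrete, here is a minimal Python sketch of the
escaping step as I read it from this thread. The names RESERVED_14 and
escape_for_comparison are made up for illustration, and two details are
assumptions on my part: %-groups are copied through unchanged, and all
other characters are escaped as uppercase hex of their UTF-8 octets.

```python
# The 14 characters that are used directly instead of being %-escaped.
RESERVED_14 = set("%#[];/?:@&=+$,")


def escape_for_comparison(iri):
    """Return a fully escaped form of `iri`, intended for comparison only."""
    out = []
    i = 0
    while i < len(iri):
        ch = iri[i]
        if ch == "%":
            # A %-group: '%' plus the following two characters,
            # copied unchanged (assumption: no case folding of hex digits).
            out.append(iri[i:i + 3])
            i += 3
        elif ch in RESERVED_14:
            # Protected character: keep it as-is.
            out.append(ch)
            i += 1
        else:
            # Everything else: escape each UTF-8 octet as %XX.
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
            i += 1
    return "".join(out)


# The path example stays distinct, as it should:
plain = escape_for_comparison("http://www.example.org/path/file")
escaped = escape_for_comparison("http://www.example.org/path%2Ffile")
print(plain == escaped)

# The query example also stays distinct: the 'false negative' case.
print(escape_for_comparison("http://www.example.org?text=/")
      == escape_for_comparison("http://www.example.org?text=%2F"))
```

Both comparisons come out unequal. The first is the intended result;
the second is the price of keeping the algorithm scheme-independent.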
> > > - If the group is not a %-group, and if the character is
> > > one of the following 14 characters, then use that character
> > > directly: % # [ ] ; / ? : @ & = + $ ,
> > > (This will escape characters such as:
> > > SPACE, < > " { } | \ ^ `
> > > It is currently not clear whether these will be allowed
> > > as parts of IRIs, but whether they get escaped or not
> > > will not affect the result of the comparison operation
> > > if they are not allowed and therefore don't appear in
> > > input.)
>
>Also, is it clear that only the characters 0-9, a-f and A-F are permissible
>following a % ?
Yes. In RFC 2396, in "A. Collected BNF for URI", the only place
where '%' appears is in:
escaped = "%" hex hex
where hex is defined as:
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
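As an illustration of that production, a small Python check could look
like the sketch below. The helper name has_only_valid_escapes is
hypothetical; it only tests that every '%' is followed by two hex
digits, and says nothing else about whether the string is a legal URI.

```python
import re

# Matches a '%' that is NOT followed by two hex digits,
# i.e. a violation of: escaped = "%" hex hex
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_only_valid_escapes(s):
    """True if every '%' in `s` starts a well-formed "%" hex hex sequence."""
    return _BAD_PERCENT.search(s) is None


print(has_only_valid_escapes("http://example.org/names%abraham"))  # '%ab' is valid hex
print(has_only_valid_escapes("http://example.org/paris%louvre"))   # '%lo' is not
```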
> > > - Separate the string into groups. A group consists of
> > > either a '%' and the following two characters (a %-group),
> > > or of a single character that is not part of a %-group.
>
>http://example.org/paris%louvre -> %lo is a group?
In this algorithm, yes. http://example.org/paris%louvre isn't
a legal URI/IRI. So we get 'garbage in'/'garbage out'.
>http://example.org/names%abraham -> %ab is (intended to be) a group?
Yes, of course. '%ab' is a perfectly legal escape sequence.
Human readers may find the word 'abraham' in the URI, but
the URI contains only 'octet <ab>' followed by 'raham'.
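The grouping step on its own can be sketched in a few lines of Python.
The function name split_groups is made up for this example, and the
handling of a '%' with fewer than two characters after it (the group is
simply shorter) is my assumption; the step as quoted does not cover
that case.

```python
def split_groups(s):
    """Split `s` into %-groups ('%' plus the next two characters)
    and single characters that are not part of a %-group."""
    groups = []
    i = 0
    while i < len(s):
        if s[i] == "%":
            # Take '%' and whatever two characters follow, legal hex or not.
            groups.append(s[i:i + 3])
            i += 3
        else:
            groups.append(s[i])
            i += 1
    return groups


print(split_groups("paris%louvre"))
print(split_groups("names%abraham"))
```

In the first case the list contains the group '%lo', in the second the
group '%ab' followed by the single characters of 'raham'. The function
does not judge legality; it only groups.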
Regards, Martin.
Received on Tuesday, 4 February 2003 18:39:49 UTC