- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 04 Feb 2003 18:20:08 -0500
- To: "Williams, Stuart" <skw@hplb.hpl.hp.com>
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org, www-tag@w3.org
Hello Stuart,

At 12:19 03/02/04 +0000, Williams, Stuart wrote:
>Hi Martin,
>
>In the 2nd comparison, if the fully escaped sequences are for comparison
>only, I'm not sure why you protected these 14 characters from being %
>escaped. Is there a reason why excluding them from the expansion is
>necessary?

Yes, there is a very clear reason. These characters are reserved.
RFC 2396, in "2.2. Reserved Characters", lists the following as reserved:

   reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

To this, we have to add [ and ] for ipv6 literals, and # and %,
which are in effect reserved, but treated differently in the syntax.

Escaping them would lead to strange results:
http://www.example.org/path/file is definitely NOT the same as
http://www.example.org/path%2Ffile .

Please note that under certain interpretations of RFC 2396, or in
certain cases, this may lead to 'false negatives'. For example,
my understanding is that http://www.example.org?text=/ and
http://www.example.org?text=%2F are supposed to behave equivalently,
because '/' is not reserved in a query part. But we can't match
all such cases in a general algorithm that has to be scheme-independent.

> > > - If the group is not a %-group, and if the character is
> > >   one of the following 14 characters, then use that character
> > >   directly: % # [ ] ; / ? : @ & = + $ ,
> > >   (This will escape characters such as:
> > >   SPACE, < > " { } | \ ^ `
> > >   It is currently not clear whether these will be allowed
> > >   as parts of IRIs, but whether they get escaped or not
> > >   will not affect the result of the comparison operation
> > >   if they are not allowed and therefore don't appear in
> > >   input.)
>
>Also, is it clear that only the characters 0-9, a-f and A-F are permissible
>following a % ?

Yes. In RFC 2396, in "A. Collected BNF for URI", the only place
where '%' appears is in:

   escaped = "%" hex hex

where hex is defined as:

   hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                 "a" | "b" | "c" | "d" | "e" | "f"

> > > - Separate the string into groups. A group consists of
> > >   either a '%' and the following two characters (a %-group),
> > >   or of a single character that is not part of a %-group.
>
>http://example.org/paris%louvre -> %lo is a group?

In this algorithm, yes. http://example.org/paris%louvre isn't
a legal URI/IRI, so we get 'garbage in'/'garbage out'.

>http://example.org/names%abraham -> %ab is (intended to be) a group?

Yes, of course. '%ab' is a perfectly legal escape sequence.
Human readers may find the word 'abraham' in the URI, but the
URI contains only 'octet <ab>' followed by 'raham'.

Regards, Martin.
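
A minimal Python sketch of the two quoted steps (grouping, then escaping everything except %-groups and the 14 protected characters) may make the examples above easier to check. The function names are hypothetical. The sketch assumes that every other character is escaped as the %HH form of its UTF-8 octets, and it leaves existing %-groups untouched, without normalizing the case of their hex digits, since the steps quoted here do not say how that is handled.

```python
import string

# The 14 characters that are used directly rather than escaped.
PROTECTED = set("%#[];/?:@&=+$,")


def split_groups(s):
    """Split s into %-groups ('%' plus the next two characters) and single characters."""
    groups = []
    i = 0
    while i < len(s):
        if s[i] == "%":
            # Taken blindly, as in the quoted step: '%lo' is a group too.
            groups.append(s[i:i + 3])
            i += 3
        else:
            groups.append(s[i])
            i += 1
    return groups


def is_valid_escape(group):
    """True if group matches RFC 2396's rule: escaped = "%" hex hex."""
    return (len(group) == 3 and group[0] == "%"
            and all(c in string.hexdigits for c in group[1:]))


def fully_escape(s):
    """Form used for comparison only: escape everything that is not protected."""
    out = []
    for g in split_groups(s):
        if g.startswith("%") or g in PROTECTED:
            out.append(g)  # %-group or protected character: use directly
        else:
            # Assumption: escape via UTF-8 octets, upper-case hex digits.
            out.append("".join("%%%02X" % b for b in g.encode("utf-8")))
    return "".join(out)


def same_for_comparison(a, b):
    return fully_escape(a) == fully_escape(b)
```

With this sketch, same_for_comparison("http://www.example.org/path/file", "http://www.example.org/path%2Ffile") is False, as intended. The two query examples, ?text=/ and ?text=%2F, also compare as different, which is the 'false negative' described above. split_groups treats '%lo' and '%ab' as groups alike, and only is_valid_escape distinguishes them.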
Received on Tuesday, 4 February 2003 18:39:50 UTC