- From: Williams, Stuart <skw@hplb.hpl.hp.com>
- Date: Tue, 4 Feb 2003 12:19:40 -0000
- To: "'Martin Duerst'" <duerst@w3.org>
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org, www-tag@w3.org
Hi Martin,

In the 2nd comparison, if the fully escaped sequences are for comparison
only, I'm not sure why you protected these 14 characters from being
%-escaped. Is there a reason why excluding them from the expansion is
necessary?

> >    - If the group is not a %-group, and if the character is
> >      one of the following 14 characters, then use that character
> >      directly:  % # [ ] ; / ? : @ & = + $ ,
> >      (This will escape characters such as:
> >       SPACE, < > " { } | \ ^ `
> >       It is currently not clear whether these will be allowed
> >       as parts of IRIs, but whether they get escaped or not
> >       will not affect the result of the comparison operation
> >       if they are not allowed and therefore don't appear in
> >       input.)

Also, is it clear that only the characters 0-9, a-f and A-F are
permissible following a % ?

> >  - Separate the string into groups. A group consists of
> >    either a '%' and the following two characters (a %-group),
> >    or of a single character that is not part of a %-group.

http://example.org/paris%louvre   -> %lo is a group?
http://example.org/names%abraham  -> %ab is (intended to be) a group?

[Admittedly not very clever use of %'s]

Thanks,

Stuart
--

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: 03 February 2003 18:15
> To: www-tag@w3.org
> Cc: Michel Suignard; www-international@w3.org
> Subject: Fwd: proposed text on IRIEverywhere-27
>
>
>
> This is the text that I sent to Chris relating to the
> action item from the TAG, quite a while ago.
>
> >Date: Mon, 02 Dec 2002 14:14:35 +0900
> >To: Chris Lilley <chris@w3.org>
> >From: Martin Duerst <duerst@w3.org>
> >Subject: proposed text on IRIEverywhere-27
> >Cc: w3t-archive
> >
> >Hello Chris,
> >
> >Here is some text that we may want to use for our action items.
> >
> >The text to put in depends on the operations the spec in question
> >is doing with IRIs. At the moment, we are covering two operations,
> >'equivalence' comparison (for some definition of equivalence) and
> >resolution.
> >
> >In all cases, the proposed texts can be used more or less
> >using cut-and-paste, but please check carefully to adjust
> >details (wording, terms, level of detail, references)
> >where necessary.
> >
> >
> >For equality comparison, assuming %7e != %7E != ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps:
> >
> >- Represent the two IRIs as a string of characters from the
> >  UCS (Universal Character Set, [ISO10646]/[Unicode]).
> >  For IRIs taken from an XML document, the 'IRI as a string
> >  of characters' refers to the sequence of character information
> >  items in the infoset (i.e. after parsing). For IRIs taken
> >  from other contexts, define/use something similar.
> >- Compare the two strings character by character (without using
> >  any additional equivalences, e.g. case equivalences, i.e.
> >  comparing codepoint-to-codepoint). If you find any difference,
> >  the two IRIs do not match. If you find no differences,
> >  the two IRIs match.
> >
> >
> >
> >For equality comparison, assuming %7e == %7E == ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps (or any procedure that produces the
> >same results):
> >
> >- Represent the two IRIs as a string of characters from the
> >  UCS (Universal Character Set, [ISO10646]/[Unicode]).
> >  For IRIs taken from an XML document, the 'IRI as a string
> >  of characters' refers to the sequence of character information
> >  items in the infoset (i.e. after parsing). For IRIs taken
> >  from other contexts, define/use something similar.
> >- For each of the two strings obtained above, calculate
> >  an 'escaped string' as described in the following:
> >  - Separate the string into groups. A group consists of
> >    either a '%' and the following two characters (a %-group),
> >    or of a single character that is not part of a %-group.
> >  - For each group, do the following:
> >    - If the group is a %-group, convert all letters between
> >      'A' and 'F' to their lowercase equivalents.
> >    - If the group is not a %-group, and if the character is
> >      one of the following 14 characters, then use that character
> >      directly:  % # [ ] ; / ? : @ & = + $ ,
> >      (This will escape characters such as:
> >       SPACE, < > " { } | \ ^ `
> >       It is currently not clear whether these will be allowed
> >       as parts of IRIs, but whether they get escaped or not
> >       will not affect the result of the comparison operation
> >       if they are not allowed and therefore don't appear in
> >       input.)
> >    - If the group is not a %-group, and the character is not
> >      listed in the previous clause, then encode the character
> >      into a sequence of bytes using UTF-8, and then convert
> >      each of these bytes into a sequence of a '%' character
> >      followed by two hexadecimal characters together expressing
> >      the hexadecimal value of the byte. Use the letters 'a' - 'f'
> >      (lower case).
> >    (Note: different ways of escaping/unescaping may be chosen
> >     by an implementation, but if this is done, care has to be
> >     taken that all different forms of escaping are mapped to
> >     the same output.)
> >  - Concatenate the result of converting each group, in the order
> >    of the original groups, to obtain the escaped string.
> >    (The escaped strings are to be used just for the comparison
> >     in the next step below. They may be stored to be reused in
> >     subsequent comparisons, but they must not be used for any
> >     other purpose, and must not be exposed.)
> >- Compare the two escaped strings obtained in the previous
> >  step character by character (without using
> >  any additional equivalences, e.g. case equivalences, i.e.
> >  comparing codepoint-to-codepoint). If you find any difference,
> >  the two IRIs do not match. If you find no differences,
> >  the two IRIs match.
> >
> >
> >- Text for resolution: to be done.
> >
> >
> >Regards,    Martin.
>
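
The second comparison procedure quoted above (assuming %7e == %7E == ~)
can be sketched in Python as follows. This is only an illustration of a
literal reading of the proposed text, not a reference implementation. The
names KEPT, escaped_string and iris_match are hypothetical. Two points are
assumptions, because the text does not settle them: a '%' with fewer than
two characters after it is treated as a single character, and the two
characters after a '%' are not checked for being hexadecimal digits (the
open question raised in the reply above).

```python
# The 14 characters that the proposed text says to use directly.
KEPT = set("%#[];/?:@&=+$,")


def escaped_string(iri: str) -> str:
    """Build the comparison-only 'escaped string' for one IRI."""
    out = []
    i = 0
    while i < len(iri):
        # A %-group is a '%' plus the following two characters.
        # Assumption: the two characters are NOT validated as hex digits.
        if iri[i] == "%" and i + 2 < len(iri):
            group = iri[i:i + 3]
            # Lowercase only the letters 'A'..'F', as the text specifies.
            out.append("".join(c.lower() if "A" <= c <= "F" else c
                               for c in group))
            i += 3
            continue
        c = iri[i]
        if c in KEPT:
            # One of the 14 protected characters: use it directly.
            # Assumption: a '%' too close to the end also lands here.
            out.append(c)
        else:
            # Everything else: UTF-8 bytes, each as %xx in lowercase hex.
            out.append("".join("%%%02x" % b for b in c.encode("utf-8")))
        i += 1
    return "".join(out)


def iris_match(a: str, b: str) -> bool:
    """Compare the two escaped strings codepoint by codepoint."""
    return escaped_string(a) == escaped_string(b)
```

Under this literal reading, iris_match("http://example.org/%7Euser",
"http://example.org/~user") is true, and "%lo" in
http://example.org/paris%louvre is indeed taken as a %-group and passed
through unchanged, which is the behaviour the examples in the reply are
asking about.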
Received on Tuesday, 4 February 2003 07:23:45 UTC