- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 03 Feb 2003 13:14:42 -0500
- To: www-tag@w3.org
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org
This is the text that I sent to Chris relating to the action item
from the TAG, quite a while ago.

>Date: Mon, 02 Dec 2002 14:14:35 +0900
>To: Chris Lilley <chris@w3.org>
>From: Martin Duerst <duerst@w3.org>
>Subject: proposed text on IRIEverywhere-27
>Cc: w3t-archive
>
>Hello Chris,
>
>Here is some text that we may want to use for our action items.
>
>The text to put in depends on the operations the spec in question
>is doing with IRIs. At the moment, we are covering two operations,
>'equivalence' comparison (for some definition of equivalence) and
>resolution.
>
>In all cases, the proposed texts can be used more or less
>using cut-and-paste, but please check carefully to adjust
>details (wording, terms, level of detail, references)
>where necessary.
>
>
>For equality comparison, assuming %7e != %7E != ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps:
>
>- Represent the two IRIs as a string of characters from the
>  UCS (Universal Character Set, [ISO10646]/[Unicode]).
>  For IRIs taken from an XML document, the 'IRI as a string
>  of characters' refers to the sequence of character information
>  items in the infoset (i.e. after parsing). For IRIs taken
>  from other contexts, define/use something similar.
>- Compare the two strings character by character (without using
>  any additional equivalences, e.g. case equivalences, i.e.
>  comparing codepoint-to-codepoint). If you find any difference,
>  the two IRIs do not match. If you find no differences,
>  the two IRIs match.
>
>
>
>For equality comparison, assuming %7e == %7E == ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps (or any procedure that produces the
>same results):
>
>- Represent the two IRIs as a string of characters from the
>  UCS (Universal Character Set, [ISO10646]/[Unicode]).
>  For IRIs taken from an XML document, the 'IRI as a string
>  of characters' refers to the sequence of character information
>  items in the infoset (i.e. after parsing). For IRIs taken
>  from other contexts, define/use something similar.
>- For each of the two strings obtained above, calculate
>  an 'escaped string' as described in the following:
>  - Separate the string into groups. A group consists of
>    either a '%' and the following two characters (a %-group),
>    or of a single character that is not part of a %-group.
>  - For each group, do the following:
>    - If the group is a %-group, convert all letters between
>      'A' and 'F' to their lowercase equivalents.
>    - If the group is not a %-group, and if the character is
>      one of the following 14 characters, then use that character
>      directly: % # [ ] ; / ? : @ & = + $ ,
>      (This will escape characters such as:
>      SPACE, < > " { } | \ ^ `
>      It is currently not clear whether these will be allowed
>      as parts of IRIs, but whether they get escaped or not
>      will not affect the result of the comparison operation
>      if they are not allowed and therefore don't appear in
>      input.)
>    - If the group is not a %-group, and the character is not
>      listed in the previous clause, then encode the character
>      into a sequence of bytes using UTF-8, and then convert
>      each of these bytes into a sequence of a '%' character
>      followed by two hexadecimal characters together expressing
>      the hexadecimal value of the byte. Use the letters 'a' - 'f'
>      (lower case).
>      (Note: different ways of escaping/unescaping may be chosen
>      by an implementation, but if this is done, care has to be
>      taken that all different forms of escaping are mapped to
>      the same output.)
>  - Concatenate the result of converting each group, in the order
>    of the original groups, to obtain the escaped string.
>    (The escaped strings are to be used just for the comparison
>    in the next step below. They may be stored to be reused in
>    subsequent comparisons, but they must not be used for any
>    other purpose, and must not be exposed.)
>- Compare the two escaped strings obtained in the previous
>  step character by character (without using
>  any additional equivalences, e.g. case equivalences, i.e.
>  comparing codepoint-to-codepoint). If you find any difference,
>  the two IRIs do not match. If you find no differences,
>  the two IRIs match.
>
>
>- Text for resolution: to be done.
>
>
>Regards,    Martin.
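
For illustration only, the two comparison procedures quoted above might be sketched in Python roughly as follows. This is not part of the proposed spec text. The function names (escaped_string, iris_match_simple, iris_match_escaped) are made up for this sketch, and the handling of a '%' that has fewer than two characters after it is an assumption, since the text does not cover that case.

```python
# The 14 characters that the text says are used directly.
KEEP = set("%#[];/?:@&=+$,")

# Only the letters 'A'-'F' inside a %-group are lowercased.
_LOWER_HEX = str.maketrans("ABCDEF", "abcdef")


def escaped_string(iri: str) -> str:
    """Build the 'escaped string' used only for comparison."""
    out = []
    i = 0
    while i < len(iri):
        ch = iri[i]
        if ch == "%" and i + 2 < len(iri):
            # %-group: '%' and the following two characters.
            out.append(iri[i:i + 3].translate(_LOWER_HEX))
            i += 3
        elif ch in KEEP:
            # Listed character: use it directly.
            # Assumption: a trailing '%' with fewer than two characters
            # after it also ends up here and is kept as-is.
            out.append(ch)
            i += 1
        else:
            # Any other character: UTF-8 bytes, each written as
            # '%' plus two lowercase hexadecimal digits.
            out.append("".join(f"%{b:02x}" for b in ch.encode("utf-8")))
            i += 1
    return "".join(out)


def iris_match_simple(a: str, b: str) -> bool:
    """First procedure (%7e != %7E != ~): plain codepoint comparison."""
    return a == b


def iris_match_escaped(a: str, b: str) -> bool:
    """Second procedure (%7e == %7E == ~): compare the escaped strings."""
    return escaped_string(a) == escaped_string(b)
```

With this sketch, "http://example.org/%7Euser" and "http://example.org/~user" match under the second procedure but not under the first. Note that the escaped strings are throwaway values for comparison, as the text requires; nothing here returns them to a caller for other use beyond what the sketch needs for demonstration.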
Received on Monday, 3 February 2003 14:13:45 UTC