Fwd: proposed text on IRIEverywhere-27 from Martin Duerst on 2003-02-03 (www-tag@w3.org from February 2003)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 03 Feb 2003 13:14:42 -0500
To: www-tag@w3.org
Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org
Message-Id: <4.2.0.58.J.20030129173817.07111198@localhost>
This is the text that I sent to Chris relating to the
action item from the TAG, quite a while ago.

>Date: Mon, 02 Dec 2002 14:14:35 +0900
>To: Chris Lilley <chris@w3.org>
>From: Martin Duerst <duerst@w3.org>
>Subject: proposed text on IRIEverywhere-27
>Cc: w3t-archive
>
>Hello Chris,
>
>Here is some text that we may want to use for our action items.
>
>The text to insert depends on the operations the spec in question
>performs on IRIs. At the moment, we are covering two operations:
>'equivalence' comparison (for some definition of equivalence) and
>resolution.
>
>In all cases, the proposed texts can be used more or less as-is
>via cut-and-paste, but please check carefully and adjust
>details (wording, terms, level of detail, references)
>where necessary.
>
>
>For equality comparison, assuming %7e != %7E != ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps:
>
>- Represent the two IRIs as a string of characters from the
>   UCS (Universal Character Set, [ISO10646]/[Unicode]).
>   For IRIs taken from an XML document, the 'IRI as a string
>   of characters' refers to the sequence of character information
>   items in the infoset (i.e. after parsing). For IRIs taken
>   from other contexts, define/use something similar.
>- Compare the two strings character by character, i.e. code point
>   by code point, without applying any additional equivalences
>   (such as case equivalence). If you find any difference, the two
>   IRIs do not match. If you find no differences, the two IRIs match.
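As a minimal sketch, assuming the IRIs have already been obtained as Python `str` values (sequences of UCS code points, e.g. after XML parsing), the comparison above reduces to a code point by code point check. The function name `iri_equal_simple` is hypothetical and not part of any specification:

```python
# Minimal sketch of the plain code point comparison described above.
# The function name is hypothetical, chosen for illustration only.


def iri_equal_simple(iri_a: str, iri_b: str) -> bool:
    """Compare two IRIs code point by code point, with no normalization.

    Assumes both IRIs are already str values, i.e. sequences of UCS
    code points (for XML input: after parsing, so character references
    such as &#x7E; have already been resolved).
    """
    if len(iri_a) != len(iri_b):
        return False
    for ca, cb in zip(iri_a, iri_b):
        # No case folding, no unescaping: only identical code points match.
        if ord(ca) != ord(cb):
            return False
    return True
```

The explicit loop mirrors the wording of the steps; in Python it is equivalent to `iri_a == iri_b`, which also compares strings code point by code point. Under this kind of equivalence "%7e", "%7E" and "~" are all different.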
>
>
>
>For equality comparison, assuming %7e == %7E == ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps (or any procedure that produces the
>same results):
>
>- Represent the two IRIs as a string of characters from the
>   UCS (Universal Character Set, [ISO10646]/[Unicode]).
>   For IRIs taken from an XML document, the 'IRI as a string
>   of characters' refers to the sequence of character information
>   items in the infoset (i.e. after parsing). For IRIs taken
>   from other contexts, define/use something similar.
>- For each of the two strings obtained above, calculate
>   an 'escaped string' as described in the following:
>   - Separate the string into groups. A group consists of
>     either a '%' and the following two characters (a %-group),
>     or of a single character that is not part of a %-group.
>   - For each group, do the following:
>     - If the group is a %-group, convert all letters between
>       'A' and 'F' to their lowercase equivalents.
>     - If the group is not a %-group, and if the character is
>       one of the following 14 characters, then use that character
>       directly:      % # [ ] ; / ? : @ & = + $ ,
>       (Because they are not in this list, characters such as
>          SPACE, < > " { } | \ ^ `
>        will be escaped by the next clause. It is currently not
>        clear whether these will be allowed as parts of IRIs, but
>        whether they get escaped or not will not affect the result
>        of the comparison if they are not allowed and therefore do
>        not appear in the input.)
>     - If the group is not a %-group, and the character is not
>       listed in the previous clause, then encode the character
>       into a sequence of bytes using UTF-8, and then convert
>       each of these bytes into a sequence of a '%' character
>       followed by two hexadecimal characters together expressing
>       the hexadecimal value of the byte. Use the letters 'a' - 'f'
>       (lower case).
>     (Note: an implementation may choose different ways of
>      escaping/unescaping, but if it does, care must be taken that
>      all equivalent escaped forms are mapped to the same output.)
>   - Concatenate the result of converting each group, in the order
>     of the original groups, to obtain the escaped string.
>   (The escaped strings are to be used just for the comparison
>   in the next step below. They may be stored to be reused in
>   subsequent comparisons, but they must not be used for any
>   other purpose, and must not be exposed.)
>- Compare the two escaped strings obtained in the previous
>   step character by character, i.e. code point by code point,
>   without applying any additional equivalences (such as case
>   equivalence). If you find any difference, the two IRIs do
>   not match. If you find no differences, the two IRIs match.
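The following is a minimal Python sketch of the escaped-string procedure, assuming the input IRIs are Python `str` values containing only valid Unicode scalar values (lone surrogates would make the UTF-8 encoding step fail). The names `PASS_THROUGH`, `escaped_string` and `iri_equal_escaped` are hypothetical, chosen for illustration only:

```python
# Minimal sketch of the escaped-string comparison described above.
# All names here are hypothetical, chosen for illustration only.

# The 14 characters that are copied through unchanged when they are not
# part of a %-group (the list given in the text).
PASS_THROUGH = frozenset("%#[];/?:@&=+$,")


def escaped_string(iri: str) -> str:
    """Return the escaped string, to be used only as a comparison key.

    Assumes `iri` is a str of valid Unicode scalar values (no lone
    surrogates), e.g. the character sequence obtained after XML parsing.
    """
    out = []
    i, n = 0, len(iri)
    while i < n:
        ch = iri[i]
        if ch == "%" and i + 2 < n:
            # %-group: '%' plus the following two characters.
            # Lowercase only the letters 'A' to 'F'.
            group = iri[i:i + 3]
            out.append("".join(c.lower() if "A" <= c <= "F" else c
                               for c in group))
            i += 3
        else:
            if ch in PASS_THROUGH:
                # One of the 14 listed characters: use it directly.
                out.append(ch)
            else:
                # Any other character: encode as UTF-8 and %-escape each
                # byte using lowercase hex digits.
                out.extend(f"%{b:02x}" for b in ch.encode("utf-8"))
            i += 1
    return "".join(out)


def iri_equal_escaped(iri_a: str, iri_b: str) -> bool:
    """Compare the two escaped strings code point by code point."""
    return escaped_string(iri_a) == escaped_string(iri_b)
```

With this sketch, `iri_equal_escaped("http://example.org/~joe", "http://example.org/%7Ejoe")` returns True, because both `~` and `%7E` become `%7e`. As the text requires, the escaped strings serve only as an internal comparison key; a side effect of escaping every character outside the 14 listed is that, for example, `a` and `%61` also compare equal, while `/` and `%2F` stay distinct.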
>
>
>- Text for resolution: to be done.
>
>
>Regards,    Martin.
Received on Monday, 3 February 2003 14:13:45 UTC