- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 03 Feb 2003 13:14:42 -0500
- To: www-tag@w3.org
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org
This is the text that I sent to Chris relating to the
TAG action item, quite a while ago.
>Date: Mon, 02 Dec 2002 14:14:35 +0900
>To: Chris Lilley <chris@w3.org>
>From: Martin Duerst <duerst@w3.org>
>Subject: proposed text on IRIEverywhere-27
>Cc: w3t-archive
>
>Hello Chris,
>
>Here is some text that we may want to use for our action items.
>
>The text to use depends on which operations the spec in question
>performs on IRIs. At the moment, we are covering two operations:
>'equivalence' comparison (for some definition of equivalence) and
>resolution.
>
>In all cases, the proposed texts can be used more or less
>by cut-and-paste, but please check them carefully and adjust
>details (wording, terms, level of detail, references)
>where necessary.
>
>
>For equality comparison, assuming %7e != %7E != ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps:
>
>- Represent each of the two IRIs as a string of characters from the
> UCS (Universal Character Set, [ISO10646]/[Unicode]).
> For IRIs taken from an XML document, the 'IRI as a string
> of characters' refers to the sequence of character information
> items in the infoset (i.e. after parsing). For IRIs taken
> from other contexts, define/use something similar.
>- Compare the two strings character by character, i.e.
> code point by code point, without applying any additional
> equivalences such as case equivalence. If you find any
> difference, the two IRIs do not match. If you find no
> differences, the two IRIs match.
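A minimal Python sketch of this first kind of comparison, assuming each IRI is already available as a Python `str` (a sequence of UCS code points, e.g. taken from the parsed infoset). The function name `iris_match_exact` is hypothetical and not defined by any spec:

```python
def iris_match_exact(iri_a: str, iri_b: str) -> bool:
    """Compare two IRIs code point by code point.

    No case folding, no %-escape normalization and no Unicode
    normalization is applied (hypothetical helper, for illustration).
    """
    if len(iri_a) != len(iri_b):
        return False
    for ca, cb in zip(iri_a, iri_b):
        # Any differing code point means the IRIs do not match.
        if ord(ca) != ord(cb):
            return False
    return True


# Under this kind of equivalence, %7e, %7E and ~ are all different.
print(iris_match_exact("http://example.org/%7euser", "http://example.org/%7Euser"))  # False
print(iris_match_exact("http://example.org/~user", "http://example.org/~user"))  # True
```

Since Python already compares `str` values code point by code point, `iris_match_exact(a, b)` behaves the same as `a == b`; the explicit loop only mirrors the prose.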
>
>
>
>For equality comparison, assuming %7e == %7E == ~ :
>
>In order to check whether two IRIs match according to
>this kind of equivalence, proceed according to the
>following steps (or any procedure that produces the
>same results):
>
>- Represent each of the two IRIs as a string of characters from the
> UCS (Universal Character Set, [ISO10646]/[Unicode]).
> For IRIs taken from an XML document, the 'IRI as a string
> of characters' refers to the sequence of character information
> items in the infoset (i.e. after parsing). For IRIs taken
> from other contexts, define/use something similar.
>- For each of the two strings obtained above, calculate
> an 'escaped string' as follows:
>  - Separate the string into groups. A group consists either
>    of a '%' and the two characters following it (a %-group),
>    or of a single character that is not part of a %-group.
>  - For each group, do the following:
>    - If the group is a %-group, convert all letters between
>      'A' and 'F' to their lowercase equivalents.
>    - If the group is not a %-group, and the character is
>      one of the following 14 characters, use that character
>      directly: % # [ ] ; / ? : @ & = + $ ,
>      (As a consequence, characters such as
>      SPACE, < > " { } | \ ^ ` are escaped by the next clause.
>      It is currently not clear whether these will be allowed
>      as parts of IRIs, but whether they get escaped or not
>      does not affect the result of the comparison: if they
>      are not allowed, they do not appear in the input.)
>    - If the group is not a %-group, and the character is not
>      listed in the previous clause, encode the character
>      into a sequence of bytes using UTF-8, and then convert
>      each of these bytes into a '%' character followed by
>      two hexadecimal digits expressing the value of the byte.
>      Use the lowercase letters 'a' - 'f' for hexadecimal digits.
>      (Note: an implementation may choose a different way of
>      escaping/unescaping, but if it does, care must be taken
>      that all the different escaped forms of a character are
>      mapped to the same output.)
>  - Concatenate the results of converting each group, in the
>    order of the original groups, to obtain the escaped string.
> (The escaped strings are to be used only for the comparison
> in the next step below. They may be stored for reuse in
> subsequent comparisons, but they must not be used for any
> other purpose and must not be exposed.)
>- Compare the two escaped strings obtained in the previous
> step character by character, i.e. code point by code point,
> without applying any additional equivalences such as case
> equivalence. If you find any difference, the two IRIs do
> not match. If you find no differences, the two IRIs match.
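The escaping-based procedure can be sketched in Python as well. This is a minimal, non-authoritative sketch: the names `escaped_string`, `iris_match_escaped` and `PASS_THROUGH` are hypothetical, the input is assumed to be a Python `str` of UCS characters, and a '%' followed by fewer than two characters is assumed to form an ordinary single-character group (the steps above do not cover that case).

```python
# Hypothetical names; a sketch of the procedure, not a reference implementation.

# The 14 characters copied through unchanged when not part of a %-group.
PASS_THROUGH = frozenset("%#[];/?:@&=+$,")


def escaped_string(iri: str) -> str:
    """Compute the 'escaped string' of an IRI, for comparison only."""
    out = []
    i, n = 0, len(iri)
    while i < n:
        if iri[i] == "%" and i + 2 < n:
            # %-group: '%' plus the next two characters.
            # Lowercase only the letters 'A'-'F'; leave everything else as is.
            group = iri[i:i + 3]
            out.append("".join(c.lower() if "A" <= c <= "F" else c for c in group))
            i += 3
        else:
            # Single-character group (assumption: a '%' followed by fewer
            # than two characters also lands here and passes through).
            ch = iri[i]
            if ch in PASS_THROUGH:
                out.append(ch)
            else:
                # UTF-8 encode; write each byte as '%' + two lowercase hex digits.
                # Assumes well-formed input: a lone surrogate raises UnicodeEncodeError.
                out.extend("%{:02x}".format(b) for b in ch.encode("utf-8"))
            i += 1
    return "".join(out)


def iris_match_escaped(iri_a: str, iri_b: str) -> bool:
    """Compare the two escaped strings code point by code point."""
    return escaped_string(iri_a) == escaped_string(iri_b)


print(escaped_string("~\u00e9"))  # %7e%c3%a9
print(iris_match_escaped("http://example.org/%7Euser", "http://example.org/~user"))  # True
print(iris_match_escaped("http://example.org/a%2Fb", "http://example.org/a/b"))  # False
```

In this sketch, characters outside the 14-character list (including letters and digits) are escaped as well, so `A` and `%41` compare equal while `/` and `%2F` stay distinct. The escaped strings serve only as comparison keys and are never returned to the caller of `iris_match_escaped`, in line with the requirement that they not be exposed.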
>
>
>- Text for resolution: to be done.
>
>
>Regards, Martin.
Received on Monday, 3 February 2003 14:13:45 UTC