RE: proposed text on IRIEverywhere-27 from Williams, Stuart on 2003-02-04 (www-international@w3.org from January to March 2003)

From: Williams, Stuart <skw@hplb.hpl.hp.com>
Date: Tue, 4 Feb 2003 12:19:40 -0000
To: "'Martin Duerst'" <duerst@w3.org>
Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org, www-tag@w3.org
Message-ID: <5E13A1874524D411A876006008CD059F04A072CF@0-mail-1.hpl.hp.com>

Hi Martin,

In the 2nd comparison, if the fully escaped sequences are for comparison
only, I'm not sure why you protected these 14 characters from being
%-escaped. Is there a reason why excluding them from the expansion is
necessary?

> >     - If the group is not a %-group, and if the character is
> >       one of the following 14 characters, then use that character
> >       directly:      % # [ ] ; / ? : @ & = + $ ,
> >       (This will escape characters such as:
> >          SPACE, < > " { } | \ ^ `
> >        It is currently not clear whether these will be allowed
> >        as parts of IRIs, but whether they get escaped or not
> >        will not affect the result of the comparison operation
> >        if they are not allowed and therefore don't appear in
> >        input.)

Also, is it clear that only the characters 0-9, a-f and A-F are permissible
following a % ?

> >   - Separate the string into groups. A group consists of
> >     either a '%' and the following two characters (a %-group),
> >     or of a single character that is not part of a %-group.

http://example.org/paris%louvre -> %lo is a group?
http://example.org/names%abraham -> %ab is (intended to be) a group?

[Admittedly not very clever use of %'s]
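
To make the question concrete, here is a minimal Python sketch of the grouping step as literally worded, plus a check that flags %-groups whose two trailing characters are not hexadecimal digits. The names `split_groups`, `is_hex_group` and `suspicious_groups` are hypothetical, and treating a '%' with fewer than two following characters as a single-character group is an assumption, since the proposed text does not cover that case.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def split_groups(s):
    """Split s into groups per the literal wording: a '%' plus the
    following two characters, or a single character otherwise."""
    groups = []
    i = 0
    while i < len(s):
        # Assumption: a '%' without two following characters is a
        # single-character group (the proposed text is silent on this).
        if s[i] == "%" and i + 3 <= len(s):
            groups.append(s[i:i + 3])
            i += 3
        else:
            groups.append(s[i])
            i += 1
    return groups

def is_hex_group(group):
    """True if group is '%' followed by exactly two hex digits."""
    return (len(group) == 3 and group[0] == "%"
            and all(c in HEX_DIGITS for c in group[1:]))

def suspicious_groups(s):
    """Return groups starting with '%' that are not well-formed escapes."""
    return [g for g in split_groups(s)
            if g.startswith("%") and not is_hex_group(g)]

print(suspicious_groups("http://example.org/paris%louvre"))
print(suspicious_groups("http://example.org/names%abraham"))
```

Under this reading, "%lo" is picked up as a group but fails the hex check, while "%ab" passes it even though it was probably not meant as an escape.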

Thanks,

Stuart
--

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: 03 February 2003 18:15
> To: www-tag@w3.org
> Cc: Michel Suignard; www-international@w3.org
> Subject: Fwd: proposed text on IRIEverywhere-27
> 
> 
> 
> This is the text that I sent to Chris relating to the
> action item from the TAG, quite a while ago.
> 
> >Date: Mon, 02 Dec 2002 14:14:35 +0900
> >To: Chris Lilley <chris@w3.org>
> >From: Martin Duerst <duerst@w3.org>
> >Subject: proposed text on IRIEverywhere-27
> >Cc: w3t-archive
> >
> >Hello Chris,
> >
> >Here is some text that we may want to use for our action items.
> >
> >The text to put in depends on the operations the spec in question
> >is doing with IRIs. At the moment, we are covering two operations,
> >'equivalence' comparison (for some definition of equivalence) and
> >resolution.
> >
> >In all cases, the proposed texts can be used more or less
> >using cut-and-paste, but please check carefully to adjust
> >details (wording, terms, level of detail, references)
> >where necessary.
> >
> >
> >For equality comparison, assuming %7e != %7E != ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps:
> >
> >- Represent the two IRIs as a string of characters from the
> >   UCS (Universal Character Set, [ISO10646]/[Unicode]).
> >   For IRIs taken from an XML document, the 'IRI as a string
> >   of characters' refers to the sequence of character information
> >   items in the infoset (i.e. after parsing). For IRIs taken
> >   from other contexts, define/use something similar.
> >- Compare the two strings character by character (without using
> >   any additional equivalences, e.g. case equivalences, i.e.
> >   comparing codepoint-to-codepoint). If you find any difference,
> >   the two IRIs do not match. If you find no differences,
> >   the two IRIs match.
> >
> >
> >
> >For equality comparison, assuming %7e == %7E == ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps (or any procedure that produces the
> >same results):
> >
> >- Represent the two IRIs as a string of characters from the
> >   UCS (Universal Character Set, [ISO10646]/[Unicode]).
> >   For IRIs taken from an XML document, the 'IRI as a string
> >   of characters' refers to the sequence of character information
> >   items in the infoset (i.e. after parsing). For IRIs taken
> >   from other contexts, define/use something similar.
> >- For each of the two strings obtained above, calculate
> >   an 'escaped string' as described in the following:
> >   - Separate the string into groups. A group consists of
> >     either a '%' and the following two characters (a %-group),
> >     or of a single character that is not part of a %-group.
> >   - For each group, do the following:
> >     - If the group is a %-group, convert all letters between
> >       'A' and 'F' to their lowercase equivalents.
> >     - If the group is not a %-group, and if the character is
> >       one of the following 14 characters, then use that character
> >       directly:      % # [ ] ; / ? : @ & = + $ ,
> >       (This will escape characters such as:
> >          SPACE, < > " { } | \ ^ `
> >        It is currently not clear whether these will be allowed
> >        as parts of IRIs, but whether they get escaped or not
> >        will not affect the result of the comparison operation
> >        if they are not allowed and therefore don't appear in
> >        input.)
> >     - If the group is not a %-group, and the character is not
> >       listed in the previous clause, then encode the character
> >       into a sequence of bytes using UTF-8, and then convert
> >       each of these bytes into a sequence of a '%' character
> >       followed by two hexadecimal characters together expressing
> >       the hexadecimal value of the byte. Use the letters 'a' - 'f'
> >       (lower case).
> >     (Note: different ways of escaping/unescaping may be chosen
> >      by an implementation, but if this is done, care has to be
> >      taken that all different forms of escaping are mapped to
> >      the same output.)
> >   - Concatenate the result of converting each group, in the order
> >     of the original groups, to obtain the escaped string.
> >   (The escaped strings are to be used just for the comparison
> >   in the next step below. They may be stored to be reused in
> >   subsequent comparisons, but they must not be used for any
> >   other purpose, and must not be exposed.)
> >- Compare the two escaped strings obtained in the previous
> >   step character by character (without using
> >   any additional equivalences, e.g. case equivalences, i.e.
> >   comparing codepoint-to-codepoint). If you find any difference,
> >   the two IRIs do not match. If you find no differences,
> >   the two IRIs match.
> >
> >
> >- Text for resolution: to be done.
> >
> >
> >Regards,    Martin.
>
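
The second comparison procedure quoted above (assuming %7e == %7E == ~) can be sketched in Python as follows. This is a minimal sketch and not part of the proposed text: the names `split_groups`, `escape_group`, `escaped_string` and `iris_match` are hypothetical, and treating a '%' with fewer than two following characters as a single-character group is an assumption. No check is made that the characters after a '%' are hexadecimal digits, since the quoted text does not require one.

```python
# The 14 characters that the quoted text says to use directly.
DIRECT = set("%#[];/?:@&=+$,")

def split_groups(s):
    """A group is a '%' plus the following two characters,
    or a single character that is not part of a %-group."""
    groups = []
    i = 0
    while i < len(s):
        # Assumption: a '%' near the end of the string, without two
        # following characters, is a single-character group.
        if s[i] == "%" and i + 3 <= len(s):
            groups.append(s[i:i + 3])
            i += 3
        else:
            groups.append(s[i])
            i += 1
    return groups

def escape_group(group):
    if len(group) == 3 and group[0] == "%":
        # %-group: lowercase only the letters 'A' to 'F'.
        return "%" + "".join(
            c.lower() if "A" <= c <= "F" else c for c in group[1:])
    if group in DIRECT:
        # One of the 14 listed characters: use it directly.
        return group
    # Any other character: UTF-8 encode, then %xx with lowercase hex.
    return "".join("%{:02x}".format(b) for b in group.encode("utf-8"))

def escaped_string(iri):
    """Escaped form, meant to be used for comparison only."""
    return "".join(escape_group(g) for g in split_groups(iri))

def iris_match(a, b):
    # Codepoint-by-codepoint comparison of the two escaped strings.
    return escaped_string(a) == escaped_string(b)

print(iris_match("http://example.org/~user", "http://example.org/%7Euser"))
print(iris_match("http://example.org/a/b", "http://example.org/a%2Fb"))
```

In this sketch "~", "%7e" and "%7E" all map to "%7e" and therefore match, whereas "/" is kept as-is and so does not match "%2F", which becomes "%2f".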
Received on Tuesday, 4 February 2003 07:23:45 UTC