- From: Williams, Stuart <skw@hplb.hpl.hp.com>
- Date: Tue, 4 Feb 2003 12:19:40 -0000
- To: "'Martin Duerst'" <duerst@w3.org>
- Cc: Michel Suignard <michelsu@microsoft.com>, www-international@w3.org, www-tag@w3.org
Hi Martin,
In the 2nd comparison, if the fully escaped sequences are for comparison
only, I'm not sure why you protected these 14 characters from being
%-escaped. Is there a reason why excluding them from the expansion is
necessary?
> > - If the group is not a %-group, and if the character is
> > one of the following 14 characters, then use that character
> > directly: % # [ ] ; / ? : @ & = + $ ,
> > (This will escape characters such as:
> > SPACE, < > " { } | \ ^ `
> > It is currently not clear whether these will be allowed
> > as parts of IRIs, but whether they get escaped or not
> > will not affect the result of the comparison operation
> > if they are not allowed and therefore don't appear in
> > input.)
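To make the question concrete, here is a rough Python sketch of how I read
the escaping procedure (your full text is quoted further below). The names
escape_for_comparison, iris_match and PROTECTED_14 are mine, not from the
proposal, and a '%' with fewer than two characters after it is treated as a
single character, which the text does not specify. The protected set is a
parameter so that the effect of dropping it can be tried.

```python
# Sketch only: one possible reading of the proposed comparison procedure.
PROTECTED_14 = frozenset("%#[];/?:@&=+$,")
LOWER_A_TO_F = str.maketrans("ABCDEF", "abcdef")


def escape_for_comparison(iri: str, protected=PROTECTED_14) -> str:
    """Build the 'escaped string' that is used for comparison only."""
    out = []
    i = 0
    while i < len(iri):
        if iri[i] == "%" and len(iri) - i >= 3:
            # %-group: '%' plus the next two characters; lowercase A-F only.
            out.append(iri[i:i + 3].translate(LOWER_A_TO_F))
            i += 3
            continue
        ch = iri[i]
        if ch in protected:
            # One of the protected characters: use it directly.
            out.append(ch)
        else:
            # Anything else: UTF-8 encode, then %xx per byte, lowercase hex.
            out.append("".join(f"%{b:02x}" for b in ch.encode("utf-8")))
        i += 1
    return "".join(out)


def iris_match(a: str, b: str, protected=PROTECTED_14) -> bool:
    """Codepoint-by-codepoint comparison of the two escaped strings."""
    return escape_for_comparison(a, protected) == escape_for_comparison(b, protected)


print(iris_match("http://example.org/%7Euser", "http://example.org/~user"))
print(iris_match("a/b", "a%2fb"))                         # 14 characters protected
print(iris_match("a/b", "a%2fb", protected=frozenset()))  # nothing protected
```

Under this reading, emptying the protected set makes "a/b" and "a%2fb"
compare equal, whereas with the 14 characters protected they stay distinct.
That may be the reason for the exclusion, but it would be good to have it
confirmed.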
Also, is it clear that only the characters 0-9, a-f and A-F are permissible
following a '%'?
> > - Separate the string into groups. A group consists of
> > either a '%' and the following two characters (a %-group),
> > or of a single character that is not part of a %-group.
http://example.org/paris%louvre -> %lo is a group?
http://example.org/names%abraham -> %ab is (intended to be) a group?
[Admittedly not very clever use of %'s]
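For what it's worth, a small Python sketch of the grouping step shows both
readings. split_groups and its strict flag are hypothetical names of mine:
the literal reading takes any two characters after a '%', the strict one
additionally requires both to be hex digits. A '%' too close to the end of
the string is assumed to be a single character.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def split_groups(s: str, strict: bool = False) -> list[str]:
    """Split s into %-groups and single characters."""
    groups = []
    i = 0
    while i < len(s):
        candidate = s[i:i + 3]
        is_group = s[i] == "%" and len(candidate) == 3
        if is_group and strict:
            # Strict reading: both characters after '%' must be hex digits.
            is_group = all(c in HEX_DIGITS for c in candidate[1:])
        if is_group:
            groups.append(candidate)
            i += 3
        else:
            groups.append(s[i])
            i += 1
    return groups


for iri in ("http://example.org/paris%louvre",
            "http://example.org/names%abraham"):
    print(iri)
    print("  literal:", [g for g in split_groups(iri) if g.startswith("%")])
    print("  strict: ", [g for g in split_groups(iri, strict=True)
                         if g.startswith("%")])
```

With the strict reading "%lo" is no longer a group, but "%ab" still is,
since 'a' and 'b' are both hex digits.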
Thanks,
Stuart
--
> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: 03 February 2003 18:15
> To: www-tag@w3.org
> Cc: Michel Suignard; www-international@w3.org
> Subject: Fwd: proposed text on IRIEverywhere-27
>
>
>
> This is the text that I sent to Chris relating to the
> action item from the TAG, quite a while ago.
>
> >Date: Mon, 02 Dec 2002 14:14:35 +0900
> >To: Chris Lilley <chris@w3.org>
> >From: Martin Duerst <duerst@w3.org>
> >Subject: proposed text on IRIEverywhere-27
> >Cc: w3t-archive
> >
> >Hello Chris,
> >
> >Here is some text that we may want to use for our action items.
> >
> >The text to put in depends on the operations the spec in question
> >is doing with IRIs. At the moment, we are covering two operations,
> >'equivalence' comparison (for some definition of equivalence) and
> >resolution.
> >
> >In all cases, the proposed texts can be used more or less
> >using cut-and-paste, but please check carefully to adjust
> >details (wording, terms, level of detail, references)
> >where necessary.
> >
> >
> >For equality comparison, assuming %7e != %7E != ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps:
> >
> >- Represent the two IRIs as a string of characters from the
> > UCS (Universal Character Set, [ISO10646]/[Unicode]).
> > For IRIs taken from an XML document, the 'IRI as a string
> > of characters' refers to the sequence of character information
> > items in the infoset (i.e. after parsing). For IRIs taken
> > from other contexts, define/use something similar.
> >- Compare the two strings character by character (without using
> > any additional equivalences, e.g. case equivalences, i.e.
> > comparing codepoint-to-codepoint). If you find any difference,
> > the two IRIs do not match. If you find no differences,
> > the two IRIs match.
> >
> >
> >
> >For equality comparison, assuming %7e == %7E == ~ :
> >
> >In order to check whether two IRIs match according to
> >this kind of equivalence, proceed according to the
> >following steps (or any procedure that produces the
> >same results):
> >
> >- Represent the two IRIs as a string of characters from the
> > UCS (Universal Character Set, [ISO10646]/[Unicode]).
> > For IRIs taken from an XML document, the 'IRI as a string
> > of characters' refers to the sequence of character information
> > items in the infoset (i.e. after parsing). For IRIs taken
> > from other contexts, define/use something similar.
> >- For each of the two strings obtained above, calculate
> >  an 'escaped string' as described in the following:
> >  - Separate the string into groups. A group consists of
> >    either a '%' and the following two characters (a %-group),
> >    or of a single character that is not part of a %-group.
> >  - For each group, do the following:
> >    - If the group is a %-group, convert all letters between
> >      'A' and 'F' to their lowercase equivalents.
> >    - If the group is not a %-group, and if the character is
> >      one of the following 14 characters, then use that character
> >      directly: % # [ ] ; / ? : @ & = + $ ,
> >      (This will escape characters such as:
> >      SPACE, < > " { } | \ ^ `
> >      It is currently not clear whether these will be allowed
> >      as parts of IRIs, but whether they get escaped or not
> >      will not affect the result of the comparison operation
> >      if they are not allowed and therefore don't appear in
> >      input.)
> >    - If the group is not a %-group, and the character is not
> >      listed in the previous clause, then encode the character
> >      into a sequence of bytes using UTF-8, and then convert
> >      each of these bytes into a sequence of a '%' character
> >      followed by two hexadecimal characters together expressing
> >      the hexadecimal value of the byte. Use the letters 'a' - 'f'
> >      (lower case).
> >      (Note: different ways of escaping/unescaping may be chosen
> >      by an implementation, but if this is done, care has to be
> >      taken that all different forms of escaping are mapped to
> >      the same output.)
> >  - Concatenate the result of converting each group, in the order
> >    of the original groups, to obtain the escaped string.
> >    (The escaped strings are to be used just for the comparison
> >    in the next step below. They may be stored to be reused in
> >    subsequent comparisons, but they must not be used for any
> >    other purpose, and must not be exposed.)
> >- Compare the two escaped strings obtained in the previous
> > step character by character (without using
> > any additional equivalences, e.g. case equivalences, i.e.
> > comparing codepoint-to-codepoint). If you find any difference,
> > the two IRIs do not match. If you find no differences,
> > the two IRIs match.
> >
> >
> >- Text for resolution: to be done.
> >
> >
> >Regards, Martin.
>
Received on Tuesday, 4 February 2003 07:23:45 UTC