- From: Williams, Stuart <skw@hp.com>
- Date: Fri, 26 Mar 2004 08:54:09 -0000
- To: Chris Lilley <chris@w3.org>
- Cc: Martin Duerst <duerst@w3.org>, www-archive@w3.org
Hello Chris,

[trimmed this down to just you and Martin]

> I am saying that one should either compare IRIs, or
> canonicalize the IRIs to URIs and compare the fully
> canonicalized forms (ie, fully hexified and upper case, not
> lower, for the hex digits A to F).

So... if you do a character-by-character comparison of two IRIs and find them to be different, would you expect, as a design requirement on the canonicalizing IRI-to-URI mapping, the canonicalized URIs to be different too? ie.

   forall x,y in IRI: not( x==y ) => not( iriToUri(x) == iriToUri(y) )

where == is character-by-character comparison.

Martin observed that another property of the current mapping is that

   forall x in IRI: iriToUri(x) == iriToUri(iriToUri(x))

which makes it impossible to achieve the first property: it's easy to find a counterexample, since any x for which x and iriToUri(x) differ character-by-character gives two distinct identifiers that map to the same URI.

I don't know if this second property (that URIs map onto themselves) is a design requirement.

Suppose one regards IRIs and URIs as distinct sets, ie. the identifiers that satisfy the generic URI syntax are URIs and *not* IRIs, and IRIs are any other identifiers that satisfy the current IRI syntax. If there were a reserved character in URI and IRI syntax that was only ever introduced unescaped into a URI by the IRI->URI mapping, then IRIs would map into an otherwise unused part of URI space. If the mapping were only applied to IRIs (and not to things that were already URIs) then it wouldn't be applied recursively, and... it might also be invertible.

[Just thinking aloud]

Stuart.
--

> -----Original Message-----
> From: Chris Lilley [mailto:chris@w3.org]
> Sent: 26 March 2004 03:32
> To: Williams, Stuart
> Cc: tag@w3.org; Martin Duerst
> Subject: Re: [Minutes] 22 March 2004 TAG teleconf
> (charmodReview-17, LC-klyne26, LC-kopecky5, LC-kopecky6,
> LC-booth3, LC-schema17)
>
> On Thursday, March 25, 2004, 1:52:46 PM, Stuart wrote:
>
> WS> Hello Chris,
>
> WS> [Apologies for holding a technical discussion here on tag... if its
> WS> going to go on we should move it elsewhere - public-iri@w3.org seem
> WS> most appropriate.]
>
> >> Which is why it says to keep the character (in this case ~) as a
> >> character. Once you start escaping it then there are escaped and
> >> non-escaped forms and upper and lower case forms .... so the IRI
> >> spec does the right thing here.
>
> WS> Hmmm... so on account of the "MUST NOT" above, which I take to be
> WS> "the right thing" from the IRI spec, are you saying that there are
> WS> IRI that cannot be mapped to URI?
>
> Not at all.
>
> I am saying that one should either compare IRIs, or
> canonicalize the IRIs to URIs and compare the fully
> canonicalized forms (ie, fully hexified and upper case, not
> lower, for the hex digits A to F).
>
> --
> Chris Lilley mailto:chris@w3.org
> Chair, W3C SVG Working Group
> Member, W3C Technical Architecture Group
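The two properties discussed in this thread (distinct IRIs mapping to distinct URIs, and the mapping leaving URIs unchanged) can be checked concretely. The following is a minimal Python sketch under stated assumptions: `iri_to_uri`, `canonical_uri` and `same_identifier` are hypothetical names, and `iri_to_uri` is a simplified stand-in for the IRI draft's mapping that only percent-encodes non-ASCII characters as UTF-8 octets with uppercase hex digits. "Fully canonicalized" is taken here to mean uppercasing the hex digits of every %XX escape; the sketch does not settle whether characters such as ~ should be escaped.

```python
import re

def iri_to_uri(iri: str) -> str:
    # Hypothetical simplified mapping (not the normative IRI algorithm):
    # percent-encode each non-ASCII character as its UTF-8 octets,
    # using uppercase hex digits; ASCII characters pass through unchanged.
    parts = []
    for ch in iri:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            parts.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(parts)

# Matches an existing percent-escape in either hex case.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def canonical_uri(iri: str) -> str:
    # Map to a URI, then uppercase the hex digits of every %XX escape
    # so that "%c3%a9" and "%C3%A9" compare equal.
    return _ESCAPE.sub(lambda m: m.group(0).upper(), iri_to_uri(iri))

def same_identifier(a: str, b: str) -> bool:
    # Compare the canonicalized URI forms character by character.
    return canonical_uri(a) == canonical_uri(b)

if __name__ == "__main__":
    x = "http://example.org/caf\u00e9"
    y = iri_to_uri(x)  # "http://example.org/caf%C3%A9"
    # Idempotence: applying the mapping a second time changes nothing.
    assert iri_to_uri(y) == y
    # Hence injectivity fails: x != y, yet both map to the same URI.
    assert x != y and iri_to_uri(x) == iri_to_uri(y)
    # Differences in escape case disappear under canonicalization.
    assert same_identifier(x, "http://example.org/caf%c3%a9")
    print("properties checked")
```

Because this simplified mapping leaves ASCII text untouched, it is idempotent, and the second assertion shows why idempotence rules out the injectivity property for any input that the mapping actually changes.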
Received on Friday, 26 March 2004 04:33:03 UTC