- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Mon, 25 Oct 2004 09:22:41 -0700
- To: "'Tim Kindberg'" <timothy@hpl.hp.com>, "McDonald, Ira" <imcdonald@sharplabs.com>
- Cc: "Hammond, Tony" <T.Hammond@nature.com>, uri@w3.org, sandro hawke <sandro@w3.org>
Hi Tim,

Yes - problems of transcription by users as well as problems of
two TAG URIs that _appear_ EXACTLY identical, but in fact have
(for instance) their diacritical marks not in canonical order
(i.e., not in the order defined by the Unicode Standard).

A better starting point than Nameprep (RFC 3491) would be:

  "String Profile for Internet Small Computer Systems Interface
  (iSCSI) Names", RFC 3722, April 2004.

...which is not limited to I18N of host names.

I'd suggest stealing ideas and text from RFC 3722. I think that
TAG URIs are inherently fragile when deployed without some such
normalization (and a list of disallowed characters).

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald@sharplabs.com

-----Original Message-----
From: uri-request@w3.org [mailto:uri-request@w3.org]On Behalf Of Tim Kindberg
Sent: Monday, October 25, 2004 9:58 AM
To: McDonald, Ira
Cc: Hammond, Tony; uri@w3.org; sandro hawke
Subject: Re: TAG scheme - some comments

Hi Ira,

Thanks for your message. If I understand you correctly, you're talking
about normalisation w.r.t. some of the subtle differences (e.g.
nearly-identical visual appearance, ordering of diacritical marks) that
sometimes occur between characters in the Universal Character Set. Maybe
that's exactly what Tony meant too -- in which case I'm sorry for
dispatching his point over-hastily.

I'm so unused to thinking about internationalisation issues that I
hadn't thought about the problem of *people* producing subtle
differences when they transcribe tags. (I don't see a problem otherwise,
since tags, once minted, are not meant to be "deconstructed" by
machines.)

But what I don't want to have to do is define our own "tagprep"
derivation of stringprep -- and get embroiled in another error-prone
excursion. I'd like to borrow something good enough from elsewhere --
like nameprep, which I would have thought would be suitable except that
it says it's specifically for IDNs.

Does anyone out there have any advice?

Cheers,

Tim.

McDonald, Ira wrote:
> Hi,
>
>>> <Tony Hammond wrote...>
>>> 6. Note that normalization issues are ducked. :) Probably wisely too.
>>> Not sure what the ramifications of this might be especially wrt TAG
>>> processors and %-encoding.
>
>> <Tim Kindberg replied...>
>> Yes, we decided that tags that are different as strings (with same
>> character encoding) are different, full stop. It's nice and easy to
>> understand and there's no compelling need for a more sophisticated
>> criterion for equality.
>
> While neither RFC 2717 nor draft RFC 2717bis address it,
> most existing URI scheme RFCs actually do identify rules for
> "comparison of two XXX URIs". Since TAG values can be UTF-8
> (percent-encoded), there are certainly string comparison
> issues to be addressed (like underlying UTF-8 normalization
> to NFC or NFKC forms). Using a Stringprep profile (RFC 3454)
> is a good approach. I suggest looking at:
>
> "Nameprep: A Stringprep Profile for Internationalized Domain Names"
> RFC 3491, March 2003
>
> Cheers,
> - Ira
>
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221  Grand Marais, MI  49839
> phone: +1-906-494-2434
> email: imcdonald@sharplabs.com

--
Tim Kindberg
hewlett-packard laboratories
filton road
stoke gifford
bristol bs34 8qz
uk

purl.org/net/TimKindberg
timothy@hpl.hp.com
voice +44 (0)117 312 9920
fax +44 (0)117 312 8003
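
The comparison problem discussed in this thread (two tags that look
identical but differ as strings) can be illustrated with a short Python
sketch. It is not taken from the tag draft, RFC 3722, or any Stringprep
profile: the function names and the example tags are hypothetical, and it
applies only percent-decoding plus Unicode NFC normalization, which is
one step of what a full profile (mapping, prohibited characters, and so
on) would do. Decoding every percent-escape before comparing is also a
simplification, since it treats an escaped reserved character and its
literal form as the same.

```python
import unicodedata
from urllib.parse import unquote


def tag_equal_raw(a: str, b: str) -> bool:
    """Plain string comparison: different strings are different tags."""
    return a == b


def normalize_tag(tag: str, form: str = "NFC") -> str:
    """Percent-decode as UTF-8, then apply a Unicode normalization form.

    Hypothetical helper for illustration only; a real profile would
    also map and prohibit characters.
    """
    decoded = unquote(tag, encoding="utf-8", errors="strict")
    return unicodedata.normalize(form, decoded)


def tag_equal_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two tags after percent-decoding and normalization."""
    return normalize_tag(a, form) == normalize_tag(b, form)


# Hypothetical tags that render identically:
# precomposed U+00E9 versus "e" followed by combining acute U+0301.
composed = "tag:example.org,2004:caf%C3%A9"
decomposed = "tag:example.org,2004:cafe%CC%81"

# Same base letter, same two combining marks, in different order:
# dot below (U+0323) then acute (U+0301), and the reverse.
marks_canonical = "tag:example.org,2004:a\u0323\u0301"
marks_swapped = "tag:example.org,2004:a\u0301\u0323"

if __name__ == "__main__":
    print(tag_equal_raw(composed, decomposed))
    print(tag_equal_normalized(composed, decomposed))
    print(tag_equal_raw(marks_canonical, marks_swapped))
    print(tag_equal_normalized(marks_canonical, marks_swapped))
```

Under the "different strings are different tags" rule, both pairs are
distinct tags; with NFC applied first, each pair compares equal, which is
the fragility a normalization profile is meant to remove.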
Received on Monday, 25 October 2004 16:30:56 UTC