- From: Tim Kindberg <timothy@hpl.hp.com>
- Date: Mon, 25 Oct 2004 14:57:34 +0100
- To: "McDonald, Ira" <imcdonald@sharplabs.com>
- Cc: "Hammond, Tony" <T.Hammond@nature.com>, uri@w3.org, sandro hawke <sandro@w3.org>
Hi Ira,

Thanks for your message. If I understand you correctly, you're talking about normalisation w.r.t. some of the subtle differences (e.g. nearly-identical visual appearance, ordering of diacritical marks) that sometimes occur between characters in the Universal Character Set. Maybe that's exactly what Tony meant too -- in which case I'm sorry for dismissing his point over-hastily.

I'm so unused to thinking about internationalisation issues that I hadn't thought about the problem of *people* producing subtle differences when they transcribe tags. (I don't see a problem otherwise, since tags, once minted, are not meant to be "deconstructed" by machines.)

But what I don't want to have to do is define our own "tagprep" derivation of stringprep -- and get embroiled in another error-prone excursion. I'd like to borrow something good enough from elsewhere -- like nameprep, which I would have thought would be suitable except that it says it's specifically for IDNs.

Does anyone out there have any advice?

Cheers,

Tim.

McDonald, Ira wrote:

> Hi,
>
>>> <Tony Hammond wrote...>
>>> 6. Note that normalization issues are ducked. :) Probably wisely too. Not
>>> sure what the ramifications of this might be especially wrt TAG processors
>>> and %-encoding.
>
>> <Tim Kindberg replied...>
>> Yes, we decided that tags that are different as strings (with same
>> character encoding) are different, full stop. It's nice and easy to
>> understand and there's no compelling need for a more sophisticated
>> criterion for equality.
>
> While neither RFC 2717 nor draft RFC 2717bis addresses it,
> most existing URI scheme RFCs actually do identify rules for
> "comparison of two XXX URIs". Since TAG values can be UTF-8
> (percent-encoded), there are certainly string comparison
> issues to be addressed (like underlying UTF-8 normalization
> to NFC or NFKC forms). Using a Stringprep profile (RFC 3454)
> is a good approach. I suggest looking at:
>
> "Nameprep: A Stringprep Profile for Internationalized Domain Names"
> RFC 3491, March 2003
>
> Cheers,
> - Ira
>
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221 Grand Marais, MI 49839
> phone: +1-906-494-2434
> email: imcdonald@sharplabs.com

--
Tim Kindberg
hewlett-packard laboratories
filton road
stoke gifford
bristol bs34 8qz
uk

purl.org/net/TimKindberg
timothy@hpl.hp.com
voice +44 (0)117 312 9920
fax +44 (0)117 312 8003
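
For illustration only, the two equality criteria discussed in this thread (plain string comparison versus comparison after Unicode normalization to NFC or NFKC) might be sketched as below. This is a minimal sketch, not nameprep or any Stringprep profile: it uses only Python's standard `unicodedata` module, and the function names and the `tag:example.org,2004:...` values are hypothetical examples, not taken from the tag specification.

```python
import unicodedata


def tags_equal_strict(a: str, b: str) -> bool:
    """Tags are equal only if they are identical code-point sequences."""
    return a == b


def tags_equal_normalized(a: str, b: str, form: str = "NFC") -> bool:
    """Compare tags after applying a Unicode normalization form.

    Assumption: normalization alone is applied. A real Stringprep profile
    (RFC 3454) also specifies mapping and prohibited-character steps.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Hypothetical tags that look the same when displayed.
composed = "tag:example.org,2004:caf\u00e9"      # precomposed e-acute (U+00E9)
decomposed = "tag:example.org,2004:cafe\u0301"   # 'e' + combining acute (U+0301)

# Hypothetical tags that differ only by a compatibility character.
ligature = "tag:example.org,2004:\ufb01le"       # 'fi' ligature (U+FB01)
plain = "tag:example.org,2004:file"

if __name__ == "__main__":
    print(tags_equal_strict(composed, decomposed))
    print(tags_equal_normalized(composed, decomposed, "NFC"))
    print(tags_equal_normalized(ligature, plain, "NFC"))
    print(tags_equal_normalized(ligature, plain, "NFKC"))
```

Under the strict criterion a person who retypes a tag with a differently composed accent produces a different tag; NFC removes that particular difference, while compatibility variants such as ligatures are only folded together by NFKC.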
Received on Monday, 25 October 2004 13:57:53 UTC