
Re: TAG scheme - some comments

From: Tim Kindberg <timothy@hpl.hp.com>
Date: Mon, 25 Oct 2004 14:57:34 +0100
Message-ID: <417D064E.1040403@hpl.hp.com>
To: "McDonald, Ira" <imcdonald@sharplabs.com>
Cc: "Hammond, Tony" <T.Hammond@nature.com>, uri@w3.org, sandro hawke <sandro@w3.org>

Hi Ira,

Thanks for your message. If I understand you correctly, you're talking 
about normalisation w.r.t. some of the subtle differences (e.g. 
nearly-identical visual appearance, ordering of diacritical marks) that 
sometimes occur between characters in the Universal Character Set. Maybe 
that's exactly what Tony meant too -- in which case I'm sorry for 
dismissing his point too hastily.
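
A minimal sketch of the kind of difference in question, using Python's standard unicodedata module. The sample strings and the helper name same_after_nfc are illustrative assumptions and come from no tag draft. The strings in each pair look the same when rendered but differ as code point sequences; they compare equal only after normalisation to NFC.

```python
import unicodedata

# "é" as one precomposed code point versus "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# The same two combining marks (dot below, dot above) typed in different orders.
marks_a = "q\u0323\u0307"
marks_b = "q\u0307\u0323"

def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Plain string comparison treats each pair as different.
print(composed == decomposed, marks_a == marks_b)
# Comparison after NFC normalisation treats each pair as the same.
print(same_after_nfc(composed, decomposed), same_after_nfc(marks_a, marks_b))
```

Under the "different as strings means different" rule, each pair above would count as two distinct tags.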

I'm so unused to thinking about internationalisation issues that I 
hadn't thought about the problem of *people* producing subtle 
differences when they transcribe tags. (I don't see a problem otherwise, 
since tags, once minted, are not meant to be "deconstructed" by machines.)

But what I don't want to have to do is define our own "tagprep" 
derivation of stringprep -- and get embroiled in another error-prone 
excursion. I'd like to borrow something good enough from elsewhere -- 
like nameprep, which I would have thought would be suitable except that 
it says it's specifically for IDNs.
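
As a rough illustration of what borrowing nameprep could look like, Python's standard library exposes a nameprep function in encodings.idna. The sketch below assumes that function is an acceptable stand-in for RFC 3491 processing; the helper name prepared_equal and the sample strings are hypothetical. Note that nameprep also folds case, which may or may not be wanted for tags.

```python
from encodings.idna import nameprep

def prepared_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after nameprep processing.

    nameprep maps (including case folding), normalises to NFKC, and
    rejects prohibited characters by raising UnicodeError (or a subclass).
    """
    return nameprep(a) == nameprep(b)

# A capitalised, decomposed spelling versus a lower-case, precomposed one.
print(prepared_equal("Cafe\u0301", "caf\u00e9"))
# Genuinely different strings stay different.
print(prepared_equal("cafe", "caff"))
```

This only shows the comparison step; whether a profile written for IDNs is appropriate for tag strings is exactly the open question.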

Does anyone out there have any advice?

Cheers,

Tim.

McDonald, Ira wrote:

> Hi,
> 
> 
>>><Tony Hammond wrote...>
>>>6. Note that normalization issues are ducked. :) Probably wisely too. Not
>>>sure what the ramifications of this might be especially wrt TAG processors
>>>and %-encoding.
> 
> 
>><Tim Kindberg replied...>
>>Yes, we decided that tags that are different as strings (with same 
>>character encoding) are different, full stop. It's nice and easy to 
>>understand and there's no compelling need for a more sophisticated 
>>criterion for equality.
> 
> 
> While neither RFC 2717 nor draft RFC 2717bis addresses it,
> most existing URI scheme RFCs actually do identify rules for
> "comparison of two XXX URIs".  Since TAG values can be UTF-8
> (percent-encoded), there are certainly string comparison
> issues to be addressed (like underlying UTF-8 normalization
> to NFC or NFKC forms).  Using a Stringprep profile (RFC 3454)
> is a good approach.  I suggest looking at:
> 
> "Nameprep: A Stringprep Profile for Internationalized Domain Names"
> RFC 3491, March 2003
> 
> 
> Cheers,
> - Ira
> 
> Ira McDonald (Musician / Software Architect)
> Blue Roof Music / High North Inc
> PO Box 221  Grand Marais, MI  49839
> phone: +1-906-494-2434
> email: imcdonald@sharplabs.com

-- 

Tim Kindberg
hewlett-packard laboratories
filton road
stoke gifford
bristol bs34 8qz
uk

purl.org/net/TimKindberg
timothy@hpl.hp.com
voice +44 (0)117 312 9920
fax +44 (0)117 312 8003
Received on Monday, 25 October 2004 13:57:53 GMT
