RE: uri templates: NFKC or NFC

> The URI Templates draft currently requires the use of NFKC for normalization
> of Unicode strings.  I've never understood why that is, considering that IRI does
> not require it and browsers appear to use NFC (if anything).  Also, it should only
> apply to the expansions -- the literal parts don't need to be normalized.
> Should I change it to NFC?

Most definitely! NFKC destroys some real semantic differences, whereas NFC is generally considered fairly benign. NFKC can even introduce some visual oddities, such as the character U+00BC (VULGAR FRACTION ONE QUARTER) becoming the three-character sequence "1/4", in which the slash is not the ASCII "/" (%2F) but U+2044 FRACTION SLASH.
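To make the U+00BC case concrete, here is a small sketch using Python's standard unicodedata module (the choice of Python is mine and has nothing to do with the draft). It prints the code points that each normalization form produces:

```python
import unicodedata

quarter = "\u00bc"  # VULGAR FRACTION ONE QUARTER

nfc = unicodedata.normalize("NFC", quarter)
nfkc = unicodedata.normalize("NFKC", quarter)

# NFC leaves the character alone; NFKC applies the compatibility mapping.
print([f"U+{ord(c):04X}" for c in nfc])   # ['U+00BC']
print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+0031', 'U+2044', 'U+0034']

# The slash in the NFKC result is FRACTION SLASH, not the ASCII solidus.
print("/" in nfkc)  # False
```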

That said, the trend in IRI (and elsewhere) is away from mandatory normalization in processing. If IRIs do not require NFC (and I don't believe that they will or should), then requiring it in URI Templates means that some IRIs cannot be represented using a template, because the difference that distinguishes them is normalized away in the template.
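A rough illustration of that loss, again in Python. The expand() function below is a hypothetical stand-in for the expansion of a single template variable (optional normalization followed by percent-encoding of the UTF-8 bytes); it is not taken from the draft:

```python
import unicodedata
from urllib.parse import quote


def expand(value, form=None):
    """Hypothetical variable expansion: optionally normalize, then percent-encode."""
    if form is not None:
        value = unicodedata.normalize(form, value)
    return quote(value, safe="")


composed = "caf\u00e9"     # e with acute as one code point
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT

# Without normalization the two values expand to different strings.
print(expand(composed))    # caf%C3%A9
print(expand(decomposed))  # cafe%CC%81

# With mandatory NFC both collapse to the same expansion, so the
# decomposed form can no longer be produced from the template.
print(expand(composed, "NFC") == expand(decomposed, "NFC"))  # True
```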

The main reason to have normalization for templates would appear to me to be the normalization of character sequences in a variable name. It might be better to just treat sequences that don't match code point for code point as not matching (i.e. the user is responsible for normalization), or perhaps to reference UAX#31 on what makes a valid identifier. Note that normalization does not eliminate potential problems such as a sequence that starts with a combining mark.
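A sketch of both points, under some assumptions: lookup() is a hypothetical exact-match variable lookup, and Python's str.isidentifier() is used only as an approximation of a UAX#31 check (Python's identifier rules are based on UAX#31 but are not identical to it):

```python
import unicodedata


def lookup(variables, name):
    """Hypothetical exact-match lookup: compares code points, no normalization."""
    return variables.get(name)


variables = {"caf\u00e9": "value"}

# Only the name with the same code point sequence matches.
print(lookup(variables, "caf\u00e9"))   # value
print(lookup(variables, "cafe\u0301"))  # None

# Normalization does not repair a name that starts with a combining mark.
leading_mark = "\u0301abc"
normalized = unicodedata.normalize("NFC", leading_mark)
print(normalized == leading_mark)                  # True
print(unicodedata.combining(normalized[0]) != 0)   # True

# An identifier check in the style of UAX#31 rejects it instead.
print(leading_mark.isidentifier())  # False
print("caf\u00e9".isidentifier())   # True
```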



Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Thursday, 14 July 2011 23:51:31 UTC