
RE: HTML5 and Unicode Normalization Form C

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 31 May 2011 09:34:23 -0700
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Koji Ishii <kojiishi@gluesoft.co.jp>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476A9336D350@EX-SEA31-D.ant.amazon.com>
> 
> No problem. And it is true that my main focus is on linking.

Linking is a special case. The IRI WG is also discussing normalization, and I think that is the best place to deal with that issue. Other comparisons in HTML (attributes and text values) do not have externally provided requirements, and thus HTML (or CSS or...) needs to define them.

> 
> HTML5 supports IRIs, which: [1] "Allows native representation of Unicode in
> resources without % escaping". 

While this is a general way of describing IRIs, it is also misleading: IRIs represent the vast preponderance of Unicode code points without escaping, but percent escaping is still required in a number of cases.
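
To illustrate the cases where escaping is still needed, here is a rough Python sketch. The function name escape_iri_path_segment is hypothetical, and the rule "keep everything at or above U+00A0" is a simplification of the ucschar ranges in RFC 3987, which exclude some code points. It is only meant to show that ASCII characters such as space, "%", "?", "#", and "/" still need percent escaping inside a path segment, while most non-ASCII characters do not.

```python
from urllib.parse import quote

# ASCII characters allowed literally in a path segment besides letters,
# digits and "-._~" (the sub-delims plus ":" and "@" from RFC 3986).
SEGMENT_SAFE = "-._~!$&'()*+,;=:@"


def escape_iri_path_segment(segment: str) -> str:
    """Percent-escape only what a path segment cannot carry literally.

    Assumption: every code point >= U+00A0 is kept as-is. The real
    grammar is narrower, so treat this as an approximation.
    """
    out = []
    for ch in segment:
        if ord(ch) >= 0xA0:
            out.append(ch)  # non-ASCII: kept without escaping
        else:
            # ASCII and C1 controls: escape unless explicitly safe
            out.append(quote(ch, safe=SEGMENT_SAFE))
    return "".join(out)


if __name__ == "__main__":
    print(escape_iri_path_segment("résumé 100%.html"))
```

In this sketch the accented letters pass through untouched, but the space and the literal percent sign are escaped.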

> 
> > But you're right that it could be a hard requirement for editors. If
> > we take it seriously, I guess we have to wait Unicode to fix NFC
> > problems (I heard the effort is going on) or to ask web
> > browsers/servers to normalize on the fly. 

Normalization is subject to Unicode's stability policy. I don't know what you think would qualify as "fixed", but a fix will not take the form of changing either the definition of NFC or the properties of specific characters. See: http://unicode.org/policies/stability_policy.html

> >>
> >> As it has turned out, however, it was an error of the HTML5 validator
> >> to show an error for use of NFC. But *that* only increases the
> >> importance of offer helpful recommendations w.r.t. links.
> >
> > Thank you for the explanation of the background I wasn't aware of.
> 
> I should have pointed it out when I CC-ed this list. Sorry.

If you have concerns about links/web addresses, the best place to discuss them is public-iri@w3.org (the IETF IRI WG's mailing list). The IRI effort needs all the help it can get.

As I mentioned before, my impression is that IRI is headed down the path of *not* requiring any particular normalization form, although NFC is recommended ("SHOULD") and early uniform normalization is explicitly assumed. The current draft addresses comparison of IRIs by defining equivalence at the code point level. See: http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5.3.2
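
To make the practical consequence concrete, here is a small Python sketch. The example IRIs and both function names are made up for illustration and are not taken from the draft. It shows two addresses that look identical when rendered but differ at the code point level, which is why the producer is expected to normalize early.

```python
import unicodedata

# Hypothetical IRIs: the same visible text, different code points.
IRI_NFC = "http://example.org/caf\u00e9"    # "é" as one code point (U+00E9)
IRI_NFD = "http://example.org/cafe\u0301"   # "e" + combining acute (U+0301)


def equal_by_code_point(a: str, b: str) -> bool:
    """Code point comparison: no normalization is applied."""
    return a == b


def equal_after_nfc(a: str, b: str) -> bool:
    """Comparison that normalizes both sides to NFC first.

    Shown only for contrast; it is not what code point level
    equivalence does.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(equal_by_code_point(IRI_NFC, IRI_NFD))
    print(equal_after_nfc(IRI_NFC, IRI_NFD))
```

Under code point comparison the two strings are different resources; they only compare equal if something normalizes them first.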

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.



Received on Tuesday, 31 May 2011 16:38:47 GMT
