
Re: HTML5 and Unicode Normalization Form C

From: Mark Davis ☕ <mark@macchiato.com>
Date: Tue, 31 May 2011 15:08:52 -0700
Message-ID: <BANLkTi=P5SbmXisn1b-BMOC-8uOG+Q9eEg@mail.gmail.com>
To: "Phillips, Addison" <addison@lab126.com>
Cc: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, Koji Ishii <kojiishi@gluesoft.co.jp>, "www-international@w3.org" <www-international@w3.org>
Mark

*The best is the enemy of the good.*


On Tue, May 31, 2011 at 09:34, Phillips, Addison <addison@lab126.com> wrote:

> >
> > No problem. And it is true that my main focus is on linking.
>
> Linking is a special case. The IRI WG is also discussing normalization,
> and I think that is the best place to deal with that issue. Other comparisons
> in HTML (attributes and text values) do not have externally provided
> requirements, so HTML (or CSS, or...) needs to define them.
>
> >
> > HTML5 supports IRIs, which: [1] "Allows native representation of Unicode in
> > resources without % escaping".
>
> While this is a common way of describing IRIs, it is also misleading.
> Although IRIs represent the vast preponderance of Unicode code points without
> escaping, percent-escaping is still required in a number of cases.
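
A rough sketch of where percent-escaping is still needed, assuming Python 3 and a simplified reading of RFC 3987: characters outside the `ucschar` ranges (private-use code points included, since RFC 3987 allows those only in the query), ASCII characters that are not legal in URIs at all (space, `<`, and so on), and the bidi formatting characters that RFC 3987 says must not appear in IRIs. The helper names `is_ucschar` and `escape_for_iri` are hypothetical, and the function does not parse the IRI into components.

```python
from urllib.parse import quote

# ASCII characters that may appear literally (RFC 3986 unreserved + reserved),
# plus '%' so existing escapes are kept (assumes each '%' starts a valid escape).
_ASCII_OK = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)

# LRM, RLM, LRE, RLE, PDF, LRO, RLO: forbidden in IRIs by RFC 3987, section 4.1.
_BIDI_FORMATTING = frozenset("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")


def is_ucschar(cp: int) -> bool:
    """Approximation of the RFC 3987 'ucschar' production."""
    if 0xA0 <= cp <= 0xD7FF or 0xF900 <= cp <= 0xFDCF or 0xFDF0 <= cp <= 0xFFEF:
        return True
    if 0x10000 <= cp <= 0xDFFFD:
        # Planes 1-13, excluding the last two code points of each plane.
        return (cp & 0xFFFF) <= 0xFFFD
    return 0xE1000 <= cp <= 0xEFFFD


def escape_for_iri(text: str) -> str:
    """Percent-encode (as UTF-8) every character an IRI cannot carry literally.

    Simplification: private-use characters are escaped everywhere, even though
    RFC 3987 permits them unescaped in the query component.
    """
    out = []
    for ch in text:
        if ch in _ASCII_OK or (is_ucschar(ord(ch)) and ch not in _BIDI_FORMATTING):
            out.append(ch)
        else:
            out.append(quote(ch, safe=""))  # UTF-8 bytes, uppercase hex
    return "".join(out)


print(escape_for_iri("http://example.com/naïve café"))
print(escape_for_iri("http://example.com/a\u200fb"))
```

The first call keeps "ï" and "é" as-is but escapes the space as `%20`; the second escapes the RIGHT-TO-LEFT MARK as `%E2%80%8F`.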
>
> >
> > > But you're right that it could be a hard requirement for editors. If
> > > we take it seriously, I guess we have to wait for Unicode to fix NFC
> > > problems (I heard that effort is under way) or to ask web
> > > browsers/servers to normalize on the fly.
>
> Normalization is subject to Unicode's stability policy. I don't know what
> you think qualifies as "fixed", but it will not take the form of changing
> either the definition of NFC or the properties of specific characters. See:
> http://unicode.org/policies/stability_policy.html
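
The "NFC problems" in question most likely concern CJK compatibility ideographs. A minimal Python 3 sketch with the standard `unicodedata` module: U+F900 has a singleton canonical decomposition, so NFC replaces it with the unified ideograph U+8C48 and any glyph distinction the author intended is lost. Under the stability policy, this mapping will not change.

```python
import unicodedata

compat = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900
print(unicodedata.decomposition(compat))  # 8C48 (a singleton canonical mapping)

nfc = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(nfc):04X}")  # U+F900 -> U+8C48

# NFC is idempotent: normalizing again can never bring U+F900 back.
assert unicodedata.normalize("NFC", nfc) == nfc
```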


What this might be referring to is that we are looking at the use of
Ideographic Variation Sequences (IVSs) for CJK compatibility characters. This
would not change NFC, but would give people a way to maintain glyphic
variants across NFC. For more info on IVSs, see http://unicode.org/ivd/ and
http://unicode.org/reports/tr37/.

(To make a very long story short, the CJK compatibility characters cover only
a small fraction of the characters for which people want glyphic variants.
By using IVSs instead of the CJK compatibility characters, people can ensure
that their glyphic variants are correctly encoded, and in a way that is not
affected by NFC. We're still in the process of looking at this, so stay
tuned.)
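
To illustrate why an IVS survives NFC where a compatibility character does not, here is a small Python 3 sketch. It assumes only that a variation selector from the Variation Selectors Supplement (here VARIATION SELECTOR-17, U+E0100) follows the base ideograph; this particular sequence is chosen for illustration, not checked against the IVD registry. Variation selectors have no decomposition and combining class 0, so normalization leaves the sequence intact.

```python
import unicodedata

base = "\u8C48"        # the unified ideograph that U+F900 normalizes to
vs17 = "\U000E0100"    # VARIATION SELECTOR-17 (illustrative choice)
ivs = base + vs17      # hypothetical variation sequence for this example

# The compatibility character collapses into the unified ideograph...
assert unicodedata.normalize("NFC", "\uF900") == base
# ...but the base + selector sequence is left untouched by NFC.
assert unicodedata.normalize("NFC", ivs) == ivs

print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ivs)])
# ['U+8C48', 'U+E0100']
```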


>
> > >>
> > >> As it has turned out, however, it was an error of the HTML5 validator
> > >> to show an error for use of NFC. But *that* only increases the
> > >> importance of offering helpful recommendations w.r.t. links.
> > >
> > > Thank you for the explanation of the background I wasn't aware of.
> >
> > I should have pointed it out when I CC-ed this list. Sorry.
>
> If you have concerns about links/web addresses, the best place to discuss
> them is public-iri@w3.org (the IETF IRI WG's mailing list). The IRI
> effort needs all the help it can get.
>
> As I mentioned before, my impression is that IRI is headed down the path of
> *not* requiring any particular normalization form, although NFC is
> recommended ("SHOULD") and early uniform normalization is explicitly
> assumed. The current draft handles IRI comparison by defining equivalence
> at the code point level. See:
> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5.3.2
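
A minimal Python 3 sketch of what code-point-level equivalence means in practice, assuming two hypothetical IRIs that differ only in how "é" is encoded. It shows why early, uniform normalization matters; it does not implement the draft's full comparison ladder (case or percent-encoding normalization, for example).

```python
import unicodedata

a = "http://example.com/caf\u00e9"    # é as one precomposed code point (NFC)
b = "http://example.com/cafe\u0301"   # e + COMBINING ACUTE ACCENT (NFD)


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


print(a == b)            # False: different code point sequences
print(nfc(a) == nfc(b))  # True: canonically equivalent once normalized
```

If every producer normalized to NFC up front, plain code point comparison would already treat these as the same IRI.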
>
> Addison
>
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
>
> Internationalization is not a feature.
> It is an architecture.
>
Received on Tuesday, 31 May 2011 22:09:20 GMT