
RE: NFC in HTML5 (was: RE: Slots for Cyrillic Accented Vowels)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sun, 29 May 2011 10:57:52 -0400
To: "Michael[tm] Smith" <mike@w3.org>
CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC29AEA12@MAILR001.mail.lan>
> What is different about the CJK case?

Some code points are classified as "compatibility" in Unicode, and NFC/NFD replaces them with Unified Han code points. The problem is that doing so changes the glyphs slightly, in ways that some people really care about. For instance, Adobe InDesign has normalized pasted text since CS3, and users have ended up creating their own tools to paste into InDesign without NFC. As far as I understand, people who want to push NFC for CJK would like to get away from compatibility code points and migrate them to IVS (Ideographic Variation Sequences). But IVS is not ready yet, so the migration has ended up simply ignoring those glyph differences.
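
To make the replacement concrete, here is a minimal Python sketch using the standard unicodedata module. The choice of U+FA10 is mine, as one compatibility ideograph with a canonical mapping to a unified ideograph (U+585A); it is only an illustration, not taken from any particular user report.

```python
import unicodedata

# U+FA10 is a CJK compatibility ideograph whose canonical decomposition
# is the unified ideograph U+585A (chosen here only as an illustration).
compat = "\uFA10"

nfc = unicodedata.normalize("NFC", compat)
nfd = unicodedata.normalize("NFD", compat)

# Both forms replace the compatibility code point with the unified one,
# so the original code point cannot be recovered after normalization.
print(f"U+{ord(compat):04X} -> NFC U+{ord(nfc):04X}, NFD U+{ord(nfd):04X}")
```

Whether the two code points actually look different depends on the font in use, which is why the effect is easy to miss.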

I have also heard that NFC can change glyphs for scripts other than CJK, though I'm not sure how important the glyph differences are to those users.

One real example is Mac OS X HFS+. It uses NFD for file names, since file names must be compared, but Apple uses a modified version of NFD that exempts some punctuation (U+2000-2FFF) and the CJK compatibility block (U+F900-FAFF). I don't know what the issues are for punctuation, but I think it indicates that blindly applying NFC/NFD still has some issues today. I hope you understand that people care more about glyph differences in displayable HTML content than in file names.
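
As a rough sketch of what "NFD with exemptions" could look like, the Python below decomposes everything except characters in the two ranges mentioned above. This is an assumption-laden approximation, not Apple's actual algorithm: the names EXEMPT_RANGES, is_exempt, and nfd_with_exemptions are made up for illustration, and it uses whatever Unicode tables ship with the Python interpreter.

```python
import unicodedata

# Ranges exempted from decomposition, as described in the text above.
EXEMPT_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF))


def is_exempt(ch: str) -> bool:
    """Return True if the character falls in one of the exempted ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXEMPT_RANGES)


def nfd_with_exemptions(text: str) -> str:
    """Apply NFD to runs of non-exempt characters; copy exempt ones as-is."""
    out = []
    run = []
    for ch in text:
        if is_exempt(ch):
            # Flush the pending run through NFD, then keep the exempt
            # character untouched.
            out.append(unicodedata.normalize("NFD", "".join(run)))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFD", "".join(run)))
    return "".join(out)
```

With this sketch, "\u00e9" is decomposed to "e\u0301" as usual, while "\uFA10" and "\u2126" (OHM SIGN, which plain NFD would map to U+03A9) come back unchanged.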

> from what I've
> been told by others so far, there is also some value in warning for
> displayable content as well.

I'm curious to know what that value is. Can you, or someone in i18n, explain it to me?

As I said, if there is enough value in validating the normalization of displayable content, I can live with that. I'm sorry for my lack of study in this area; I just don't know what the value is.


-----Original Message-----
From: Michael[tm] Smith [mailto:mike@w3.org] 
Sent: Sunday, May 29, 2011 11:09 PM
To: Koji Ishii
Cc: public-i18n-core@w3.org
Subject: Re: NFC in HTML5 (was: RE: Slots for Cyrillic Accented Vowels)

Koji Ishii <kojiishi@gluesoft.co.jp>, 2011-05-28 09:01 -0400:

> I'm still new to NFC/NFD, so I may not have thought through its side
> effects, and I'm sorry in advance if that's the case, but from a CJK point
> of view, I would like the warnings/errors limited to id and class
> attributes only, if we do this at all.

What is different about the CJK case?

> The basic idea is not to validate the normalization state of any
> displayable content.

Is that in fact the basic idea? I understand why it's more important to
report non-NFC for the values of id attributes and such, but from what I've
been told by others so far, there is also some value in warning for
displayable content as well.
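
For the id case, a minimal Python sketch of why normalization matters for matching: two strings that render the same can differ at the code point level, so a comparison between them fails. The sample value and the helper name is_nfc are made up for illustration and are not part of any validator.

```python
import unicodedata


def is_nfc(value: str) -> bool:
    # Hypothetical helper: a string is in NFC if normalizing it changes nothing.
    return unicodedata.normalize("NFC", value) == value


precomposed = "caf\u00e9"  # ends with U+00E9
decomposed = "cafe\u0301"  # ends with "e" followed by U+0301

print(precomposed == decomposed)                                # False
print(is_nfc(precomposed), is_nfc(decomposed))                  # True False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```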

> Attributes like alt in <img> can contain displayable content.
> 
> If emitting a warning is important for someone, I can live with warnings.
> But if we don't have any such situation in mind, I'd like it not to even
> warn.

I am inclined to keep having it emit the warning for now -- unless/until
anybody can point me to real-world content for which, say, the volume of
the warnings is excessive to the point where they are counterproductive.

> For instance, DAISY is required to validate their content before
> publishing. They can create a new rule to ignore NFC warnings for content
> when moving to EPUB3, but it would still help them if the validator did
> not emit warnings.

They don't have to create a rule to ignore the warnings; they can simply
ignore them. Warnings don't affect the validity of a document.

  --Mike

> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Michael[tm] Smith
> Sent: Saturday, May 28, 2011 2:35 PM
> To: Phillips, Addison
> Cc: public-i18n-core@w3.org
> Subject: Re: NFC in HTML5 (was: RE: Slots for Cyrillic Accented Vowels)
> 
> "Phillips, Addison" <addison@lab126.com>, 2011-05-26 09:31 -0700:
> 
> [quoting somebody]
> > > I complained that HTML5 or validator  http://validator.w3.org/
> > > *requires* NFC.
> > > This might be a bug in the validator and not actually a requirement of HTML5.
> > 
> > I believe that the W3C I18N WG does not support or think that it is a
> > good idea for HTML5 to require NFC--but I'm not aware of any normative
> > language in the HTML5 spec that requires it. This page [1] suggests that
> > the normalizer was added to the validator in response to Charmod-Norm
> > (which does not actually require NFC). If someone has a pointer to an NFC
> > requirement in HTML5, it would be most appreciated if you could forward
> > it to www-international@w3.org (or to me privately if you prefer).
> 
> For the record: There is no NFC requirement in HTML5. The fact that the
> HTML5 facet of the W3C markup validator currently reports non-NFC as an
> error is a bug in the validator.nu backend that the HTML5 facet relies on.
> 
> I'll fix that bug and push it to the W3C markup validator next week.
> 
> The fix is that the same message will be emitted but it will be a
> warning-level message instead of an error-level one -- because I'm told
> that's what would be consistent with current i18n best-practices guidance
> with regard to NFC.  But if anybody thinks the HTML5 validator should not
> emit a warning for non-NFC (in content as well as in attributes), please
> let me know.
> 
>   --Mike
> 

-- 
Michael[tm] Smith
http://people.w3.org/mike
Received on Sunday, 29 May 2011 14:57:50 GMT
