Unicode normalization in CSS from fantasai on 2011-04-08 (www-style@w3.org from April 2011)

From: fantasai <fantasai.lists@inkedblade.net>
Date: Thu, 07 Apr 2011 17:11:20 -0700
To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>
Message-ID: <4D9E52A8.8060806@inkedblade.net>

There was a very very very long thread on Unicode normalization in CSS
back in January/February of 2009. IIRC the conclusion was that the
problem is much bigger than CSS, and I18n had some work yet to do to
figure it all out.

Is that a correct recollection?

Daniel Glazman has been collecting outstanding issues filed against
CSS Namespaces since we now have the implementations to move to PR,
and this was one of them. But I couldn't find any conclusions to the
discussion.

I think realistically we have two options here:
   1. Nothing is normalized in CSS.
   2. CSS-internal user-defined identifiers are normalized to NFC, e.g.
        - counter names
        - namespace prefixes
        - etc.
      We already make a distinction between user-defined and CSS-defined
      names in that user-defined names are case-sensitive.
      http://www.w3.org/blog/CSS/2007/12/12/case_sensitivity
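
To make the difference between the two options concrete, here is a minimal
Python sketch using the standard unicodedata module. The identifier "café" is
only an example name (an assumption for illustration, not taken from any spec);
it is written once with a precomposed character and once with a combining mark.

```python
import unicodedata

# Two canonically equivalent spellings of the same identifier (example name only).
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Option 1 (nothing normalized): the two spellings are different identifiers.
same_without_normalization = composed == decomposed

# Option 2 (user-defined identifiers normalized to NFC): they are the same one.
same_with_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_without_normalization, same_with_nfc)
```

Under option 1 a counter or namespace prefix declared with one spelling would
not be found when referenced with the other, even though the two typically
render identically.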

Within #2 we could
   - Normalize at "parse" time, i.e. before exposing such identifiers
     to the CSSOM.
       - In this case we need to decide whether unquoted font names are
         also affected. Probably yes.
   - Normalize at "match" time, i.e. store and expose the identifiers
     unnormalized, but define that they represent the same thing.
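
The two sub-options can be sketched roughly as follows, using namespace
prefixes as the example. This is only an illustration under stated
assumptions: the class names, the nfc() helper, and the example URI are all
hypothetical, and this is not how any implementation or the CSSOM actually
stores prefixes.

```python
import unicodedata


def nfc(name):
    """Return the NFC form of an identifier (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


class ParseTimePrefixes:
    """Normalizes on the way in: only the NFC form is stored and exposed."""

    def __init__(self):
        self._uris = {}

    def declare(self, prefix, uri):
        self._uris[nfc(prefix)] = uri

    def lookup(self, prefix):
        return self._uris.get(nfc(prefix))

    def exposed_prefixes(self):
        return list(self._uris)


class MatchTimePrefixes:
    """Stores the prefix as written; normalizes only when comparing."""

    def __init__(self):
        self._entries = []

    def declare(self, prefix, uri):
        self._entries.append((prefix, uri))

    def lookup(self, prefix):
        wanted = nfc(prefix)
        # In this sketch a later declaration shadows an earlier equivalent one.
        for stored, uri in reversed(self._entries):
            if nfc(stored) == wanted:
                return uri
        return None

    def exposed_prefixes(self):
        return [stored for stored, _ in self._entries]


composed = "caf\u00e9"     # precomposed spelling
decomposed = "cafe\u0301"  # spelling with a combining accent

at_parse = ParseTimePrefixes()
at_parse.declare(decomposed, "http://example.com/ns")

at_match = MatchTimePrefixes()
at_match.declare(decomposed, "http://example.com/ns")

# Both approaches resolve either spelling to the same namespace ...
print(at_parse.lookup(composed), at_match.lookup(composed))
# ... but they differ in which string a script would get to see.
print(at_parse.exposed_prefixes() == [composed])
print(at_match.exposed_prefixes() == [decomposed])
```

The observable difference is confined to what is exposed: parse-time
normalization changes the author's spelling, while match-time normalization
preserves it at the cost of normalizing on every comparison.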

A third option would be to normalize the whole CSS file, but from
the discussions about interactions with XML, HTML, the DOM, etc. this
did not seem feasible, at least not without a non-lossy normalization
scheme, which Unicode currently lacks (NFC having been hijacked by the
anti-compatibility-character crusades).
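
As an illustration of what "lossy" can mean here (assuming the loss in
question is that NFC replaces some characters with different ones
irreversibly), the following Python sketch normalizes three characters that
have singleton canonical decompositions. The selection of characters is just
an example.

```python
import unicodedata

# Characters with singleton canonical decompositions: NFC replaces each one
# with a different code point, and the original cannot be recovered afterwards.
samples = {
    "\u212b": "ANGSTROM SIGN",
    "\u2126": "OHM SIGN",
    "\uf900": "CJK COMPATIBILITY IDEOGRAPH-F900",
}

results = {}
for original, name in samples.items():
    normalized = unicodedata.normalize("NFC", original)
    results[original] = normalized
    print(f"U+{ord(original):04X} {name} -> U+{ord(normalized):04X}")
```

Normalizing an entire style sheet would apply such replacements to every
string and URL in it as well, not only to identifiers.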

So I guess the question is, what's the right way forward here?

~fantasai

Received on Friday, 8 April 2011 00:11:53 UTC