Re: Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

From: L. David Baron <dbaron@dbaron.org>
Date: Mon, 2 Feb 2009 12:32:13 -0800
To: Richard Ishida <ishida@w3.org>
Cc: "'Phillips, Addison'" <addison@amazon.com>, "'Boris Zbarsky'" <bzbarsky@MIT.EDU>, public-i18n-core@w3.org, www-style@w3.org
Message-ID: <20090202203213.GA25721@pickering.dbaron.org>

On Monday 2009-02-02 20:21 -0000, Richard Ishida wrote:
> Although we can't assume that text is normalized in the wild,
> because we can't control how people create that text, when I
> talked about normalizing class names and selectors I really wasn't
> thinking of doing that each time a class name is matched to a
> selector.  Rather I was assuming that while you read in the data
> to the user agent you normalize at the same time as you convert to
> the internal encoding (which in itself is a kind of
> normalization).  If you know you have normalized the style sheet
> and the markup, you don't need to normalize at every point where
> you do a comparison.  You only have to catch those operations that
> might denormalise the data.

This seems quite reasonable to me, although I wonder if it's
*really* necessary to catch operations that might denormalize the
data.
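
To make the read-time model concrete, here is a minimal sketch in
Python.  It assumes NFC as the target form, and load_text is a
hypothetical helper standing in for the user agent's decoding step;
neither is specified in this thread.

```python
import unicodedata

def load_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes and normalize to NFC in one step (hypothetical helper)."""
    return unicodedata.normalize("NFC", raw.decode(encoding))

# The same class name typed in two editors:
# precomposed U+00E9 versus "e" followed by combining U+0301.
markup_class = load_text("caf\u00e9".encode("utf-8"))
sheet_selector = load_text("cafe\u0301".encode("utf-8"))

# Both were normalized when read in, so a plain comparison is enough;
# no normalization happens at match time.
matches = markup_class == sheet_selector
```

Here the cost of normalization is paid once per document, not once
per selector match.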

How big a problem would it really be to have a world where it's hard
but not impossible to denormalize the data, and if you go out of
your way to do it, you'll get in trouble?  In other words, if you're
just writing markup, style sheets, or scripts, they'll get
normalized when they're read in, but if you go about messing with
escapes or doing codepoint-to-character conversions in JavaScript
it's still possible to shoot yourself in the foot?
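
A rough illustration of that foot-gun, sketched in Python, with chr()
standing in for a script's codepoint-to-character conversion.  NFC is
an assumed normalization form, and the variable names are made up for
the example.

```python
import unicodedata

# A class name as it would be stored after being normalized on read-in
# (NFC assumed): the "e" + U+0301 pair becomes precomposed U+00E9.
stored = unicodedata.normalize("NFC", "cafe\u0301")

# A script builds the "same" name from code points, bypassing the
# read-in path, so the result stays decomposed.
built = "caf" + chr(0x65) + chr(0x0301)

# Plain comparison, as a user agent trusting its input would do.
naive_match = stored == built

# The script author can still recover by normalizing explicitly.
fixed_match = stored == unicodedata.normalize("NFC", built)
```

Under these assumptions naive_match is false and fixed_match is true:
only the author who went out of the way to build the string from code
points has to deal with normalization.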

It seems like that still addresses the main use case requiring
normalization that has been brought up:  people typing text using
different editors or different operating systems.

It also provides a simple model for spec designers and software
authors -- which is thus a simple model that those who use the Web
platform in really complex ways might be able to understand.

> > > I think switching to early Uniform normalization is something that
> > > could be done in a single browser release for each browser maker.
> > 
> > Because browsers are NOT the primary creator of the content. Early uniform
> > normalization refers to every process that creates an XML/HTML/CSS/etc etc.
> > document. The browser reads those documents and must still deal with
> > normalization issues.

OK, sorry, I misunderstood what you were referring to.

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/
Received on Monday, 2 February 2009 20:33:15 GMT
