RE: Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization from Richard Ishida on 2009-02-02 (www-style@w3.org from February 2009)

From: Richard Ishida <ishida@w3.org>
Date: Mon, 2 Feb 2009 20:21:49 -0000
To: "'Phillips, Addison'" <addison@amazon.com>, "'L. David Baron'" <dbaron@dbaron.org>
Cc: "'Boris Zbarsky'" <bzbarsky@MIT.EDU>, <public-i18n-core@w3.org>, <www-style@w3.org>
Message-ID: <000901c98573$e1e982c0$a5bc8840$@org>

I think we may be talking at cross-purposes here.  The 'early normalization' in the Charmod Norm document can refer to a situation where all content on the Web is always exposed in NFC (i.e. any creator of text on the Web, no matter what keyboards, editors, form fields, string-concatenation processes, etc. have been used, eventually churns out the finalised content as NFC), so that there is normally no need to normalize for normalization-sensitive operations.  However, text is lying around the Web and being input in uncontrolled ways these days (NFC, NFD, non-normalized), and so the ideal of living in a pure, normalized world is rather unrealistic.

That isn't to say, though, that we can't have islands within the ocean of content where early normalization takes place, so that within the sphere of that island normalization is normally not necessary.

Although we can't assume that text is normalized in the wild, because we can't control how people create that text, when I talked about normalizing class names and selectors I really wasn't thinking of doing that each time a class name is matched to a selector.  Rather, I was assuming that while you read the data into the user agent you normalize at the same time as you convert to the internal encoding (which in itself is a kind of normalization).  If you know you have normalized the style sheet and the markup, you don't need to normalize at every point where you do a comparison.  You only have to catch those operations that might denormalise the data.
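
A minimal sketch of that idea, in Python purely for illustration: it assumes a hypothetical `read_text` helper that decodes incoming bytes and applies NFC in the same step, and uses the class name "café" as a made-up example. None of these names come from a real user agent. It also shows one operation (concatenation) that can denormalise data that was normalized on the way in.

```python
import unicodedata


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical ingest step: decode bytes and normalize to NFC at once."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


# The same class name "café", written in two canonically equivalent ways.
markup_class = "caf\u00e9".encode("utf-8")      # precomposed U+00E9
selector_class = "cafe\u0301".encode("utf-8")   # "e" + combining acute U+0301

# Decoding alone leaves two strings that differ code point by code point.
raw_match = markup_class.decode("utf-8") == selector_class.decode("utf-8")

# Normalize once on the way in; later comparisons are plain string equality.
ingested_match = read_text(markup_class) == read_text(selector_class)

# Concatenation can denormalise: joining two NFC strings need not give NFC.
base = read_text(b"cafe")
accent = read_text("\u0301".encode("utf-8"))
joined = base + accent
joined_is_nfc = unicodedata.normalize("NFC", joined) == joined

# Such operations are the places where normalization has to be reapplied.
renormalized = unicodedata.normalize("NFC", joined)

print(raw_match, ingested_match, joined_is_nfc,
      renormalized == read_text(markup_class))
```

The point of the sketch is only that the cost moves from every comparison to the ingest step plus the few operations that can break the invariant; which operations those are in a real user agent is not settled here.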

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/



> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-
> request@w3.org] On Behalf Of Phillips, Addison
> Sent: 02 February 2009 19:02
> To: L. David Baron
> Cc: Boris Zbarsky; public-i18n-core@w3.org; www-style@w3.org
> Subject: RE: Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-
> content] Unicode Normalization
> 
> > Why have you come to the conclusion that it's impossible to
> > reconcile with the current state of software?
> >
> > In this thread, you have some browser makers telling you they'd
> > vastly prefer that approach to having to consult experts on Unicode
> > normalization for every API, every property getter, etc., to
> > determine what the correct behavior is.
> >
> > I think switching to early uniform normalization is something that
> > could be done in a single browser release for each browser maker.
> 
> Because browsers are NOT the primary creator of the content. Early uniform
> normalization refers to every process that creates an XML/HTML/CSS/etc.
> document. The browser reads those documents and must still deal with
> normalization issues.
> 
> >
> > Having to go through every Web-exposed API and decide on the correct
> > behavior with regards to normalization is an approach that will
> > likely take decades, rely on being serialized behind the judgment of
> > a very small number of people, produce a long series of decisions
> > whose internal inconsistencies will break substantive use cases, and
> > still interfere significantly with the worldwide usability of any
> > software built using generic mechanisms (since you're not going to
> > teach every Web developer testing string equality in Javascript
> > which cases should use normalized-equality and which cases use
> > strict-equality).
> >
> 
> Normalization-sensitive operations will still exist that require specifications to
> deal with them--or not.
> 
> Addison

Received on Monday, 2 February 2009 20:21:50 UTC