
Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Wed, 04 Feb 2009 10:09:50 +1100
Message-ID: <4988CEBE.3090000@vicnet.net.au>
To: Henri Sivonen <hsivonen@iki.fi>
CC: "Phillips, Addison" <addison@amazon.com>, "L. David Baron" <dbaron@dbaron.org>, Boris Zbarsky <bzbarsky@MIT.EDU>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>


Henri Sivonen wrote:

>
> On Feb 2, 2009, at 21:02, Phillips, Addison wrote:
>
>> Because browsers are NOT the primary creator of the content. Early 
>> uniform normalization refers to every process that creates an 
>> XML/HTML/CSS/etc etc. document. The browser reads those documents and 
>> must still deal with normalization issues.
>
>
> To me, it seems unreasonable to introduce serious 
> performance-sensitive complexity into Web content consumers to address 
> the case that a Web developer fails to supply HTML, CSS and JS in a 
> *consistent* form in terms of combining characters. (I think even 
> normalization in the HTML parser post-entity expansion would be 
> undesirable.) How big a problem is it in practice that an author fails 
> to be self-consistent when writing class names to .html and when 
> writing them to .css or .js?
>
You are assuming that only one developer is working on the project.

When there is a team of developers, things get murkier for some 
languages. The operating systems and keyboard layouts the developers 
use will affect the sequence of codepoints that is generated. 
The reality is that for some languages there are multiple keyboard 
layouts and input mechanisms. For a language like Vietnamese, most input 
systems fall into one of two categories: NFC or the Microsoft format. If 
you really want to, though, some of those input tools can also give you 
NFD. The last time I tested a range of Yoruba keyboard layouts and input 
mechanisms, I got at least four different approaches to codepoint 
generation. There is no guarantee that everyone on the team is using the 
same input software or operating system.
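To illustrate what this means for the class-name case Henri raises: the Vietnamese letter "ế" can arrive as three different codepoint sequences that are canonically equivalent but not identical. The sketch below is in Python. It assumes that the "Microsoft format" means a precomposed base letter (here "ê") followed by a combining tone mark, and the class name "tiếng" is a hypothetical example. A plain codepoint comparison, which is effectively what happens when nobody normalizes, fails; comparing after NFC normalization succeeds.

```python
import unicodedata

# Three spellings of the Vietnamese letter "ế" that different input tools may produce.
NFC_FORM = "\u1EBF"         # precomposed: e with circumflex and acute
NFD_FORM = "e\u0302\u0301"  # fully decomposed: e + combining circumflex + combining acute
MS_FORM = "\u00EA\u0301"    # assumed "Microsoft format": precomposed ê + combining acute tone mark


def same_codepoints(a: str, b: str) -> bool:
    """Raw comparison, as a matcher that does no normalization would do."""
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    """Comparison after normalizing both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical class name, typed once into the HTML and once into the CSS
# by two developers using different input tools.
html_class = "ti" + NFC_FORM + "ng"
css_class = "ti" + MS_FORM + "ng"

print(same_codepoints(html_class, css_class))  # False
print(same_after_nfc(html_class, css_class))   # True
```

The three forms are distinct strings, yet all of them normalize to the single precomposed codepoint U+1EBF under NFC.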

For a language like Vietnamese, I prefer my staff to use specific input 
tools so that we have consistency. As it is, the input tools we select 
also affect whether we can use spell checkers with certain languages. 
Basically, in the real world things are messy and inconsistent.

Even with a single developer, the same problem arises if the developer 
uses multiple editing tools for different parts of the job, some of 
which normalise and some of which don't, and is working in a language 
where input is most likely not to be NFC.
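One way to see how mixed a project's files actually are is to scan each file for lines that are not already in NFC. The following is a minimal sketch in Python (3.9+ for `unicodedata.is_normalized` and the `list[int]` annotation). It assumes UTF-8 source files. The script name `check_nfc.py` and the idea of running it over the HTML, CSS and JS files are illustrative only, not something proposed in this thread.

```python
import sys
import unicodedata
from pathlib import Path


def non_nfc_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that are not already in NFC."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not unicodedata.is_normalized("NFC", line)
    ]


def check_files(paths: list[str]) -> int:
    """Report non-NFC lines in each file; return how many files had any."""
    bad_files = 0
    for name in paths:
        text = Path(name).read_text(encoding="utf-8")
        lines = non_nfc_lines(text)
        if lines:
            bad_files += 1
            print(f"{name}: not NFC on line(s) {', '.join(map(str, lines))}")
    return bad_files


if __name__ == "__main__":
    # Hypothetical usage: python check_nfc.py index.html style.css app.js
    if check_files(sys.argv[1:]):
        sys.exit(1)
```

The check only detects inconsistency; it does not decide which tool's output is "right", which is exactly the choice that varies by language and input method.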

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au


Received on Tuesday, 3 February 2009 23:11:22 GMT
