- From: Andrew Cunningham <andrewc@vicnet.net.au>
- Date: Wed, 04 Feb 2009 10:09:50 +1100
- To: Henri Sivonen <hsivonen@iki.fi>
- CC: "Phillips, Addison" <addison@amazon.com>, "L. David Baron" <dbaron@dbaron.org>, Boris Zbarsky <bzbarsky@MIT.EDU>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>
- Message-ID: <4988CEBE.3090000@vicnet.net.au>
Henri Sivonen wrote:
>
> On Feb 2, 2009, at 21:02, Phillips, Addison wrote:
>
>> Because browsers are NOT the primary creator of the content. Early
>> uniform normalization refers to every process that creates an
>> XML/HTML/CSS/etc etc. document. The browser reads those documents and
>> must still deal with normalization issues.
>
>
> To me, it seems unreasonable to introduce serious
> performance-sensitive complexity into Web content consumers to address
> the case that a Web developer fails to supply HTML, CSS and JS in a
> *consistent* form in terms of combining characters. (I think even
> normalization in the HTML parser post-entity expansion would be
> undesirable.) How big a problem is it in practice that an author fails
> to be self-consistent when writing class names to .html and when
> writing them to .css or .js?
>

You are assuming that only one developer is working on the project. When there is a team of developers, it gets murkier for some languages. The operating systems and keyboard layouts being used will affect the sequence of code points generated.

The reality is that for some languages there are multiple layouts and input mechanisms. For a language like Vietnamese, most input systems fall into one of two categories: NFC or the Microsoft format. If you really want to, though, some of those input tools can give you NFD. The last time I tested a range of Yoruba keyboard layouts and input mechanisms, I got at least four different approaches to code point generation.

There is no guarantee that everyone on the team is using the same input software or operating system. For a language like Vietnamese, I prefer my staff to use specific input tools so that we have consistency. As it is, the input tools we select also affect whether we can use spell checkers with certain languages.

Basically, in the real world, things are messy and inconsistent.
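To make the class-name question concrete, here is a minimal Python sketch (not part of the original thread). It assumes a class name written as the Vietnamese word "Việt" and builds three code point sequences that render identically: NFC, NFD, and a precomposed ê followed by a combining dot below. Treating that third sequence as what the post calls "the Microsoft format" is an assumption for illustration. The helpers `same_code_points` and `same_after_nfc` are hypothetical names; only `unicodedata.normalize` and `unicodedata.is_normalized` (Python 3.8+) come from the standard library.

```python
import unicodedata

# Three code point sequences for "Việt" that display the same.
# Which input tool produces which form is assumed for illustration only.
nfc_form = "Vi\u1ec7t"           # precomposed ệ (U+1EC7): NFC
nfd_form = "Vie\u0323\u0302t"    # e + combining dot below + combining circumflex: NFD
mixed_form = "Vi\u00ea\u0323t"   # precomposed ê + combining dot below: neither NFC nor NFD

forms = {"NFC": nfc_form, "NFD": nfd_form, "mixed": mixed_form}


def same_code_points(a: str, b: str) -> bool:
    """Literal comparison, i.e. matching a class name without any normalization."""
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    """Comparison after normalizing both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Show the underlying code points and whether each form is already NFC.
for name, s in forms.items():
    points = " ".join(f"U+{ord(c):04X}" for c in s)
    print(f"{name:5} {points}  is_NFC={unicodedata.is_normalized('NFC', s)}")

print(same_code_points(nfc_form, nfd_form))    # False
print(same_code_points(nfc_form, mixed_form))  # False
print(same_after_nfc(nfd_form, mixed_form))    # True
```

If the HTML were typed with one input tool and the CSS or JS with another, a literal comparison like `same_code_points` would fail to match the class name even though both look identical on screen; normalizing both sides first, wherever in the pipeline that happens, removes the mismatch.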
Even in the case of a single developer: if the developer uses multiple editing tools for different parts of the job, some of which normalise and some of which don't, and he is using a language where input is most likely not NFC

--
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au
Received on Tuesday, 3 February 2009 23:11:24 UTC