Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

From: Andrew Cunningham <andrewc@vicnet.net.au>
Date: Tue, 03 Feb 2009 12:09:09 +1100
Message-ID: <49879935.8050703@vicnet.net.au>
To: fantasai <fantasai.lists@inkedblade.net>
CC: "Phillips, Addison" <addison@amazon.com>, Boris Zbarsky <bzbarsky@MIT.EDU>, Mark Davis <mark.davis@icu-project.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>


fantasai wrote:

>
> Phillips, Addison wrote:
>> ... Both are semantically equivalent and normalize to U+00E9. I can send
>> either to the server in my request and get the appropriate (normalized)
>> value in return. Conversely, I should be able to select:
>>
>> <p>&#x65;&#x300;</p>
>>
>> ... using either form. I might be returned the original (non-normalized)
>> sequence in the result. The point is that processes that are
>> normalization sensitive must behave as if the data were normalized.
>> Why is that a
>> contradiction?
>
> I think Boris's point is that we have a message from Andrew Cunningham
>   http://lists.w3.org/Archives/Public/www-style/2009Feb/0033.html
> saying that form input data must not be normalized. This is incompatible
> with the idea that the browser can internally adopt NFC.
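
To make the conflict concrete: the two spellings of è in Addison's example are canonically equivalent, but they are different code point sequences, so a browser that silently normalised form data to NFC would change the bytes the server receives. The sketch below is a minimal illustration under stated assumptions: Python stands in for browser behaviour, and `submitted_bytes` is a hypothetical helper, not a browser API.

```python
import unicodedata
from urllib.parse import quote

decomposed = "e\u0300"   # &#x65;&#x300; : e + COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # U+00E8 LATIN SMALL LETTER E WITH GRAVE


def submitted_bytes(value: str, normalise: bool) -> str:
    """Hypothetical helper: percent-encode a form value as UTF-8,
    optionally normalising it to NFC first."""
    if normalise:
        value = unicodedata.normalize("NFC", value)
    return quote(value)


# Canonically equivalent: identical once normalised...
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
# ...but the raw sequences differ, and so do the bytes sent.
print(submitted_bytes(decomposed, normalise=False))  # e%CC%80
print(submitted_bytes(decomposed, normalise=True))   # %C3%A8
```

Comparing under normalisation while transmitting the original sequence is what lets a process "behave as if the data were normalized" without rewriting the data itself.
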

My thoughts have been changing as I reflect on each post.

I still feel normalisation of selectors is important, but have doubts 
about a browser normalising everything. I think the distinction is 
between content and markup.
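
One way to picture that content/markup distinction is to normalise only at comparison time when matching selectors, leaving the document's text exactly as served. This is a sketch assuming a much-simplified class-selector matcher; `Element` and `select_by_class` are hypothetical names, not any browser's API.

```python
import unicodedata
from dataclasses import dataclass


def nfc(s: str) -> str:
    """Return the NFC form of s (used only for comparisons)."""
    return unicodedata.normalize("NFC", s)


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element."""
    classes: list[str]
    text: str


def select_by_class(elements: list[Element], selector_class: str) -> list[Element]:
    """Match class names under NFC, but return the elements unchanged,
    so content keeps whatever form (NFC, NFD, ...) the author served."""
    wanted = nfc(selector_class)
    return [el for el in elements if any(nfc(c) == wanted for c in el.classes)]


doc = [
    Element(classes=["caf\u00e9"], text="Vie\u0323\u0302t"),  # NFC class, NFD text
    Element(classes=["cafe\u0301"], text="Vi\u1ec7t"),        # NFD class, NFC text
    Element(classes=["other"], text="no match"),
]

hits = select_by_class(doc, "cafe\u0301")  # selector written in NFD
print(len(hits))                           # 2
print(hits[0].text == "Vie\u0323\u0302t")  # True: text was not normalised
```

The selector matches both canonically equivalent class names, while the matched content is still in the form the author chose.
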

It would be beneficial for a number of languages to have content data 
served up as NFD, allowing client-side manipulation of the data in 
orthographic and tonal representations, and I can see accessibility 
reasons why I'd want to do so. My primary concerns are:

1) Web developers and designers currently carry the burden of ensuring 
that they tick off normalisation issues in their code. Most web 
developers and designers are unaware of the issues around normalisation 
and selectors, and do not have the tools to track down any bugs in their 
code that may result from canonically equivalent representations being 
used in different files or different parts of the same file. Identifying 
it as a tools issue is misleading, though: it's more a skills and 
knowledge gap among web developers. That gap could, however, be 
addressed through normalisation of selectors.

2) In web development for lesser-used languages, it is beneficial and 
sometimes necessary to send content data to a web browser in a form 
other than NFC, and to allow client-side scripts to work on that data 
in various ways. A complete normalisation approach that affects scripts 
as well as content could potentially create more problems. Obviously 
this processing could be done at the server end as well, but is it 
advisable to prevent such client-side approaches?
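
As an illustration of the client-side processing in point 2: with Vietnamese text in NFD, tone marks are separate combining characters and can be dropped to give a tone-free orthographic view, while vowel diacritics (circumflex, breve, horn) are kept. The sketch uses Python's `unicodedata`; a browser script could do the same with JavaScript's `String.prototype.normalize`. The tone-mark set follows standard Vietnamese orthography; `strip_tones` is a hypothetical helper.

```python
import unicodedata

# Combining characters that carry Vietnamese tones.
TONE_MARKS = {
    "\u0300",  # grave (huyen)
    "\u0301",  # acute (sac)
    "\u0303",  # tilde (nga)
    "\u0309",  # hook above (hoi)
    "\u0323",  # dot below (nang)
}


def strip_tones(text: str) -> str:
    """Hypothetical helper: drop tone marks but keep vowel diacritics.
    Decomposing first means precomposed input (NFC) also works."""
    decomposed = unicodedata.normalize("NFD", text)
    toneless = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so e + U+0302 displays as a single precomposed letter.
    return unicodedata.normalize("NFC", toneless)


print(strip_tones("Tiếng Việt"))  # Tiêng Viêt
```

If the server already sends NFD, the first `normalize` call is effectively a no-op, which is part of why serving NFD can suit this kind of processing.
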

I hope this clarifies my current ragged thoughts.

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au


Received on Tuesday, 3 February 2009 01:10:37 GMT