- From: Jonathan Kew <jonathan@jfkew.plus.com>
- Date: Mon, 2 Feb 2009 23:33:24 +0000
- To: fantasai <fantasai.lists@inkedblade.net>
- Cc: "Phillips, Addison" <addison@amazon.com>, Boris Zbarsky <bzbarsky@MIT.EDU>, Mark Davis <mark.davis@icu-project.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>, andrewc@vicnet.net.au
On 2 Feb 2009, at 22:36, fantasai wrote:

> Phillips, Addison wrote:
>> ... Both are semantically equivalent and normalize to U+00E9. I can send
>> either to the server in my request and get the appropriate (normalized)
>> value in return. Conversely, I should be able to select:
>>   <p>è</p>
>> ... using either form. I might be returned the original (non-normalized)
>> sequence in the result. The point is that processes that are
>> normalization sensitive must behave as if the data were normalized.
>> Why is that a contradiction?
>
> I think Boris's point is that we have a message from Andrew Cunningham
>   http://lists.w3.org/Archives/Public/www-style/2009Feb/0033.html
> saying that form input data must not be normalized. This is incompatible
> with the idea that the browser can internally adopt NFC.

I confess that I didn't really understand that message at the time. So
I've just re-read it, and also looked up some MARC21-related materials.
Now I'm ready to say that I disagree with this position. To quote from
that message:

> the normalisation of form fields should be determined by the web
> developer. Normalisation in some contexts may violate standards in some
> industries. One that comes to mind is libraries. Many of the newer
> integrated library management systems will use a web browser as a client
> for the cataloguing modules. Normalising form fields would result in
> violating the MARC21 character model.

A library cataloguing module (for example) is a specialized system that
will in any case have to perform special validation/filtering on its
input, if that input is provided in Unicode by the browser but must
comply with the MARC21 character model when stored in the database. I
don't believe, therefore, that normalization makes a significant
difference to the situation.
The cataloguing module can easily apply whichever form of normalization
it requires, or a custom normalization-like transformation, if that
helps it to process the text appropriately.

> If I were working on content in some languages like Igbo, and wanted to
> include tone markers to use as an alternative display of data, it's
> easier to work with NFD data and filter tone marks out when applying
> standard orthographic views.

True, but it is easy for the process that wants to provide alternative
views of the data to pass that data through a normalization filter at
that time. Again, this is a specialized application that already has
detailed knowledge of the particular kind of data it is interested in,
and how that is encoded; if it wants to rely on NFD representation in
order to do a tone-mark-filtering operation, it should explicitly apply
NFD to the data. I don't think this has any bearing on how a
general-purpose web browser may or should present text to the server.

> To have a browser normalise to NFC and then have a web developer have
> to renormalise data to NFD or in the case of MARC21 build a completely
> new normalisation routine that matches the MARC21 character model which
> is nearly but not quite NFD is creating a burden for the web developer
> in question.

The web developer who is developing processes that depend on a
particular normalization form, whether NFC, NFD, or some other custom
transformation, must face that burden anyway. Otherwise the process will
never be robustly interoperable with the wider world of encoded text.

We may wish this burden didn't exist at all, but it does (and won't be
going away any time soon -- Unicode is here to stay). And software
developers -- rather than web page and stylesheet authors -- are the
right people to carry that burden. For operations that the browser
carries out, such as matching CSS selectors, the browser developer must
handle it, whether up-front or on-the-fly.
For operations that some back-end process carries out, such as perhaps
MARC21 data validation, the developer of that process has to deal with
it.

JK
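
The point made above, that a process relying on NFD for tone-mark
filtering should apply NFD itself instead of assuming the form in which
the text arrives, can be sketched roughly as follows. This is only an
illustration in Python and not code from this thread: the function name
strip_tone_marks and the choice of grave, acute and macron as the "tone
marks" are assumptions made for the example. unicodedata.normalize is
the Python standard-library call.

```python
import unicodedata

# Hypothetical set of combining marks treated as tone marks for this sketch:
# U+0300 COMBINING GRAVE ACCENT, U+0301 COMBINING ACUTE ACCENT,
# U+0304 COMBINING MACRON.
TONE_MARKS = frozenset("\u0300\u0301\u0304")


def strip_tone_marks(text, tone_marks=TONE_MARKS):
    """Remove tone marks whatever normalization form the input arrives in."""
    # Apply NFD explicitly, so precomposed and decomposed input look the same.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the tone marks; other combining marks (e.g. dot below) stay.
    filtered = "".join(ch for ch in decomposed if ch not in tone_marks)
    # Recompose for output; a process that prefers NFD could skip this step.
    return unicodedata.normalize("NFC", filtered)


precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two inputs differ as code point sequences but are canonically equivalent.
assert precomposed != decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# The filter gives the same result for both forms.
assert strip_tone_marks(precomposed) == strip_tone_marks(decomposed) == "e"

# o + dot below + acute: the acute is removed, the dot below is kept.
assert strip_tone_marks("\u1ecd\u0301") == "\u1ecd"
```

Because the filter normalizes its own input, it gives the same result
whether the browser submitted NFC, NFD, or unnormalized text.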
Received on Monday, 2 February 2009 23:35:10 UTC