- From: Ashok Malhotra <ashokma@microsoft.com>
- Date: Tue, 28 Oct 2003 14:30:16 -0800
- To: <public-qt-comments@w3.org>
- Cc: <w3c-xml-schema-ig@w3.org>, <w3c-i18n-ig@w3.org>
- Message-ID: <EDB607C8AC991F40BE646533A1A673E877B1A0@RED-MSG-42.redmond.corp.microsoft.com>
Both Schema and the I18N WGs requested that the fn:normalize-unicode function should make it a requirement to support the W3C normalization form, now called 'fully-normalized'. See XML Schema comments item [2.6] http://lists.w3.org/Archives/Public/public-qt-comments/2003Aug/0003.html and I18N WG comments item [62] http://lists.w3.org/Archives/Public/public-qt-comments/2003Jul/0105.html . This was discussed at length at the joint WG meeting in Toronto in mid-September. It was felt that 'fully-normalized' defined a property without a well defined algorithm to achieve it. The character model says that for text to be "fully-normalized" constructs (sentences in our case) should not start with a combining character so that appending a character to a string never creates a non-normalized string. But suppose a string operation such as fn:substring results in such a string i.e. one that starts with a combining character. What should be done, should the initial combining character just be removed? (Most combining characters are accents and it is arguably rare for such characters to start a sentence but aren't 'h' and 'l' combining characters is Spanish for the forms 'ch' and 'll'? If so, many sentences will start with these characters) Michael Kay made this point in http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2003Oct/0051.html saying: "It's not at all clear to me that supporting "fully-normalized" form makes any sense at all. Whereas the Unicode normalization forms all describe an algorithm for normalizing data, the "fully-normalized" form is described only as a property of a string. There is no algorithm provided for making a string fully-normalized, and the only algorithms that one might come up with involve losing information. In my view throwing away characters in order to make the characters that remain normalized is not a useful thing to do. I think that someone has completely misunderstood the requirement here." Thus, the WGs felt that it was unclear how to implement 'fully-normalized' and so did not agree to making it a normalization form that must be supported by fn:normalize-unicode. They recommended that we wait until the Character Model becomes at Recommendation at which time, perhaps, it will become clear what this form means and how it should be implemented. All the best, Ashok
Received on Tuesday, 28 October 2003 17:30:20 UTC