- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 01 Mar 2005 19:58:05 +0900
- To: MURATA Makoto <EB2M-MRT@asahi-net.or.jp>, www-forms-editor@w3.org
- Cc: Masayasu Ishikawa <mimasa@w3.mag.keio.ac.jp>
Hello Makoto,

Masayasu pointed me to this mail, and I decided that it's easiest to reply directly.

At 21:00 05/02/28, MURATA Makoto wrote:

> Dear colleagues,
> I am writing this mail on behalf of a group trying to translate
> the XForms rec to Japanese and publish the translation as a JIS TS.
> In E.3.1 Script Tokens of the XForms recommendation, we
> find a script name "kanji". It is defined as
>     Subset of 'han' used in writing Japanese
> However, we do not understand what is meant by this definition.

Do you mean you had problems understanding the text? Or do you mean that there is no operational definition that unambiguously decides, for each Han character, whether it's in this subset or not? I'm assuming the latter.

> We examined relevant documents (shown below) but could not find any
> definitions.
>     Unicode Character Database
>     Unicode Standard Annex #24 Script Names
>     ISO15924
>     java.lang Class Character.UnicodeBlock
> If this definition cannot be clarified, we propose to drop this
> script name.

The XForms spec does not require an unambiguous definition of a script token. Section E.3.1 Script Tokens (http://www.w3.org/TR/xforms/sliceE.html#mode-scripts) says:

>>>>>>>>
However, this neither means that an input mode has to allow input for all the characters in the script or block, nor that an input mode is limited to only characters from that specific script. As an example, a "latin" keyboard doesn't cover all the characters in the Latin script, and includes punctuation which is not assigned to the Latin script.
>>>>>>>>

So even if the definition of 'kanji' is very fuzzy, the specification will still work. Indeed, it is important to realize that characters get added to scripts, and different keyboards and input methods support different sets of characters. For example, a mobile phone may not allow the input of the same number of characters as a PC, for the same script.

I don't think this needs clarification in the spec, but in case a clarification is desired, I propose to change the first sentence above as follows:

>>>>
However, this neither means that an input mode has to allow input for all the characters in the script or block, nor that an input mode is limited to only characters from that specific script, nor that all of the script tokens refer to an exactly defined set of characters.
>>>>

If there is one thing one can criticize about the script token 'kanji', then it's that because Japanese input is mostly done via (hira)kana, the use of this script token will be very rare. I think we included it mainly to be ready just in case a different input technology for Japanese becomes more popular on some devices, e.g. handwriting input or some such, and that would benefit from distinguishing between kana and kanji input methods. But such a thing may or may not happen. So for the moment, 'kanji' is indeed not very useful, but it also doesn't hurt.

Regards, Martin.

Received on Tuesday, 1 March 2005 10:58:49 UTC