Proposed replacement text for E.3.1 Script Tokens from Steven Pemberton on 2008-08-13 (public-forms@w3.org from August 2008)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 13 Aug 2008 14:57:32 +0200
To: "Forms WG" <public-forms@w3.org>
Cc: "Martin Duerst" <duerst@it.aoyama.ac.jp>
Message-ID: <op.uftsx6htsmjzpq@acer3010>

John, Martin, Forms WG,

Here is my best try at a replacement text for the Script Tokens section.

I admit upfront: I don't understand this stuff completely, so it has  
involved some guesswork on my part.

I have doubts about the need for "hanja" and "kanji", but they are in the  
existing list, so I have left them.
I don't understand why some scripts have a property value alias and others  
do not, nor whether that field is the best one to choose, but it seems to be  
the source of the initial list we have, so I have copied that.
I wonder if we should include the aliases "japanese" and "korean".

The URL for the ISO spec is http://unicode.org/iso15924/iso15924-codes.html

Comments?

Steven

================
E.3.1 Script Tokens

Script tokens provide a general indication of the set of characters that
is covered by an input mode. In most cases, script tokens correspond
directly to [Unicode Scripts]. However, this neither means that an
input mode has to allow input for all the characters in the script,
nor that an input mode is limited to only characters from that
specific script. As an example, a "latin" keyboard doesn't cover all
the characters in the Latin script, and includes punctuation which is
not assigned to the Latin script.

The script tokens that are allowed are listed in [ISO 15924], "Codes for
the representation of names of scripts". The allowable values are those
listed in the column "Property Value Alias" with any underscore characters
(_) removed, and excluding the two values "Common" and "Unknown". At the
time of writing, these values are:

Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese,
Buhid, CanadianAboriginal, Carian, Cherokee, Coptic, Cypriot,
Cyrillic, Devanagari, Deseret, Ethiopic, Georgian, Glagolitic, Gothic,
Greek, Gujarati, Gurmukhi, Hangul, Han, Hanunoo, Hebrew, Hiragana,
KatakanaOrHiragana, OldItalic, KayahLi, Katakana, Kharoshthi,
Khmer, Kannada, Lao, Latin, Lepcha, Limbu, LinearB, Lycian, Lydian,
Malayalam, Mongolian, Myanmar, Nko, Ogham, OlChiki, Oriya, Osmanya,
PhagsPa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala,
Sundanese, SylotiNagri, Syriac, Tagbanwa, TaiLe, NewTaiLue,
Tamil, Telugu, Tifinagh, Tagalog, Thaana, Thai, Tibetan, Ugaritic,
Vai, OldPersian, Cuneiform, Yi
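
As an illustration only, the derivation rule above can be sketched in
Python. The function name script_tokens and the sample aliases are
assumptions made for this example; a real implementation would read the
"Property Value Alias" column from the ISO 15924 code table. Skipping
scripts with no alias is also an assumption.

```python
# Hypothetical sketch: derive script tokens from ISO 15924
# "Property Value Alias" entries, following the rule in the text.
EXCLUDED = {"Common", "Unknown"}

def script_tokens(aliases):
    """Return aliases with underscores removed, skipping excluded values.

    Empty strings stand in for scripts that have no alias and are
    skipped as well (an assumption of this sketch).
    """
    tokens = []
    for alias in aliases:
        if not alias or alias in EXCLUDED:
            continue
        tokens.append(alias.replace("_", ""))
    return tokens

# Sample input only; not the full ISO 15924 table.
sample = ["Latin", "Canadian_Aboriginal", "Common", "",
          "Old_Italic", "Unknown", "Katakana_Or_Hiragana"]
print(script_tokens(sample))
```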

Seven other values are allowed:

      ipa - International Phonetic Alphabet
      hanja - subset of 'han' used in writing Korean
      kanji - subset of 'han' used in writing Japanese
      math - mathematical symbols and related characters, representing
            the ISO 15924 code "Zmth"
      simplifiedHanzi - representing the ISO 15924 code "Hans"
      traditionalHanzi - representing the ISO 15924 code "Hant"
      user - special value denoting the 'native' input of the user
            according to the system environment
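
A rough sketch of how a processor might check a token against both
lists follows. The names EXTRA_TOKENS and is_allowed are hypothetical,
the exact-match (case-sensitive) comparison is an assumption the text
does not settle, and None simply marks tokens for which the text names
no ISO 15924 code.

```python
# Hypothetical sketch: the seven extra tokens, mapped to the ISO 15924
# code named in the text where there is one (None otherwise).
EXTRA_TOKENS = {
    "ipa": None,
    "hanja": None,
    "kanji": None,
    "math": "Zmth",
    "simplifiedHanzi": "Hans",
    "traditionalHanzi": "Hant",
    "user": None,
}

def is_allowed(token, script_tokens):
    """True if token is a derived script token or one of the extras.

    Comparison is exact; case handling is an assumption of this sketch.
    """
    return token in script_tokens or token in EXTRA_TOKENS

# Sample subset of the derived script tokens, for illustration only.
derived = {"Latin", "Han", "Hangul", "Katakana"}
print(is_allowed("math", derived))
print(is_allowed("Common", derived))
```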
================

Received on Wednesday, 13 August 2008 12:58:09 UTC