RE: Thoughts about characters transmission from Masataka Ohta on 1993-07-29 (ietf-charsets@w3.org from July to September 1993)

From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>
Date: Thu, 29 Jul 1993 12:43:14 +0900 (JST)
To: lwj@cs.kun.nl (Luc Rooijakkers)
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <9307290343.AA16252@necom830.cc.titech.ac.jp>

> Masataka Ohta writes:
> 
> > For JIS, for example, Hiragana, Katakana and some frequently used
> > punctuation marks, at least, and optionally some frequently used Japanese
> > Han characters (about 1000 at most), should be encoded with two octets.
> 
> Is there an easy criterion for picking out about 1000 characters
> (preferably based on their code points), or do you have to use usage
> statistics?

There is a list, compiled by the Ministry of Education, of the Han
characters to be taught in each grade of elementary school in Japan.

	grade	# of characters		cumulative percentage of use
	1	80			21
	2	160			43
	3	200			61
	4	200			73
	5	185			84
	6	181			89

The cumulative percentages are my own measurements on newspaper
articles.
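The per-grade counts add up to 1006 characters, which matches the "about 1000" figure in the question. The minimal Python sketch below tabulates the cumulative counts and estimates the average cost per Han character if the listed characters get two-octet codes. It assumes that "cumulative percentage of use" means the share of Han character occurrences in the sampled text, and that every other Han character costs three octets. LONG_CODE_OCTETS, cumulative_table and avg_octets are hypothetical names, and the three-octet figure is an illustrative assumption, not part of the proposal.

```python
# Per-grade data from the table above:
# (grade, characters taught in that grade, cumulative % of use in newspapers).
GRADES = [
    (1, 80, 21),
    (2, 160, 43),
    (3, 200, 61),
    (4, 200, 73),
    (5, 185, 84),
    (6, 181, 89),
]

# Hypothetical assumption (not from the post): Han characters outside the
# two-octet set are encoded with this many octets each.
LONG_CODE_OCTETS = 3


def cumulative_table(grades):
    """Return (grade, cumulative character count, cumulative % of use) rows."""
    rows = []
    total = 0
    for grade, count, pct in grades:
        total += count
        rows.append((grade, total, pct))
    return rows


def avg_octets(coverage_pct, short=2, long=LONG_CODE_OCTETS):
    """Expected octets per Han character when the covered share uses `short`
    octets and the remainder uses `long` octets."""
    p = coverage_pct / 100
    return p * short + (1 - p) * long


if __name__ == "__main__":
    for grade, total, pct in cumulative_table(GRADES):
        print(f"grade {grade}: {total:4d} chars, {pct}% of use, "
              f"{avg_octets(pct):.2f} octets/char")
```

Under these assumptions, giving all 1006 listed characters two-octet codes brings the average down to about 2.11 octets per Han character. Adding only the grade 1 list (80 characters) already covers about a fifth of the measured use.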

I think other Han-using countries should also have such lists.

						Masataka Ohta


Received on Wednesday, 28 July 1993 20:47:17 UTC