Re: [nelocsig] Re: International Search Engine Submission

From: Stuart Woodward <stuart@gol.com>
Date: Tue, 15 Feb 2000 16:49:39 +0900
Message-ID: <004e01bf7789$3944fa70$1700000a@vgkk.co.jp>
To: <nelocsig@egroups.com>, "www" <www-international@w3.org>
> Could you please explain the difference between "hankaku" and "zenkaku".

In Shift JIS (and "Uni"code!) there are two *different* character codes for
each katakana (phonetic) character, and likewise for alphanumerics, even
though both codes represent the same character. E.g. the word te-re-bi
(television) can be written in either hankaku (han=half width, single byte)
or zenkaku (zen=full width, double byte) katakana. This is a holdover from
the hardware word processor world, which could only print in two sizes.

So, if you search for "terebi" in half width characters you may not get any
hits for pages which wrote it in full width characters, even though to the
reader they are the same word. It's a bit like a search engine being case
sensitive.
Some search engines do the conversion for you, some don't.
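
As a rough illustration of the kind of conversion a search engine might do,
here is a minimal Python sketch. It assumes Unicode text and uses NFKC
normalization from the standard library to fold width variants together; the
helper name normalize_width is made up for this example, and this is not a
description of how any particular search engine works.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold half-width/full-width variants to one form (hypothetical helper).

    NFKC maps half-width katakana to full-width katakana and
    full-width alphanumerics to their ordinary ASCII forms.
    """
    return unicodedata.normalize("NFKC", text)


# "terebi" written in half-width katakana (te, re, hi + voiced sound mark)
hankaku = "\uFF83\uFF9A\uFF8B\uFF9E"
# "terebi" written in full-width katakana (te, re, bi)
zenkaku = "\u30C6\u30EC\u30D3"

# A naive comparison treats the two spellings as different strings.
print(hankaku == zenkaku)

# After normalizing both the query and the indexed text, they match.
print(normalize_width(hankaku) == normalize_width(zenkaku))
```

Applying the same function to both the query and the indexed pages is what
makes the two spellings compare equal, much as lowercasing both sides makes a
search case insensitive.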

See also:

http://cns-web.bu.edu/pub/djohnson/web_files/i18n/japanese.html
Received on Tuesday, 15 February 2000 02:45:14 GMT