Re: Don't we need a standard way to represent language in Unicode?

On Thu, 27 Jan 1994 23:41:08 -0800, 
	"Daniel R. Kegel" <dank@alumni.cco.caltech.edu> said:
> [ietf-charsets' charter is to decide how best to represent text on
>  the Internet, now that ASCII is no longer enough for most Internet users.
>  Most members of the list seem happy with Unicode.  

This is perhaps because most of them are not CJK users, or
are not considering a multilingual (NOT bilingual)
environment, or were too busy to respond (just like me :-).

>  Mr. Ohta is violently opposed, and has proposed extending
> Unicode with several bits *per character* to indicate
> language.  When that was shot down, he proposed an
> extension of ISO2022 instead which completely ignores
> Unicode.  I think midway between Mr. Ohta's two proposals
> might make more sense.  -dan]

To be exact, his latter proposal (I'm the coauthor) is not
an extension of ISO2022, but puts reasonable restrictions
on the use of ISO2022...  hmm... `putting restrictions'
itself might be regarded as an extension.

> A quick and dirty way to address the problem would be to define a set of 
> control codes as an extension of Unicode to indicate language, in much 

If such control codes were accepted universally, we could
distinguish not only CJK, but also English, French,
German, ... in a text without contextual analysis by a
human, and that could lead to more powerful text processing.

But, for the moment, I'm quite pessimistic about such
codes being accepted.  European people seem to have
abandoned this kind of processing long ago in return for
easy display.

> the same way as ISO2022 defines control codes to switch character sets.

I'm now using ISO2022's switches as a substitute for such
control codes in Mule (MULtilingual Enhancement to GNU
Emacs) to distinguish among CJK and apply a different
processing method to each.  If Mule receives mail in
Unicode, it can't provide such convenient functionality
(e.g. looking up a dictionary).
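
To make the idea concrete, here is a rough sketch (not Mule's
actual code) of how the ISO2022 switches can be used to split a
7-bit byte stream into runs tagged with the character set in
effect.  The escape sequences are the G0 designations listed
for ISO-2022-JP-2 in RFC 1554; the function name and the labels
are my own assumptions for illustration only.

```python
# Sketch only: split a 7-bit ISO 2022 byte stream into (charset, bytes) runs.
# Escape sequences are the G0 designations from ISO-2022-JP-2 (RFC 1554).
# The labels and the function name are hypothetical, chosen for this example.
ESC = b"\x1b"

DESIGNATIONS = {
    b"\x1b(B": "ascii",
    b"\x1b(J": "jisx0201-roman",
    b"\x1b$@": "jisx0208-1978",   # Japanese
    b"\x1b$B": "jisx0208-1983",   # Japanese
    b"\x1b$(D": "jisx0212",       # Japanese (supplementary)
    b"\x1b$A": "gb2312",          # Chinese
    b"\x1b$(C": "ksc5601",        # Korean
}


def split_by_charset(data: bytes):
    """Return a list of (charset_label, raw_bytes) runs, in order."""
    runs = []
    current = "ascii"   # ISO-2022-JP-2 text starts in ASCII
    start = 0           # where the current run begins
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESC:
            for seq, name in DESIGNATIONS.items():
                if data.startswith(seq, i):
                    if i > start:
                        runs.append((current, data[start:i]))
                    current = name
                    i += len(seq)
                    start = i
                    break
            else:
                i += 1  # unknown escape: keep it inside the current run
        else:
            i += 1
    if start < len(data):
        runs.append((current, data[start:]))
    return runs
```

Each run carries a label, so a Japanese dictionary can be
consulted for a jisx0208 run and a Chinese one for a gb2312
run.  With unified Han code points and no switches, the same
split would need some other source of language information.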

---
Ken'ichi HANDA
handa@etl.go.jp


Received on Friday, 28 January 1994 03:12:58 UTC