Re: Don't we need a standard way to represent language in Unicode?

I recently asked Mr. Asmus Freytag, a Microsoft employee who has been active 
on INSOFT-L, about Microsoft's position on mixed Chinese/Japanese/Korean 
text and Unicode in Windows NT.

My concern was that, since 16-bit Unicode doesn't encode language,
Windows-NT can't properly display mixed CJK language text.

Mr. Freytag pointed out that, although Microsoft is devoted to 16 bit Unicode 
for Windows NT, and will not switch to a 32 bit encoding, users can mix 
fonts in Rich Text Format documents to achieve proper display.
An NT programmer at Caltech pointed out that fonts in NT can be tagged with 
language, so language can (at least potentially) be deduced from the font 
being used, and a font can be chosen that is appropriate for a language.
I hope this will be the case in practice.

This means that Windows-NT should be able to interoperate with the 32
bit option of ISO10646, with a little work; for example, a telnet client
or newsreader could be written that always shows mixed C/J/K Han characters
in the appropriate font for the language.

The full text of Mr. Freytag's remarks follows, at his request.

(I am still curious as to whether the 32 bit option of ISO10646 will
start out as Unicode plus two bits to indicate language, e.g.
plane 00 = Unicode, plane 01 = Chinese subset of Unicode Han, plane 02 = 
Korean subset of Unicode Han, plane 03 = Japanese subset of Unicode Han.
I have not been able to join the ISO10646 mailing list yet.)
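
As a rough illustration of the "Unicode plus two bits" idea above, here is
a minimal Python sketch.  The plane numbers simply follow my speculation;
nothing here is taken from ISO10646 itself, and the names tag() and untag()
are made up for the example.

```python
# Hypothetical plane assignments, following the speculation in the text.
PLANE_UNICODE, PLANE_CHINESE, PLANE_KOREAN, PLANE_JAPANESE = 0, 1, 2, 3

def tag(code16, plane):
    """Pack a 16-bit Unicode value and a 2-bit language plane into one integer."""
    if not 0 <= code16 <= 0xFFFF:
        raise ValueError("not a 16-bit code")
    if not 0 <= plane <= 3:
        raise ValueError("plane must fit in two bits")
    return (plane << 16) | code16

def untag(value):
    """Split a tagged value back into (plane, 16-bit code)."""
    return value >> 16, value & 0xFFFF

# The same Han character, tagged as Japanese and then recovered.
tagged = tag(0x6F22, PLANE_JAPANESE)
print(hex(tagged), untag(tagged))
```

Plain 16-bit Unicode text passes through unchanged under this scheme,
because plane 00 leaves the upper bits at zero.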

- Dan Kegel (dank@alumni.caltech.edu)

From dank
From: dank (Daniel R. Kegel)
Date: Sun, 30 Jan 1994 21:41:24 -0800
To: asmusf@microsoft.com
Subject: Windows NT and Unicode

Asmus,
in response to the question on INSOFT-L:
>| I have heard that while Unicode contains Kanji, it does so in a way 
>| that is not acceptable to the Japanese market, and hence was not
>| approved by them in recent votes.  Does this mean a product that
>| supports Unicode alone will not be as acceptable as a product that
>| handles Japanese character sets using other encoding methods?
you wrote:
>[ If it can import and export user's documents in shift-jis,
>  it is just as good as shift-jis, so nobody should care that it's unicode. ]

This is true as far as it goes, but the primary objection to Unicode
seems to be that it doesn't provide for palatable display of mixed 
Korean, Chinese and Japanese text in the same document.  The Japanese
insist that different fonts be used for the different languages.
Is Windows NT going to be able to handle this sort of mixed language 
document?  And will it be able to do so with plain Unicode?  
I'm afraid that this isn't possible, and that something has to be done
to extend Unicode to represent language.  The Japanese hope to do this 
by using 32-bit Unicode, but since Windows NT has chosen 16-bit wchar_t,
it won't be able to go this route.  

Does this seem like a real problem to you and to Microsoft?
-Dan (dank@alumni.caltech.edu)

From asmusf@microsoft.com
From: Asmus Freytag <asmusf@microsoft.com>
To: dank@alumni.cco.caltech.edu
Date: Mon, 31 Jan 94 11:01:52 PST
Subject: RE: Windows NT and Unicode

No, this is NOT a real problem. We (MS or the vendors in Unicode) do not
think that 'plain text' solutions need that level of typographical finesse.

If you have application areas where you would like to use the 'correct' font,
use formatted text solutions, i.e. 'rich text', where you carry the font
information separately. Unicode support (even in NT) is set up so that you can
easily extend today's rich text technologies to use many large Unicode-encoded
fonts, e.g. one each for Korean, Chinese and Japanese. You would then,
just as you would select Times, Helv. etc. select the Japanese font for the
appropriate sections in your document (actually not THE, but A, Japanese
font, because at that level of finesse you would want to be particular about
which font is used).

A.

From dank@alumni.cco.caltech.edu
To: Asmus Freytag <asmusf@microsoft.com>
Date: Mon, 31 Jan 1994 21:38:26 -0800
From: "Daniel R. Kegel" <dank@alumni.cco.caltech.edu>

Mr. Freytag,
thanks for your quick response.  Do you mind if I summarize it to 
the net?
Thanks,
Dan

From asmusf@microsoft.com
From: Asmus Freytag <asmusf@microsoft.com>
To: dank@alumni.cco.caltech.edu
Date: Tue,  1 Feb 94 09:33:36 PST

Yes. Please send my comments out verbatim,
Thanks,
A.

From dank@alumni.cco.caltech.edu
To: Asmus Freytag <asmusf@microsoft.com>
Subject: Re: Windows NT and Unicode 
Date: Tue, 01 Feb 1994 07:16:12 -0800
From: "Daniel R. Kegel" <dank@alumni.cco.caltech.edu>

Mr. Freytag,
One more thing: the Internet community appears to be very interested
in achieving what you call typographical finesse, but what Han users call
basic readability.  The only way this affects Windows-NT is that to
convert RTF to the coming 32 bit version of the ISO version of Unicode
(for instance, to send a document via 'plain text' FTP or Usenet News),
interface software will have to read the RTF, look at the fonts used,
and decide (for Han fonts) what language the font is for.
Likewise, the software will have to look at incoming 32-bit 'unicode'
and pick a font according to language for Han language text.
Does Windows-NT provide language information about its Han fonts,
i.e. can an RTF reader deduce the language of a Han font by asking
the operating system?
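
The incoming direction might look roughly like the Python sketch below.
It assumes the language travels in the bits above the 16-bit code (as in
the plane speculation in the summary at the top); the font names and the
function font_runs() are placeholders, not anything Windows-NT provides.

```python
from itertools import groupby

# Hypothetical table from language plane to font name.
FONT_FOR_PLANE = {
    0: "default",
    1: "chinese-han",
    2: "korean-han",
    3: "japanese-han",
}

def font_runs(tagged):
    """Group language-tagged code values into (font, text) runs.

    Each value is assumed to hold a language plane above bit 15
    and a 16-bit Unicode code in the low bits.
    """
    runs = []
    for plane, group in groupby(tagged, key=lambda v: v >> 16):
        text = "".join(chr(v & 0xFFFF) for v in group)
        runs.append((FONT_FOR_PLANE[plane], text))
    return runs
```

The outgoing direction would be the reverse: walk the font runs of the RTF
document and set the plane from whatever language the font reports.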


From dank
From: dank (Daniel R. Kegel)
Date: Sun, 6 Feb 1994 15:35:56 -0800
To: heathh@cco.caltech.edu
Subject: Windows/NT, Unicode, and the Internet

Hi Heath,
Recently I've been really interested in how foreign text should be
represented on the Internet (because I wanted to do the right thing in
my various whois servers and clients), and joined the appropriate mailing list.
The answer appears to be (more or less) to use Unicode with a special
encoding that makes the lower 128 chars (standard ASCII) appear just
as they do now, and escapes all other chars in an efficient manner
such that the resulting strings look like normal 8 bit ASCII to
dumb software like filesystems and communications software.
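
The description above matches the encoding now known as UTF-8; assuming
that is the one meant, this small Python sketch checks the property.  The
helper name is_ascii_transparent() is made up for the example.

```python
def is_ascii_transparent(text):
    """Check that ASCII characters keep their byte values and that every
    other character is encoded using only bytes >= 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 128:
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            return False
    return True

sample = "whois \u6f22"          # ASCII followed by one Han character
print(sample.encode("utf-8"))    # prints b'whois \xe6\xbc\xa2'
print(is_ascii_transparent(sample))
```

Because the escaped characters never produce a byte below 0x80, software
that only looks for ASCII delimiters such as "/" or newline is not confused.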

The problem is in Asian languages, where it seems one needs to know
which language is being used in order to select the right font
(they are VERY picky about this over there in Han-land), and Unicode doesn't 
allow for this.  An 18 bit extended Unicode may be coming soon to handle this, 
but Windows-NT uses plain old 16 bit Unicode.

Microsoft plans to stick with 16 bit Unicode; people who want to use
the right font in mixed Chinese/Japanese/Korean documents can bloody well
use RTF and select the right font themselves, is the official line.

I'm trying to write a summary on the issue for the mailing list.
My question to you, o Windows NT expert, is: can you deduce the language in 
use from the font?
Is there any info in NT that associates language(s) with a font?
That way, you could write an Internet news or mail client that converted to 
or from 18 bit unicode on the fly.

Thanks for any info,
   Dan K.


From foo@bar
Subject: Re: Windows/NT, Unicode, and the Internet
To: dank@alumni.cco.caltech.edu (Daniel R. Kegel)
Date: Sun, 6 Feb 1994 20:49:15 -0800 (PST)

Dan,
> 
> I'm trying to write a summary on the issue for the mailing list.
> My question to you, o Windows NT expert, is: can you deduce the language in 
> use from the font?
> Is there any info in NT that associates language(s) with a font?
> That way, you could write an Internet news or mail client that converted to 
> or from 18 bit unicode on the fly.


Heh, Windows NT expert.  I like that.   Anyway:

I take it you are saying:

- You can pick a font from the 18-bit unicode.
- Once you pick that font, you want to know what language it is.

- Better yet, based on the language the user composed the message in,
  you want to know what language the font maps to to generate 18-bit unicode.

If I have that right, I may have an answer.  Fonts are executables with
resources, much like resource DLLs.  Starting with NT, any resource
can have a language (and sublanguage) ID associated with it.  The idea
is to make it easy to have multi-lingual dialog boxes, etc.  Look up
EnumResourceLanguages() in api32wh.hlp.   Anyway, if you have fonts
that support this, you're in business.  If not, I dunno.

Let me know if I understood your question correctly.  (as well as if the
solution sounds reasonable.)

Later,
Heath

From dank@alumni.cco.caltech.edu
To: Asmus Freytag <asmusf@microsoft.com>
cc: "Daniel R. Kegel" <dank@alumni.cco.caltech.edu>
Subject: Re: Windows NT and Unicode 
Date: Sun, 06 Feb 1994 20:57:37 -0800
From: "Daniel R. Kegel" <dank@alumni.cco.caltech.edu>

Mr. Freytag,
a local Windows NT programmer has informed me that fonts (like all
other resources in NT) can have languages associated with them.
That should make it possible to map from RTF to Unicode extended with
language ID's.

----- end ----

Received on Sunday, 6 February 1994 21:37:40 UTC