- From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
- Date: Tue, 23 Sep 2003 23:53:06 +0300
- To: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
- Cc: <www-html@w3.org>, "'shaula haitner'" <shaula@shaula.co.il>, "'Yuval Rabinovich'" <yuval@faz.co.il>, "'Gertel Hasson'" <gilagh@netvision.net.il>
Hello,

It does not matter whether I use Unicode or encode the text your way. Consider the following markup:

    <body lang="en,he,ar" dir="ltr">
    <p>The following are two letters in Hebrew, &#x05D0; &#x05D1; while
    these are three Arabic letters, &#x0644; &#x0647; &#x062C;. The letter
    forms both evolved from the ancient Aramaic alphabet.
    </p>
    </body>

You can still "know" automatically which part is Arabic, which is Hebrew and which is English. So marking the whole text as English, Hebrew and Arabic is enough.

Now, using 8-bit mode:

    <body lang="en,he" dir="ltr">
    <p>The following are two letters in Hebrew, א ב
    </p>
    </body>

Or using text created with Notepad on Hebrew Windows:

    <body lang="en,he" dir="ltr">
    <p>The following are two letters in Hebrew, א ב
    </p>
    </body>

The same holds here. You know automatically which part is Hebrew and which is English.

Regards,
Reuven Nisser
Ofek Liyladenu

-----Original Message-----
From: BIGELOW,JIM (HP-Boise,ex1) [mailto:jim.bigelow@hp.com]
Sent: Tuesday, September 23, 2003 8:21 PM
To: Reuven Nisser
Subject: RE: Problem with LANG keyword

Reuven Nisser wrote
> ...
> This is especially true when using Unicode. There one can mix
> Hebrew, Arabic and English in the same text without any conflict.
> ...

The report <cite>Unicode in XML and other Markup Languages</cite> [1] discusses the many situations where markup is preferred over Unicode characters for encoding information about structure and presentation. See Section 3.9, Language Tag Characters [2].

Therefore, I think that using the language attribute on elements that enclose spans of text in a given language is preferable to discovering the language from the Unicode characters. For example:

    <body lang="en" dir="ltr">
    <p>The following are two letters in Hebrew,
    <q lang="he" dir="rtl">&#x05D0; &#x05D1;</q> while these are three
    Arabic letters, <q lang="ar" dir="rtl">&#x0644; &#x0647; &#x062C;</q>.
    The letter forms both evolved from the ancient Aramaic alphabet.
    </p>
    </body>

Jim Bigelow

[1] http://www.w3.org/TR/unicode-xml/
[2] http://www.w3.org/TR/unicode-xml/#Language
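
As a rough illustration of the two positions in this thread, the Python sketch below splits a string into runs by Unicode script, which is the kind of automatic detection described in the reply. It is only a sketch under stated assumptions: the helper names `script_of` and `script_runs` are hypothetical, the script is guessed from the prefix of each character's Unicode name, and only Hebrew, Arabic and Latin are recognized.

```python
import unicodedata

# Scripts this sketch recognizes (an assumption made for illustration only).
SCRIPTS = ("HEBREW", "ARABIC", "LATIN")


def script_of(ch):
    """Return a coarse script label for one character, or None if neutral.

    The label is taken from the prefix of the character's Unicode name,
    e.g. "HEBREW LETTER ALEF" gives "HEBREW".
    """
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script + " "):
            return script
    return None  # spaces, digits, punctuation, unrecognized scripts


def script_runs(text):
    """Split text into (script, substring) runs.

    Neutral characters are attached to the run that precedes them;
    neutral characters before the first run are dropped.
    """
    runs = []  # list of [script, substring]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1] += ch
        elif script is not None:
            runs.append([script, ch])
    return [(script, chunk.strip()) for script, chunk in runs]


sample = "Hebrew: \u05D0 \u05D1, Arabic: \u0644 \u0647 \u062C."
for script, chunk in script_runs(sample):
    print(script, repr(chunk))
```

Note that this recovers the script, not the language: Hebrew letters are also used to write Yiddish, and Arabic letters to write Persian and Urdu, so a script run alone cannot tell a user agent which language it is reading. That gap is what a per-span `lang` attribute fills.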
Received on Tuesday, 23 September 2003 16:53:22 UTC