- From: Richard, Francois M <Francois.M.Richard@usa.xerox.com>
- Date: Mon, 01 Oct 2001 16:45:36 -0400
- To: "'Paul Deuter'" <Paul.Deuter@plumtree.com>, Timothy Greenwood <tgreenwood@openmarket.com>, souravm <souravm@infy.com>, www-international@w3.org
- Message-id: <B08661D21F0FD311A21A00805FC7D65001EA34FA@usa0845ms1.svcdoc.mc.xerox.com>
In HTML and XML, the character encoding form and the character set (= Unicode) are decoupled. As a result, whatever the encoding form, the whole range of Unicode characters is always accessible. For instance, with the iso-8859-1 encoding form, I can encode any Unicode character by using a numeric character reference (NCR), such as `&#x30B9;` for the Japanese character ス.

François

> Numeric character references? What are those? Every encoding can be
> viewed as numbers. A sniffer will just show you the octets that went
> over the wire. It is up to you to interpret those octets. Therefore, if
> you run the test below, you must be careful to pay attention to what
> chars you enter and know how those chars are encoded in the various
> encodings that might be used.
>
> For example, if you type in a Japanese character, you might want to
> know how that char is encoded in Shift-JIS, in UTF-8, and in UCS-2.
> Then when you look at the sniffer trace and see a certain sequence of
> octets, you can tell right away what encoding was used.
>
> That is important because in JSP you must know the encoding in order
> to re-interpret the bytes properly when calling getParameter.
>
> -Paul
>
> Paul Deuter
> Internationalization Manager
> Plumtree Software
> paul.deuter@plumtree.com
>
> -----Original Message-----
> From: Timothy Greenwood [mailto:tgreenwood@openmarket.com]
> Sent: Monday, October 01, 2001 12:31 PM
> To: 'souravm'; www-international@w3.org
> Subject: RE: ISO-8859-1
>
> In testing our product I found that with Internet Explorer I could
> enter characters outside the declared charset. IE translated them into
> numeric character references. So everything is legal: the output
> characters are all Latin-1 (ASCII even), but they are correctly
> translated by the browser.
>
> Does a view source of the resulting page show NCRs?
>
> - Tim
>
> -----Original Message-----
> From: souravm [mailto:souravm@infy.com]
> Sent: Monday, October 01, 2001 8:42 AM
> To: www-international@w3.org
> Subject: ISO-8859-1
>
> Hi,
>
> Here is a small JSP page which I used as a proof of concept for a
> multilingual project.
>
> The interesting observation is that even if I put ISO-8859-1 as the
> charset in the meta tag, it works for all languages. I tested it with
> Japanese, Korean, Arabic and French (using an IME on Windows 2000).
>
> As far as I know, ISO-8859-1 is supposed to cover only Western European
> languages. I'm surprised to find that it even supports Asian languages.
>
> Can anyone please explain to me how it can support Asian languages?
>
> Regards,
> Sourav
>
> ------------------------------------------------------------------------
> The JSP file name is i18na.jsp
>
>     <%@ page import="java.util.*"%>
>     <%@ page import="java.io.*"%>
>     <%
>         String ucStr = request.getParameter("jap");
>     %>
>
>     <HTML>
>     <HEAD>
>     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
>     <TITLE></TITLE>
>     </HEAD>
>     <BODY topmargin="0" marginheight="0" leftmargin="0" marginwidth="0">
>     String = <%= ucStr%>
>     <FORM name="frmText" action="http://192.168.119.15:5052/NASApp/fortune/i18na.jsp" method="post">
>     <TABLE border="0" cellspacing="0" cellpadding="5" width="200">
>     <TR>
>         <TD><INPUT TYPE="text" NAME="jap" SIZE="30" value=""></TD>
>         <TD><INPUT TYPE="submit" NAME="Submit" VALUE="button"></TD>
>     </TR>
>     <TR>
>     </TR>
>     </TABLE>
>     </FORM>
>     </BODY>
>     </HTML>
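François's point, together with the browser behaviour Tim describes, can be sketched in Java, the language of the JSP page above. The helper `NcrEscaper.escapeToNcr` is hypothetical and is not Internet Explorer's actual code. It assumes the browser replaces every character the page's charset cannot encode with a decimal NCR and leaves everything else untouched.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the idea; not Internet Explorer's code.
class NcrEscaper {

    // Returns text with every code point that `charset` cannot encode
    // replaced by a decimal numeric character reference such as &#12473;.
    // Characters the charset can encode (including '&') pass through as-is.
    static String escapeToNcr(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);            // handles surrogate pairs
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with e-acute (present in ISO-8859-1) and katakana SU,
        // U+30B9 (absent from ISO-8859-1).
        String input = "caf\u00e9 \u30B9";
        System.out.println(escapeToNcr(input, StandardCharsets.ISO_8859_1));
    }
}
```

For the sample input the result is the string `café &#12473;`: é fits in ISO-8859-1 and passes through, while ス becomes an NCR made only of ASCII characters. Because this sketch leaves a literal `&` alone, a server receiving the output cannot tell an escaped ス from a user who typed `&#12473;` by hand.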
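To follow Paul's advice about reading a sniffer trace, it helps to know which octets each candidate encoding produces for a given character. The sketch below (class and method names are hypothetical) dumps them for ス, U+30B9. It uses UTF-16BE as a stand-in for UCS-2, which yields the same two bytes for any character in the Basic Multilingual Plane. It also assumes the JVM ships the Shift_JIS charset, which most JDKs do even though the Java platform does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows the octets a character turns into under
// several encodings, so they can be recognized in a packet trace.
class EncodingBytes {

    // Encodes `text` with `charset` and formats each byte as two hex digits.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // treat byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String su = "\u30B9"; // katakana SU
        // Shift_JIS is not one of the charsets every Java platform must support.
        System.out.println("Shift_JIS: " + hex(su, Charset.forName("Shift_JIS")));
        System.out.println("UTF-8:     " + hex(su, StandardCharsets.UTF_8));
        // UTF-16BE (no byte order mark) matches UCS-2 for BMP characters.
        System.out.println("UTF-16BE:  " + hex(su, StandardCharsets.UTF_16BE));
    }
}
```

On a typical JDK this prints `83 58`, `E3 82 B9` and `30 B9` respectively. In an application/x-www-form-urlencoded POST body, non-ASCII octets are percent-escaped, so the UTF-8 form of ス shows up in a trace as `%E3%82%B9`.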
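Paul's last point concerns getParameter. As an illustration only, suppose the form page had been served as UTF-8 while the servlet container decoded the POST body as ISO-8859-1, which containers typically fall back to when the request names no charset. Because ISO-8859-1 maps each of the 256 byte values to exactly one character, the original octets can be recovered and decoded again with the right charset. The helper `ReinterpretParameter.reinterpret` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: repairs a parameter value that was decoded as
// ISO-8859-1 although the browser sent bytes in another encoding.
class ReinterpretParameter {

    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so encoding
    // the mis-decoded string back to ISO-8859-1 restores the original octets,
    // which can then be decoded with the charset that was actually used.
    static String reinterpret(String misdecoded, Charset actual) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, actual);
    }

    public static void main(String[] args) {
        // Simulate the mismatch: UTF-8 bytes for katakana SU (U+30B9)
        // decoded as ISO-8859-1, as a container might do by default.
        byte[] sent = "\u30B9".getBytes(StandardCharsets.UTF_8);
        String asReceived = new String(sent, StandardCharsets.ISO_8859_1);
        System.out.println(asReceived.length()); // 3: one char per byte

        String repaired = reinterpret(asReceived, StandardCharsets.UTF_8);
        System.out.println(repaired.equals("\u30B9")); // true
    }
}
```

On containers implementing Servlet 2.3 or later, calling `request.setCharacterEncoding("UTF-8")` before the first getParameter call avoids this round trip altogether.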
Received on Monday, 1 October 2001 17:14:11 UTC