RE: ISO-8859-1 from Richard, Francois M on 2001-10-01 (www-international@w3.org from October to December 2001)

From: Richard, Francois M <Francois.M.Richard@usa.xerox.com>
Date: Mon, 01 Oct 2001 16:45:36 -0400
To: "'Paul Deuter'" <Paul.Deuter@plumtree.com>, Timothy Greenwood <tgreenwood@openmarket.com>, souravm <souravm@infy.com>, www-international@w3.org
Message-id: <B08661D21F0FD311A21A00805FC7D65001EA34FA@usa0845ms1.svcdoc.mc.xerox.com>

In HTML and XML, the character encoding of a document and its document
character set (= Unicode) are decoupled.
As a result, whatever the character encoding, it is always possible to
reference the whole range of Unicode characters.

For instance, with the iso-8859-1 encoding, I can represent any Unicode
character by using a numeric character reference (NCR) such as &#xFF7D;
for the Japanese character ｽ (halfwidth katakana SU, U+FF7D).
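
To make this concrete, here is a minimal sketch in Java (the language of
the JSP quoted below) of what a user agent or server could do before
writing text out as iso-8859-1. The class name NcrDemo and the method name
toLatin1WithNcr are my own, purely for illustration, and this is not meant
as a description of what IE does internally. Characters that iso-8859-1
can encode are kept as they are; every other code point becomes a
hexadecimal NCR.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class NcrDemo {

    /**
     * Returns a copy of text in which every code point that ISO-8859-1
     * cannot encode is replaced by a hexadecimal NCR such as &#xFF7D;.
     * Markup characters such as '&' are NOT escaped by this sketch.
     */
    public static String toLatin1WithNcr(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (latin1.canEncode(ch)) {
                // Directly representable in ISO-8859-1: keep as is.
                out.append(ch);
            } else {
                // Outside ISO-8859-1: refer to it by its Unicode code point.
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            // Advance by one code point (two chars for a surrogate pair).
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E7 is in ISO-8859-1, U+FF7D is not.
        String encoded = toLatin1WithNcr("Fran\u00E7ois \uFF7D");
        System.out.println(encoded);
        // The result can now be serialized as ISO-8859-1 without loss.
        byte[] octets = encoded.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(octets.length + " octets");
    }
}
```

The octets on the wire are then all legal iso-8859-1, and it is the HTML
or XML parser on the receiving side that maps each NCR back to its Unicode
character.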

François


>
> Numeric character references?  What are those?  Every encoding can be
> viewed as numbers.  A sniffer will just show you the octets that went
> over the wire.  It is up to you to interpret those octets.  Therefore,
> if you run the test below, you must be careful to pay attention to what
> chars you enter and know how those chars are encoded in the various
> encodings that might be used.
> 
> For example, if you type in a Japanese character, you might want to
> know how that char is encoded in Shift-JIS, in UTF-8, and in UCS-2.
> Then when you look at the sniffer trace and you see a certain sequence
> of octets, you can tell right away what encoding was used.
> 
> That is important because in JSP you must know the encoding in order
> to re-interpret the bytes properly when calling getParameter.
> 
> -Paul
> 
>
> Paul Deuter
> Internationalization Manager
> Plumtree Software
> paul.deuter@plumtree.com
>  
>
> -----Original Message-----
> From: Timothy Greenwood [mailto:tgreenwood@openmarket.com]
> Sent: Monday, October 01, 2001 12:31 PM
> To: 'souravm'; www-international@w3.org
> Subject: RE: ISO-8859-1
>
>
> In testing our product I found that with Internet Explorer I could
> enter characters outside the declared charset. IE translated them into
> numeric character references. So everything is legal, the output
> characters are all Latin1 (ASCII even), but are correctly translated by
> the browser.
> 
> Does a view source of the resulting page show NCR?
> 
> - Tim
> 
> 
>   -----Original Message-----
> From: souravm [mailto:souravm@infy.com]
> Sent: Monday, October 01, 2001 8:42 AM
> To: www-international@w3.org
> Subject: ISO-8859-1
>
>
>
> Hi,
> 
> Here is a small JSP page which I used as a proof of concept for a
> multilingual project.
> 
> The interesting observation is that even if I put ISO-8859-1 as the
> charset in the meta tag, it works for all languages. I tested it for
> Japanese, Korean, Arabic and French (using an IME on Windows 2000).
> 
> As far as I know, ISO-8859-1 is supposed to cover only Western European
> languages. I'm surprised to find that it even supports the Asian
> languages.
> 
> Can anyone please explain to me how it can support the Asian languages?
> 
> Regards,
> Sourav
> 
> ------------------------------------------------------------------------
> The jsp file name is i18na.jsp
> 
> <%@ page import="java.util.*"%>
> <%@ page import="java.io.*"%>
> <%
>     String ucStr = request.getParameter("jap");
> %>
> 
> <HTML>
> <HEAD>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
> <TITLE></TITLE>
> 
> </HEAD>
> <BODY topmargin="0" marginheight="0" leftmargin="0" marginwidth="0">
> String  = <%= ucStr%>
> <FORM name="frmText"
> action="http://192.168.119.15:5052/NASApp/fortune/i18na.jsp"
> method="post">
> <TABLE  border="0" cellspacing="0" cellpadding="5" width="200">
>         <TR>
>                 <TD><INPUT TYPE="text" NAME="jap" SIZE="30" value=""></TD>
>                 <TD><INPUT TYPE="submit" NAME="Submit" VALUE="button"></TD>
>         </TR>
>         <TR>
>         </TR>
> 
> </TABLE>
> 
>
> </FORM>
> </BODY>
> </HTML>
>

Received on Monday, 1 October 2001 17:14:11 UTC