RE: ISO-8859-1 from Paul Deuter on 2001-10-01 (www-international@w3.org from October to December 2001)

From: Paul Deuter <Paul.Deuter@plumtree.com>
Date: Mon, 1 Oct 2001 12:47:44 -0700
To: "Timothy Greenwood" <tgreenwood@openmarket.com>, "souravm" <souravm@infy.com>, <www-international@w3.org>
Message-ID: <C7F00D7948B8E4468BB330152C6BA4E00AACEC@cstaex03.USIPLUMTREE.AD>

Numeric character references?  What are those?  Every encoding can be
viewed as numbers.
A sniffer will just show you the octets that went over the wire.  It is
up to you to interpret those octets.  Therefore, if you run the test
below, you must be careful to pay attention to what chars you enter and
know how those chars are encoded in the various encodings that might be
used.
 
For example, if you type in a Japanese character, you might want to know
how that char is encoded in Shift-JIS, in UTF-8, and in UCS-2.  Then when
you look at the sniffer trace and you see a certain sequence of octets,
you can tell right away what encoding was used.
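
As a rough illustration of that lookup, here is a minimal Java sketch
that prints the octets for one character in each encoding.  The class
name OctetTable, the helper hex, and the sample character U+3042 are
picked only for this example.  Java has no charset named UCS-2, so
UTF-16BE stands in for it here (the two agree for characters in the
Basic Multilingual Plane), and the sketch assumes the runtime ships the
Shift_JIS charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OctetTable {
    // Format bytes as space-separated uppercase hex, similar to a sniffer trace.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // HIRAGANA LETTER A, chosen only for illustration.
        String ch = "\u3042";
        // Assumes the runtime provides Shift_JIS (standard JDK builds do).
        System.out.println("Shift_JIS: " + hex(ch.getBytes(Charset.forName("Shift_JIS")))); // 82 A0
        System.out.println("UTF-8:     " + hex(ch.getBytes(StandardCharsets.UTF_8)));       // E3 81 82
        System.out.println("UTF-16BE:  " + hex(ch.getBytes(StandardCharsets.UTF_16BE)));    // 30 42
    }
}
```

If the trace shows E3 81 82 for that character, the browser sent UTF-8;
82 A0 would point to Shift-JIS.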
 
That is important because in JSP you must know the encoding in order to
re-interpret the bytes properly when calling getParameter.
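
A minimal sketch of that re-interpretation, assuming the container has
decoded the request octets as ISO-8859-1 (a common default, but an
assumption here) and that the sniffer trace showed the browser really
sent UTF-8.  The class name Redecode and the method redecode are made up
for the example; in a JSP the input string would be the value returned
by request.getParameter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Redecode {
    // Recover the original octets from a string that was decoded as
    // ISO-8859-1, then decode them with the charset the browser used.
    static String redecode(String fromGetParameter, Charset actual) {
        // ISO-8859-1 maps every octet to one char, so this step is lossless.
        byte[] octets = fromGetParameter.getBytes(StandardCharsets.ISO_8859_1);
        return new String(octets, actual);
    }

    public static void main(String[] args) {
        // Simulate octets E3 81 82 (UTF-8 for U+3042) wrongly decoded as ISO-8859-1.
        byte[] wire = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);

        String fixed = redecode(misdecoded, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u3042")); // prints "true"
    }
}
```

The trick only works when the first decoding was ISO-8859-1 (or another
charset that preserves every octet); if the guess about the actual
encoding is wrong, the result is still garbage.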
 
-Paul
 

Paul Deuter 
Internationalization Manager 
Plumtree Software 
paul.deuter@plumtree.com 
  

-----Original Message-----
From: Timothy Greenwood [mailto:tgreenwood@openmarket.com]
Sent: Monday, October 01, 2001 12:31 PM
To: 'souravm'; www-international@w3.org
Subject: RE: ISO-8859-1


In testing our product I found that with Internet Explorer I could enter
characters outside the declared charset. IE translated them into numeric
character references. So everything is legal, the output characters are
all Latin1 (ASCII even), but are correctly translated by the browser. 
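
The substitution described above can be sketched in Java roughly as
follows.  This is only an illustration of the idea, not the browser's
actual algorithm; the class name NcrSketch and the method toNcr are
invented for the example.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class NcrSketch {
    // Replace every character that ISO-8859-1 cannot represent with a
    // decimal numeric character reference such as &#12354;.
    static String toNcr(String input) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp);
            if (len == 1 && latin1.canEncode((char) cp)) {
                out.append((char) cp);                   // fits in Latin-1, keep as-is
            } else {
                out.append("&#").append(cp).append(';'); // outside Latin-1, emit NCR
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in Latin-1; U+3042 does not.
        System.out.println(toNcr("caf\u00E9 \u3042"));
    }
}
```

With input like this, the octets on the wire are all ASCII or Latin-1,
and the receiving page shows the right glyph only because the browser
expands the reference again when it renders the HTML.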
 
Does a view source of the resulting page show NCR?
 
- Tim
 
 
-----Original Message-----
From: souravm [mailto:souravm@infy.com]
Sent: Monday, October 01, 2001 8:42 AM
To: www-international@w3.org
Subject: ISO-8859-1



Hi,
 
Here is a small JSP page that I used as a proof of concept for a
multilingual project.
 
The interesting observation is that even if I put ISO-8859-1 as charset
in the meta tag it works for all languages. I tested it for Japanese,
Korean, Arabic and French (using IME on Windows 2000).
 
As far as I know, ISO-8859-1 is supposed to cover only Western European
languages. I'm surprised to find that it even supports the Asian
languages.
 
Can anyone please explain to me how it can support the Asian languages?
 
Regards,
Sourav
 
------------------------------------------------------------------------
The jsp file name is i18na.jsp
 
<%@ page import="java.util.*"%>
<%@ page import="java.io.*"%>
<%
    String ucStr = request.getParameter("jap");
%>
 
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<TITLE></TITLE>
 
</HEAD>
<BODY topmargin="0" marginheight="0" leftmargin="0" marginwidth="0">
String  = <%= ucStr%>
<FORM name="frmText" action="http://192.168.119.15:5052/NASApp/fortune/i18na.jsp" method="post">
<TABLE  border="0" cellspacing="0" cellpadding="5" width="200">
        <TR>
                <TD><INPUT TYPE="text" NAME="jap" SIZE="30" value=""></TD>
                <TD><INPUT TYPE="submit" NAME="Submit" VALUE="button"></TD>
        </TR>
        <TR>
        </TR>
 
</TABLE>
 

</FORM>
</BODY>
</HTML>

Received on Monday, 1 October 2001 15:47:07 UTC