- From: Jason Pouflis <pouflis@eisa.net.au>
- Date: Wed, 3 Jun 1998 09:39:41 +1000
- To: "Erik van der Poel" <erik@netscape.com>, "Aman Choudhary" <aman@asu.edu>
- Cc: <www-international@w3.org>
In developing Multilingual DNS, I came across the same problem. It is solvable, and I am available for hire. Multilingual Domain Names are also for sale.

The techniques demonstrated here are proven with MSIE4 international English + extra language support, but have not yet been tested on other platforms.

>> I still havent found out a way to store information, which I retrieve from
>> the internet in unicode and not in ascii, which means that I can get
>> information in practically any language from the internet.
>> What I really want to do-
>>
>> | < meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" *">
>> | * - appropriate ISO code for that language
>> | <input type = textbox>
>> | result (CGI/ASP)
>> V
>> The text box value stored as UNICODE

=== BROWSERS

>> < meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=" *">

As far as I can tell, IE4 forces the page to be displayed in the specified charset, if it is available. Any character data submitted by a form is sent in that character set, either as raw bytes or as NCRs (numeric character references of the form &#nnnn;). However, I have not had an answer from either Microsoft or Netscape on this. (If anyone can give me a direct email address for the people responsible, that would be nice.)

=== HTML FORMS

Neither browser, as far as I can tell, sends the character set of the encoded data, so you should include a hidden field in the form specifying the character set, e.g.

<input type="hidden" name="LC" value="EN.UTF-8">

or simply

<input type="hidden" name="C" value="Shift-JIS">

[L = Language, C = Code]

=== CGIs - PERL

Use Unicode::Map or Unicode::Map8 to map form data from the native character set to Unicode, e.g.

#### code segment {
use Unicode::Map();

$X = $cgi->param('X');   # string as characters encoded in charset
$C = $cgi->param('C');   # which might be "Shift-JIS"

$Map    = new Unicode::Map({ ID => $C });
$_16bit = $Map->to_unicode($X);   # form value as a 16-bit Unicode string
#### } code segment

=== DB - MySQL

Then, escape any data before inserting into your database, e.g.
#### code segment {
use Mysql;

# perror() is the script's own error handler; it is not defined in this excerpt.
$dbh = Mysql->Connect($host,$database,$password,$user)
    or perror('Cannot contact database server');

# Quote each value, then join with commas so the list has no trailing comma.
my $query = 'insert into domain values ('
          . join(', ', map { $dbh->quote($Domain{$_}) } @columns)
          . ')';

my $cursor = $dbh->Query( $query )
    or perror("$Mysql::db_errstr Domain Creation Failed");
#### } code segment

>It would be more reliable if
>you indicated the charset in the HTTP Content-Type header. (I'm assuming
>you're using HTTP.)
>...
>echo 'Content-Type: text/html; charset=gb2312'

The charset parameter is optional in HTTP 1.0. HTTP 1.1 is stricter about it: recipients must respect the charset label the sender provides, and text sent without one is treated as ISO-8859-1. Unfortunately, a lot of communication is stuck at HTTP 1.0, meaning you will still need to put in the meta tag for content type.

Cheers,
Jason Pouflis
pouflis@eisa.net.au
e.internet pty ltd
e.commerce e.business e.mail
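
As a minimal sketch of the points above (charset in the HTTP header, the same charset repeated in a meta tag, and a hidden form field that reports it back), a shell CGI script might look like the following. The names CHARSET and emit_page, the field name X, and the page body are illustrative assumptions, not something tested in the original exchange.

```shell
#!/bin/sh
# Sketch only: CHARSET, emit_page and the form layout are hypothetical.
CHARSET='gb2312'

emit_page() {
    # HTTP header carrying the charset, then a blank line to end the headers.
    printf 'Content-Type: text/html; charset=%s\n' "$CHARSET"
    printf '\n'
    # Repeat the declaration in the document for clients that ignore the header.
    printf '<html><head>\n'
    printf '<meta http-equiv="Content-Type" content="text/html; charset=%s">\n' "$CHARSET"
    printf '</head><body>\n'
    # Hidden field so the receiving CGI knows which charset the data is in.
    printf '<form method="post">\n'
    printf '<input type="hidden" name="C" value="%s">\n' "$CHARSET"
    printf '<input type="text" name="X">\n'
    printf '</form>\n'
    printf '</body></html>\n'
}

emit_page
```

Driving all three declarations from one variable keeps the header, the meta tag, and the hidden field from drifting apart.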
Received on Tuesday, 2 June 1998 19:42:42 UTC