
Re: japanese encoding nightmare

From: Paul Arenson <paul@tokyoprogressive.org>
Date: Mon, 13 Nov 2006 23:25:45 +0900
Message-Id: <1CE9888E-FD13-4032-8701-DD2B17BB9893@tokyoprogressive.org>
Cc: Paul Arenson <paul@tokyoprogressive.org>, public-evangelist@w3.org
To: Karl Dubost <karl@w3.org>

__/__/__/__/__/__/__/__/__/__/
Paul Arenson

EMAIL
paul@tokyoprogressive.org

PHONE & VOICE MAIL
1-617-379-0761 (U.S.)
090-4173-3873 (Japan)
paularenson (Skype)
__/__/__/__/__/__/__/__/__/__/





On Nov 13, 2006, at 10:22 PM, Karl Dubost wrote:

>
> Le 13 nov. 2006 à 10:50, Paul Arenson a écrit :
>> UNSUCCESSFUL EXAMPLE (Looks ok on desktop but not on server)
>> http://tokyoprogressive.org/why.html
>>
>> CODE
>>  <meta content="text/html; charset=UTF-8" http-equiv="content-type">
>
> but this page is not in utf-8 but in shift-jis

> Either you have to save your page as utf-8 or change the
> encoding information to
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=Shift_JIS">


It is? I don't recall using that. Hmm. And when I save it to the
desktop, changing to Shift_JIS doesn't help, nor does looking at it
on the web. Oh well....
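Whether a page is "in" UTF-8 depends only on the bytes the editor saved, not on what its meta tag claims. The following minimal Python sketch checks which of a few candidate encodings a page's raw bytes decode under without errors. The function name and the two-item candidate list are illustrative assumptions.

```python
# Sketch: which candidate encodings do a page's raw bytes decode under?
# Assumption: the candidate list is illustrative, not exhaustive.

CANDIDATES = ("utf-8", "shift_jis")


def candidate_encodings(raw: bytes, candidates=CANDIDATES) -> list[str]:
    """Return the candidates under which `raw` decodes without errors."""
    clean = []
    for encoding in candidates:
        try:
            raw.decode(encoding)  # strict mode: raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        clean.append(encoding)
    return clean


if __name__ == "__main__":
    # Japanese text saved as Shift_JIS is not valid UTF-8.
    print(candidate_encodings("日本語のページ".encode("shift_jis")))
```

To check a saved copy of a page, pass in the file's bytes, for example `candidate_encodings(open("why.html", "rb").read())`. A clean decode does not prove the encoding, because pure ASCII passes under both candidates. A failure under utf-8, however, does rule UTF-8 out.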
>
>
>
>
>> SUCCESSFUL EXAMPLE ONE (JAPANESE COMES OUT RIGHT)
>> http://www.tokyoprogressive.org/index/weblog/print/april-entries/
>
> Yes, the page is correctly utf-8. Not valid, but utf-8:
> http://validator.w3.org/check?uri=http%3A%2F%2Fwww.tokyoprogressive.org%2Findex%2Fweblog%2Fprint%2Fapril-entries%2F
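The validator link above carries the page address, percent-encoded, in the validator's uri query parameter. The following minimal Python sketch shows how such a link can be built. The helper name is hypothetical; only the standard library's urllib.parse.quote is used.

```python
from urllib.parse import quote

VALIDATOR = "http://validator.w3.org/check?uri="


def validator_link(page_url: str) -> str:
    """Build a Markup Validator link whose uri parameter is page_url.

    safe="" makes quote() percent-encode ":" and "/" as well, so the whole
    page address travels as one query-parameter value.
    """
    return VALIDATOR + quote(page_url, safe="")


if __name__ == "__main__":
    print(validator_link("http://www.tokyoprogressive.org/index/weblog/print/april-entries/"))
```

Passing safe="" matters here. With quote()'s default of safe="/", the slashes would be left unencoded.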

Ok... way back when I used the predecessor to Expression Engine, the
encoding was something other than Unicode. Then, when I upgraded to
Unicode, I asked the guy who helped me, and he changed something in
the program or on my server (using the database???). When he did
that, the new pages, like the one above, came out fine, though the old
pages did not. Perhaps what he did to make Expression Engine work has
something to do with the server?

As I said, pages look good on my desktop but not on the server....



>
>> This was made via EXPRESSION ENGINE
>>
>> I note I have both xml:lang and utf-8.
>
> xml:lang doesn't influence the display of the page. It is there, for
> example, to trigger the right accent when the text is passed
> through a voice browser, to help translation engines (not sure
> they implement it, though), or to help spelling checkers choose the
> right dictionary.
>
> I would recommend that you stick to utf-8; it would help keep
> consistency in the way you serve the pages.
>
>
>
>> I THOUGHT I did this in UTF-8, but no.
>> Mozilla even says it is UTF-8, but as you can see the code is
>> western.
>> In other words, why does it work?
>
> Because browsers try to display broken pages (invalid markup, wrong
> encoding, etc.), people who develop Web pages do not know that
> they have done something wrong, and they do not fix it. IMHO it is
> a mistake on the part of browsers.
> It is cool to try to recover and display the page, but it is wrong
> to do silent recovery, as we then never enter a cycle that helps
> everyone fix things and have a better experience.
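Karl's point about silent recovery has a close analogue in any text decoder. The following Python sketch assumes page bytes that are really Shift_JIS but are labelled UTF-8, which is the mismatch Karl found in why.html. Strict decoding stops with an error. By contrast, errors="replace" quietly substitutes U+FFFD and carries on. The helper names are hypothetical, and real browsers use their own, more elaborate recovery.

```python
# Sketch: strict decoding reports an encoding mismatch; errors="replace"
# hides it by substituting U+FFFD, a form of silent recovery.


def decode_strict(raw: bytes, declared: str) -> str:
    """Decode with the declared charset; raise UnicodeDecodeError on a mismatch."""
    return raw.decode(declared)  # errors="strict" is the default


def decode_lenient(raw: bytes, declared: str) -> str:
    """Decode with the declared charset, replacing bad sequences with U+FFFD."""
    return raw.decode(declared, errors="replace")


if __name__ == "__main__":
    page = "日本".encode("shift_jis")  # bytes really in Shift_JIS...
    declared = "utf-8"                  # ...but labelled as UTF-8

    try:
        decode_strict(page, declared)
    except UnicodeDecodeError as err:
        print("strict decoding failed:", err.reason)

    print("lenient decoding produced:", repr(decode_lenient(page, declared)))
```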
>
>> SUCCESSFUL EXAMPLE FOUR (most bizarre?)
>> I even forgot to add the meta tag!!!
>> http://tokyoprogressive.org/
>
> By default the server sends encoding information (in the HTTP
> headers), which usually has priority over the information contained
> in the file. The encoding declared in a file is a guess, and the
> browser _should_ follow what the server says.

Yes, I guess in the css?
http://tokyoprogressive.org/style.css
>
But I do not see anything there....hmmmm?
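The information Karl describes is not in the CSS. It travels in the HTTP response itself, as the charset parameter of the Content-Type header (for example `Content-Type: text/html; charset=UTF-8`). A page opened from the desktop has no HTTP headers at all, so only its meta tag counts there. That is one plausible reason a page can look right locally but wrong on the server. The following Python sketch of that precedence is deliberately simplified: it ignores byte-order marks and content sniffing. The helper names are hypothetical, and the windows-1252 fallback is an assumption, since real browsers use a locale-dependent default.

```python
import re
from email.message import Message

# Simplified stand-in for a browser's <meta> prescan: the first charset=...
# within the first 1024 bytes of the document.
META_CHARSET = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)


def header_charset(content_type):
    """Charset parameter of an HTTP Content-Type value (lower-cased), or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(raw):
    """Charset declared inside the document itself (lower-cased), or None."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, raw, fallback="windows-1252"):
    """HTTP header first, then the in-document declaration, then a fallback."""
    return header_charset(content_type) or meta_charset(raw) or fallback


if __name__ == "__main__":
    page = b'<meta content="text/html; charset=Shift_JIS" http-equiv="content-type">'
    print(effective_charset("text/html; charset=UTF-8", page))  # served over HTTP
    print(effective_charset(None, page))                        # opened from disk
```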


Anyway, I am a bit lost. Is this something that the person who
adjusted my database did when he set things up for Expression Engine,
and does it affect all pages on the server?

How do I fix the server (it is run by a commercial company)?

Thanks!


>
>> Make a page in several encodings
>> http://tokyoprogressive.org/a.html
>> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
>> <html>
>> <head>
>>   <meta content="text/html; charset=ISO-2022-JP"
>> LOOKS OK ONLINE
>
> doesn't look ok for me.
>
> but your server is configured in a strange way
>
> GET /a.html HTTP/1.1[CRLF]
> Host: tokyoprogressive.org[CRLF]
> Connection: close[CRLF]
> Accept-Encoding: gzip[CRLF]
> Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
> Accept-Language: fr,en;q=0.9,ja;q=0.9,de;q=0.8,es;q=0.7,it;q=0.7,nl;q=0.6,sv;q=0.5,nb;q=0.5,da;q=0.4,fi;q=0.3,pt;q=0.3,zh-Hans;q=0.2,zh-Hant;q=0.1,ko;q=0.1[CRLF]
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
> User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.7) Gecko/20060911 Camino/1.0.3 Web-Sniffer/1.0.24[CRLF]
> Referer: http://web-sniffer.net/[CRLF]
> [CRLF]
>
>
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
>
> You serve first iso-8859-1 and then utf-8 and then anything. Maybe  
> one of the sources of your problems is there.
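Accept-Charset is a request header. In the dump above, it is sent by the browser (Camino, via Web-Sniffer). It lists the charsets the client would prefer to receive, each with an optional quality value q from 0 to 1, and a missing q means 1. The charset the server actually declares appears in the Content-Type header of its response, which the dump does not show. The following simplified Python sketch shows how such a preference list ranks. The helper is hypothetical and does not handle malformed values.

```python
def parse_accept_charset(value):
    """Return (charset, q) pairs from an Accept-Charset value, best first.

    A missing q parameter means q=1.0. sorted() is stable even with
    reverse=True, so ties keep the order in which the client listed them.
    """
    prefs = []
    for item in value.split(","):
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        prefs.append((name, q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
    # [('ISO-8859-1', 1.0), ('utf-8', 0.7), ('*', 0.7)]
```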
>
> 1. Convert all your pages to one encoding only: utf-8.
> 2. Change the configuration of your server to send only utf-8.
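Step 1 could look roughly like the Python sketch below. The site currently mixes encodings: Karl found why.html to be Shift_JIS, and a.html is labelled ISO-2022-JP. Because guessing an encoding is unreliable, the sketch assumes you already know each page's current encoding. The names to_utf8, convert_pages and CHARSET_DECL are hypothetical, as are the example paths. Step 2 is a setting on the hosting company's server and is not covered here.

```python
import re
from pathlib import Path

# Matches a declaration such as charset=Shift_JIS or charset="Shift_JIS".
CHARSET_DECL = re.compile(r"""(charset\s*=\s*["']?)[A-Za-z0-9._:-]+""", re.IGNORECASE)


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode with the page's real encoding, relabel it, and re-encode as UTF-8."""
    text = raw.decode(source_encoding)  # strict: raises if the stated encoding is wrong
    text = CHARSET_DECL.sub(r"\g<1>UTF-8", text, count=1)  # first declaration only
    return text.encode("utf-8")


def convert_pages(encodings: dict[str, str], root: str = ".") -> None:
    """Rewrite each listed page in place; `encodings` maps path -> current encoding."""
    for name, source_encoding in encodings.items():
        page = Path(root) / name
        page.write_bytes(to_utf8(page.read_bytes(), source_encoding))
```

A call might look like `convert_pages({"why.html": "shift_jis", "a.html": "iso2022_jp"}, root="site-copy")`, where "site-copy" stands for a local copy of the site. The function rewrites files in place, so run it on a copy first.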
>
>
>
>
>
> -- 
> Karl Dubost - http://www.w3.org/People/karl/
> W3C Conformance Manager, QA Activity Lead
>   QA Weblog - http://www.w3.org/QA/
>      *** Be Strict To Be Cool ***
>
>
>
Received on Monday, 13 November 2006 14:27:01 GMT
