Re: a bug? Chinese webpage "not xhtml"

Irene Vatton wrote:

>On Wed, 11 Feb 2004 14:40:03 +0800
>Zhang Weiwu <zhangweiwu@realss.com> wrote:
>  
>
>>This is the first time I have used Amaya. It seems Amaya's I18N support is not 
>>very good. (Did I miss something?)
>>
>>To repeat the bug:
>>* Open this url with Amaya 8.3: http://aliweekly.nease.net/040209.html
>>you get the "Not Well-Formed XML document - Reload as HTML or show 
>>parsing errors?"
>>* Click "show", you get something like this screenshot
>>http://aliweekly.nease.net/download/Screenshot.png
>>I cannot tell from this screen which part is wrong; there are no clear 
>>markers.
>>
>>But the URL [http://aliweekly.nease.net/040209.html], like most of my other 
>>webpages, is valid.
>>http://validator.w3.org/check?uri=http%3A%2F%2Faliweekly.nease.net%2F040209.html
>>
>>Isn't the validator in Amaya the same as on http://validator.w3.org ?
>>    
>>
>
>Amaya works with Unicode characters.
>We have "jisx0201", "jisx0208", "jisx0212", and "gb2312" tables that convert these 
>characters into Unicode characters, but we don't have a table for "gb18030" characters.
>The XML parser (expat) tries to read this gb18030 document as a UTF-8 document, so it 
>stops at the first non-UTF-8 byte sequence.
>I agree the parser message is not clear. We'll change the code to report the error before 
>launching the parser.
>
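
To illustrate the failure described above, here is a minimal sketch in
Python (not Amaya's actual code). It shows that gb18030 bytes stop a UTF-8
decoder at the first Chinese character, and how a check on the declared
encoding could report the problem before the parser starts. The names
SUPPORTED, first_bad_utf8_offset and precheck are hypothetical, and the
supported list is only my reading of the tables mentioned above plus UTF-8.

```python
# Assumed list: the conversion tables named in this thread, plus UTF-8.
SUPPORTED = {"utf-8", "jisx0201", "jisx0208", "jisx0212", "gb2312"}


def first_bad_utf8_offset(raw: bytes):
    """Return the byte offset where UTF-8 decoding fails, or None if it succeeds."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def precheck(declared: str):
    """Return an error message if the declared encoding is unsupported, else None."""
    name = declared.strip().lower()
    if name not in SUPPORTED:
        return f"unsupported encoding: {name}"
    return None


if __name__ == "__main__":
    # A tiny document encoded as gb18030, standing in for the real page.
    raw = "<p>中文</p>".encode("gb18030")
    print("UTF-8 decoding fails at byte offset:", first_bad_utf8_offset(raw))
    print("pre-parse check:", precheck("gb18030"))
```

With a check like precheck run first, the user would see the encoding name
in the message instead of a generic "not well-formed" error.
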
>By the way, could you point us to a conversion table for "gb18030" characters?
>  
>
I don't have a gb18030 conversion table, but this article gives a good 
explanation and a method for converting gb18030 to and from Unicode:
http://www-106.ibm.com/developerworks/library/u-china.html
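
As a rough illustration of what such a conversion does, here is a small
sketch that assumes a runtime shipping its own gb18030 codec, as Python
does. It is not the method from the article and not something Amaya
provides; the helper names are hypothetical. It converts gb18030 bytes to
Unicode and back, and shows that gb18030 uses two bytes for a common Chinese
character but four bytes for a character outside the Basic Multilingual
Plane.

```python
def gb18030_to_unicode(raw: bytes) -> str:
    """Decode gb18030 bytes to a Unicode string using the built-in codec."""
    return raw.decode("gb18030")


def unicode_to_gb18030(text: str) -> bytes:
    """Encode a Unicode string to gb18030 bytes using the built-in codec."""
    return text.encode("gb18030")


if __name__ == "__main__":
    common = "中"            # a character also present in gb2312
    rare = "\U00010000"      # first code point outside the BMP

    for ch in (common, rare):
        encoded = unicode_to_gb18030(ch)
        # The round trip must give back the original character.
        assert gb18030_to_unicode(encoded) == ch
        print(f"U+{ord(ch):04X} -> {len(encoded)} bytes: {encoded.hex()}")
```

The four-byte range is the part a gb2312 table cannot cover, so a gb2312
table alone would not be enough for every gb18030 page.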

>Regards
>    Irene.
>-----
>Irène Vatton                     INRIA Rhône-Alpes
>INRIA                               ZIRST
>  
>
Oh, you are from INRIA. Nice to meet you. Scilab is a really good tool 
from INRIA, and I'm hosting two plug-in projects for Scilab 
(scilabanywhere and vrscilab on sf.net :) Are there many people here 
from INRIA?

Received on Wednesday, 11 February 2004 03:51:50 UTC