
"Sorry, I am unable to validate this document because on line 120 it contained some byte(s) that I cannot interpret as big5"; and _other_ frustrations from an Asian user.

From: Franklen Choi <franklen@pacific.net.hk>
Date: Tue, 15 Jan 2002 03:13:01 -0500 (EST)
Message-ID: <3C43E500.24E89F46@pacific.net.hk>
To: www-validator@w3.org
Dear all,

I would like to make all the web-pages I create accessible to as many
netizens as possible. However, although there are quite a number of
tools that help web-designers create interoperable web-pages, I find
that many of these tools are deficient when Asian characters are

Example 1: Amaya's support for double-byte characters is still under
construction.

Example 2: Tidy converts many Chinese characters, and even tags, in my
documents into "?" (even with the 'raw' option selected). (There is a
binary executable version of the program that supports Asian
characters, but it is in an 'unsupported' state.) This means I can
hardly take advantage of Tidy, even though many books and web-sites
recommend it.

Example 3: The W3C HTML Validation Service shows the message in the
subject line most of the time when I submit my documents, even though
the characters in these documents are displayed properly by the many
browsers I test against (e.g. lynx, opera, netscape, m$ ie,
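
One way to narrow down a message like the one in the subject line is to
decode the file line by line and note where decoding fails. The following
is a minimal Python sketch, not the validator's own logic: the helper name
find_undecodable is hypothetical, and it assumes the page really is meant
to be Big5 and that Python's built-in "big5" codec is close enough to what
the validator accepts.

```python
def find_undecodable(data: bytes, encoding: str = "big5"):
    """Return (line_number, byte_offset, offending_bytes) for each line
    whose bytes cannot be decoded with the given encoding.

    Only the first failure on each line is reported.
    """
    problems = []
    # Splitting on b"\n" is safe for Big5: 0x0A never occurs as the
    # second byte of a double-byte character.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as err:
            problems.append((lineno, err.start, line[err.start:err.end]))
    return problems


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_big5.py page.html
    with open(sys.argv[1], "rb") as handle:
        for lineno, offset, raw in find_undecodable(handle.read()):
            print(f"line {lineno}, byte offset {offset}: {raw!r}")
```

If a line is reported here but still looks right in a browser, one
possibility is that the page uses characters from a vendor extension of
Big5 that a stricter decoder does not accept.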

I am not accusing anybody, but I just want to share my feeling that /it
is very frustrating for an Asian user, who is committed to writing
accessible and standard-compliant web-pages, only to find him/herself
unsupported and feeling 'rejected'/. This reminds me of a programmer
who, when recently asked why his program cannot render Unicode
characters properly, replied:

"Most computer programs use the ASCII standard to support languages that
are using the Latin character set and are limited to representing
alphabets with less than 256 characters. One byte is sufficient to
represent a single character in such alphabets. More than one byte is
required to represent a character in languages with more than 256
different characters or character variations. Since traditional computer
languages used to write computer programs use a single byte to represent
a single character, changing to a system that require more than one byte
to represent a single character require major changes to the way
programs are written and compiled.

Currently ***** does not have native support for Unicode (for displaying
characters that are not in the default Latin alphabet, such as non-ASCII
requiring more than one byte). This may change in the future, however
this is not guaranteed and a timeframe has not been established, given
that adding native support of Unicode would require rewriting a large
portion of the program. (...) "
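
As a rough illustration of the byte counts the reply describes, the
sketch below encodes a character in several encodings and reports how
many bytes each one needs. The helper name encoded_lengths and the choice
of encodings are assumptions made for this example only.

```python
def encoded_lengths(text, encodings=("ascii", "big5", "utf-8", "utf-16-le")):
    """Map each encoding name to the number of bytes needed for `text`,
    or None when the encoding cannot represent it at all."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            # e.g. ASCII has no representation for a Chinese character
            result[enc] = None
    return result


if __name__ == "__main__":
    for sample in ("A", "中"):
        print(sample, encoded_lengths(sample))
```

A Latin letter fits in one byte in ASCII, Big5 and UTF-8, while a Chinese
character cannot be encoded in ASCII and takes two bytes in Big5 and three
in UTF-8, which is why code that assumes one byte per character breaks on
such text.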

I really hope that internationalization is not something regarded as a

best to all,
Franklen K.S
Received on Thursday, 24 January 2002 03:25:14 UTC
