
Re: Query on character sets

From: Martin J. Duerst <duerst@w3.org>
Date: Tue, 26 Sep 2000 13:43:46 +0900
To: Misha Wolf <misha.wolf@reuters.com>, www-international@w3.org, simonleung@hyperoffice.com
Hello Simon,

I think you have to distinguish between subsets with respect to the
character repertoire and subsets with respect to the character encoding.

As an example, the repertoire (set of characters) that can be represented
by Big5 is a subset of the repertoire of UTF-8. You can therefore convert
a file from Big5 to UTF-8 without losing any characters.
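
As a minimal sketch of such a conversion, assuming Python's built-in
"big5" codec and a sample string chosen only for illustration (the helper
name big5_to_utf8 is hypothetical, not something from this thread):

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Re-encode Big5 bytes as UTF-8 (hypothetical helper for illustration)."""
    # Decode the bytes into characters using the Big5 encoding,
    # then encode the very same characters using UTF-8.
    return data.decode("big5").encode("utf-8")


# Sample text is an assumption; any Big5-representable text would do.
sample = "中文"
big5_bytes = sample.encode("big5")
utf8_bytes = big5_to_utf8(big5_bytes)

# The byte sequences differ, but the characters survive the conversion.
round_trip_ok = utf8_bytes.decode("utf-8") == sample
```

The repertoire is preserved even though the two byte sequences have
nothing in common.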

On the other hand, the Big5 encoding is completely different from the
UTF-8 encoding, so if you try to decode a Big5 file as UTF-8, you
may see garbage, although you should actually get an error message.
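
A small sketch of both outcomes, again assuming Python's built-in codecs
and an illustrative sample string (the helper name is_valid_utf8 is
hypothetical). A strict decoder rejects the bytes, while a lenient one
substitutes replacement characters, which is roughly the "garbage" a
browser might show:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Sample text is an assumption chosen for illustration.
big5_bytes = "中文".encode("big5")

# Strict decoding: these Big5 bytes are not well-formed UTF-8.
strict_ok = is_valid_utf8(big5_bytes)

# Lenient decoding: invalid sequences become U+FFFD replacement characters.
lenient_text = big5_bytes.decode("utf-8", errors="replace")
```

Pure ASCII bytes, by contrast, pass the same check, since ASCII bytes
mean the same thing in UTF-8.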

Hope this helps.    Regards,   Martin.

At 00/09/25 14:00 +0000, Misha Wolf wrote:
>Please respond to the questions below, copying both the list and
>[This mail was written using voice recognition software]
> > Dear sir,
> >
> > I've got a question regarding the page
> > 'http://www.unicode.org/iuc/iuc10/languages.html'.
> >
> > Please advise whether the following statements are right or not.
> >
> > 1. Every page coded in ASCII can be viewed by browsers using a different
> > encoding scheme, since ASCII is a subset of all the character sets.
> >
> > 2. So, does it mean that a page encoded using encoding scheme 'A' can be
> > viewed properly by browsers using encoding scheme 'B' if 'A' is a subset
> > of 'B'?
> >
> >
> > However, when I use UTF-8 to decode a page which uses the charset 'BIG5',
> > I can only observe garbage.
> >
> > Many thanks,
> > Simon
>         Visit our Internet site at http://www.reuters.com
>Any views expressed in this message are those of  the  individual
>sender,  except  where  the sender specifically states them to be
>the views of Reuters Ltd.
Received on Tuesday, 26 September 2000 00:56:26 GMT
