Re: For review: Character encodings for beginners

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Tue, 11 Dec 2007 14:45:26 +0000
Message-ID: <475EA286.9080507@emi.ac.ma>
To: Richard Ishida <ishida@w3.org>
CC: www-international@w3.org

Hi Richard,

I am still unclear about the definition of encoding.

"Basically, [...] A character encoding is [...] is a set of mappings 
between the bytes representing numbers in the computer and characters."

That is, a mapping between bytes (or code points) and characters.

"Unfortunately, [...] many different [...] encodings, ie. many different 
ways of mapping between bytes, codepoints and characters."

That is, a mapping between bytes and code points on the one hand, and 
between code points and characters on the other.

But, here is precisely my question. There are two levels of mappings:

Bytes <---> Code-points  <---> character-set
       (1)                (2)

Which one is the encoding? Mapping (1), mapping (2), or (more likely) the 
composition of the two?

Consider Unicode encoding vs ISO-8859-x encoding.

A) In the case of the ISO-8859-x series, mapping (1) is trivial 
(one byte = one code point), and mapping (2) is done by some table, 
depending on the context (the encoding?).

233 <---> 233 <---> {é, Cyrillic shcha щ} depending on ISO-8859-{1, 5}
     (1)       (2)

Here mapping (2) is the "encoding" (it is where we have multiple choices).
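
Case A can be checked with a minimal Python sketch. It assumes Python's 
built-in codecs "iso-8859-1" and "iso-8859-5" and uses byte value 233 
(0xE9), the value that ISO-8859-1 assigns to é. The variable names are 
illustrative only.

```python
# One byte, read as one number (mapping 1 is trivial here).
raw = bytes([233])

# Mapping (2) depends on which ISO-8859 part is chosen.
latin1_char = raw.decode("iso-8859-1")    # Latin small letter e with acute
cyrillic_char = raw.decode("iso-8859-5")  # Cyrillic small letter shcha

print(raw[0], repr(latin1_char), repr(cyrillic_char))
```

The same byte yields two different characters, so in this family the 
choice of table is what distinguishes one encoding from another.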

B) In the case of Unicode, mapping (1) is the encoding (this is where 
there are multiple choices).

{"D1 89", 1097, etc.} <--->   1097    <---> Cyrillic shcha щ.
{UTF-8, UTF-16, etc.} <---> Codepoint <---> Character-set.
                       (1)             (2)

Here, mapping (2) between code point and character is one-to-one.
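
Case B can be sketched the same way. This is only an illustration, 
assuming Python's built-in "utf-8" and "utf-16-be" codecs (big-endian 
UTF-16 without a byte order mark); the variable names are hypothetical.

```python
# Mapping (2): one character, one code point, fixed by Unicode.
char = "\u0449"          # Cyrillic small letter shcha
codepoint = ord(char)    # 1097 decimal, 0x449 hexadecimal

# Mapping (1): the same code point serialized to bytes in different ways.
utf8_bytes = char.encode("utf-8")       # two bytes: D1 89
utf16_bytes = char.encode("utf-16-be")  # two bytes: 04 49

print(codepoint, utf8_bytes.hex(), utf16_bytes.hex())

# Decoding either byte sequence with its own encoding gives the character back.
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be") == char
```

The code point stays the same in both cases; only the byte sequence 
changes with the chosen Unicode encoding form.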

So, is it worth showing these two levels of mapping when talking about 
encodings?

Note in passing that the Unicode encodings are a good choice, because 
there the mapping between code point and character is one-to-one.


Richard Ishida wrote:

>  Comments are being sought on this article prior to final release.
>  Please send any comments to www-international@w3.org. We expect to
>  publish a final version in one to two weeks.
>
>  WARNING: Most of the people on this list are not the target
>  for this document.  Please bear that in mind.  The document aims to
>  provide a gentle shoe-horn for those who really have no clue about
>  character encodings, *and really don't need to know much*, at least
>  initially.
>
>  RI
>
>  ============
>  Richard Ishida
>  Internationalization Lead
>  W3C (World Wide Web Consortium)
>
>  http://www.w3.org/International/
>  http://rishida.net/blog/
>  http://rishida.net/


Received on Tuesday, 11 December 2007 14:46:01 UTC
