Re: New draft of What is encoding

Hi Richard,

Typo:
"Cyrillic character щ is represented by the number 1097 in the UTF-8 
encoding."

The Cyrillic character щ has two forms, capital and small. Yours is the 
small one (U+0449).
However, its UTF-8 encoding is the byte sequence 'D1 89'; 1097 is the 
decimal value of the code point.
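
For illustration, a minimal Python sketch (not part of the wiki draft) that shows the difference between the code point and the UTF-8 bytes. The variable names are only for this example:

```python
# Code point versus UTF-8 bytes for the small Cyrillic letter shcha.
small_shcha = "\u0449"  # щ, CYRILLIC SMALL LETTER SHCHA

code_point = ord(small_shcha)            # the number assigned to the character
utf8_bytes = small_shcha.encode("utf-8")  # the bytes actually stored in UTF-8
utf8_hex = " ".join(f"{b:02X}" for b in utf8_bytes)

print(f"U+{code_point:04X} = {code_point} decimal, UTF-8 bytes: {utf8_hex}")
```

The code point stays the same whatever the encoding; only the byte sequence depends on UTF-8.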

Najib

Richard Ishida wrote:
> Hi Najib,
>
> Your comments caused me to add a substantial number of changes to the document, to make things clearer, and also to introduce more strongly the role of bytes in this.  See the updated wiki page.
>
> Thanks,
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>  
> http://www.w3.org/International/
> http://rishida.net/blog/
> http://rishida.net/
>
>> -----Original Message-----
>> From: Najib Tounsi [mailto:ntounsi@emi.ac.ma] 
>> Sent: 21 November 2007 22:36
>> To: Richard Ishida
>> Cc: public-i18n-core@w3.org
>> Subject: Re: New draft of What is encoding
>>
>> Hi Richard, all,
>>
>> Richard Ishida wrote:
>>     
>>> http://www.w3.org/International/wiki/What_is_encoding
>>>
>>> Please take a look and comment by/on Tuesday.
>>>   
>>>       
>> Here are some comments:
>>
>> Section "What's a character encoding? "
>>
>> The section is more 'why' encoding than 'What is' encoding.
>>
>> 2nd §
>> "Basically, all characters are stored in computers using a 
>> numeric code."
>> One might understand that this code is in fact the encoding. 
>> Please stress the distinction between the two,
>> e.g. with something like
>> s/are stored in computers using a numeric code./are assigned 
>> a number (numeric code) and stored in computers/
>>
>> 3rd §, 2nd sentence
>> "It is a set of mappings between numbers (ie. bytes) and characters."
>> "numbers" and "bytes" do not mean the same thing here. The bytes 
>> represent a given number (the numeric code).
>>
>> 4th §
>> "... ie. many different ways of mapping between the same 
>> numbers and different characters."
>> True. But, as you are talking about multiple encodings of 
>> characters, you should also say that there are many ways to 
>> encode the same
>> character: for 'é' we have a single byte (233) in ISO 8859-1, two 
>> bytes in UTF-8, one 16-bit unit in UTF-16, etc.
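
As a sketch of this point (an illustration only, using Python's standard codecs; big-endian UTF-16 without a byte order mark is an assumption made to keep the output short), the same character can be encoded in each of the three encodings and the resulting bytes compared:

```python
# One character, three encodings, three different byte sequences.
char = "é"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

encodings = ["iso-8859-1", "utf-8", "utf-16-be"]
encoded = {name: char.encode(name) for name in encodings}

for name, data in encoded.items():
    hex_bytes = " ".join(f"{b:02X}" for b in data)
    print(f"{name:>10}: {len(data)} byte(s): {hex_bytes}")
```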
>>
>>
>> Section "What about fonts?"
>>
>> Add a sentence (after the second §) to stress that the font 
>> comes AFTER the encoding, i.e. seeing a bad glyph (because a font 
>> is missing) is not the same as seeing a badly decoded character.
>>
>> Section  "How does this affect me?"
>>
>> 2nd §, 2nd sentence "(Note: Just declaring the encoding won't 
>> change the bytes, you need to save the text in that encoding too.)"
>> This is too important to be put in parentheses.
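
A small Python sketch of why the note matters (illustrative only; the choice of UTF-8 as the saved encoding and ISO 8859-1 as the declared one is an assumption for the example). The bytes on disk stay the same whatever the declaration says, so a mismatched declaration only changes how they are misread:

```python
# Text saved in one encoding but declared (and therefore decoded) as another.
text = "é"

saved_bytes = text.encode("utf-8")             # what is actually stored
misread = saved_bytes.decode("iso-8859-1")     # reader trusts a wrong declaration
read_correctly = saved_bytes.decode("utf-8")   # declaration matches the bytes

print(repr(misread), repr(read_correctly))
```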
>>
>> I think talking about HTTP is not really necessary, since 
>> the reader already has plenty to digest with "What is 
>> character encoding, and why she/he should care?".
>> On the other hand, you might mention it in parentheses.
>> Or show clearly that there are two moments when the reader 
>> should care about encoding:
>> 1- When authoring a document
>> 2- When the document is served.
>>
>> Regards, Najib
>>
>>     
>>> Thanks,
>>> RI

Received on Wednesday, 28 November 2007 12:38:23 UTC