- From: Najib Tounsi <ntounsi@emi.ac.ma>
- Date: Wed, 28 Nov 2007 12:38:12 +0000
- To: Richard Ishida <ishida@w3.org>
- CC: public-i18n-core@w3.org
Hi Richard,

Typo: "Cyrillic character щ is represented by the number 1097 in the UTF-8 encoding."

There is more than one Cyrillic character щ: capital and small. Yours is the small one (U+0449). However, its UTF-8 encoding is the two bytes 'D1 89'; 1097 is the decimal code point.

Najib

Richard Ishida wrote:
> Hi Najib,
>
> Your comments caused me to add a substantial number of changes to the document, to make things clearer, and also to introduce more strongly the role of bytes in this. See the updated wiki page.
>
> Thanks,
> RI
>
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/blog/
> http://rishida.net/
>
>> -----Original Message-----
>> From: Najib Tounsi [mailto:ntounsi@emi.ac.ma]
>> Sent: 21 November 2007 22:36
>> To: Richard Ishida
>> Cc: public-i18n-core@w3.org
>> Subject: Re: New draft of What is encoding
>>
>> Hi Richard, all,
>>
>> Richard Ishida wrote:
>>
>>> http://www.w3.org/International/wiki/What_is_encoding
>>>
>>> Please take a look and comment by/on Tuesday.
>>
>> Here are some comments:
>>
>> Section "What's a character encoding?"
>>
>> The section is more 'why' encoding than 'what is' encoding.
>>
>> 2nd §
>> "Basically, all characters are stored in computers using a numeric code."
>> One might understand that this code is in fact the encoding. Please insist on the distinction between the two, e.g. something like
>> s/are stored in computers using a numeric code./are assigned a number (numeric code) and stored in computers/
>>
>> 3rd §, 2nd sentence
>> "It is a set of mappings between numbers (ie. bytes) and characters."
>> "Numbers" doesn't have the same meaning here. The bytes represent a given number (numeric code).
>>
>> 4th §
>> "... ie. many different ways of mapping between the same numbers and different characters."
>> True. But, as you are talking about multiple encodings of characters, you should also say that there are many ways to encode the same character: for 'é' we have 233 in ISO 8859-1, two bytes in UTF-8, 16 bits in UTF-16 with another value, etc.
>>
>> Section "What about fonts?"
>>
>> Add a sentence (after the second §) to insist that the font comes AFTER encoding, i.e. seeing a bad glyph (for lack of a font) is not the same as seeing a badly decoded character.
>>
>> Section "How does this affect me?"
>>
>> 2nd §, 2nd sentence: "(Note: Just declaring the encoding won't change the bytes, you need to save the text in that encoding too.)"
>> Too important to be put between parentheses.
>>
>> I think talking about HTTP is not really necessary, since the reader already has enough to chew on with "What is character encoding, and why she/he should care?".
>> On the other hand, you might say it between parentheses. Or show clearly that there are two places where the reader should care about encoding:
>> 1- When authoring a document
>> 2- When the document is served.
>>
>> Regards, Najib
>>
>>> Thanks,
>>> RI
>>>
>>> ============
>>> Richard Ishida
>>> Internationalization Lead
>>> W3C (World Wide Web Consortium)
>>>
>>> http://www.w3.org/International/
>>> http://rishida.net/blog/
>>> http://rishida.net/
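
The distinction this thread draws between a character's code point and its encoded bytes can be checked with a minimal Python sketch. It assumes Python 3.8 or later and uses only built-in calls (ord, str.encode, bytes.decode, bytes.hex); the variable names are illustrative and do not come from the wiki page under review.

```python
# Code point vs. encoded bytes for the two characters discussed in the thread.
shcha = "\u0449"    # CYRILLIC SMALL LETTER SHCHA
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The code point is a number assigned to the character,
# independent of any encoding.
print(ord(shcha))                      # 1097 (0x449)

# The encoding determines which bytes are actually stored.
print(shcha.encode("utf-8").hex(" "))  # d1 89

# The same character yields different byte sequences in different encodings.
for name in ("iso-8859-1", "utf-8", "utf-16-be"):
    print(name, e_acute.encode(name).hex(" "))

# Declaring a different encoding does not change the stored bytes; it only
# changes how they are read. UTF-8 bytes read as ISO 8859-1 come out as two
# wrong characters, which is a decoding problem, not a font problem.
stored = e_acute.encode("utf-8")
misread = stored.decode("iso-8859-1")
print(len(stored), len(misread))       # 2 2
```

In ISO 8859-1 the loop prints the single byte e9 (decimal 233), in UTF-8 the two bytes c3 a9, and in big-endian UTF-16 the two bytes 00 e9.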
Received on Wednesday, 28 November 2007 12:38:23 UTC