Re: For review: 6 new and 2 updated articles about character encoding

Sorry for the cliffhangers. ;-) Some more proposals:

Typography: “ie. á could be represented as á”

Use <span class="qchar">á</span> (displayed in bigger font, wrapped in 
') as before in the paragraph and in the beginning of the document.

The same might apply to “single ampersand (&)” in the last paragraph.


“text in a script that uses accents or diacritics.”

Accents are a kind of diacritic. Make it: text in a script that uses 
accents or other diacritics.


It could be mentioned that 65,536 = 2^16.
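
A quick check of that figure, as a minimal sketch (the variable names are only illustrative and do not come from the article): 16 bits distinguish 2^16 values, so the highest one is FFFF in hexadecimal.

```python
# Number of distinct values that fit into 16 bits.
values_in_16_bits = 2 ** 16

# Highest value, written as four hexadecimal digits.
highest_value_hex = f"{values_in_16_bits - 1:04X}"

print(values_in_16_bits)   # 65536
print(highest_value_hex)   # FFFF
```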

“(Note that hexadecimal notation is commonly used for referring to code 
points, and will be used here.)”

That’s fine.

“For example, the letter A  in the ISO 8859-1 coded character set is in 
the 65th character position (starting from zero), and is encoded for 
representation in the computer using a byte with the value of 65.”

Oops, decimal.
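
To show the mismatch, here is a small sketch of the same example in both notations. It assumes Python's built-in "iso-8859-1" codec as a stand-in for the coded character set discussed in the article; the variable names are illustrative.

```python
# Encode the letter A in ISO 8859-1; the result is a single byte.
encoded = "A".encode("iso-8859-1")

# The byte value in decimal, as the quoted sentence gives it.
byte_value_decimal = encoded[0]

# The same byte value in hexadecimal, the notation the article announced.
byte_value_hex = f"{byte_value_decimal:02X}"

print(len(encoded))         # 1
print(byte_value_decimal)   # 65
print(byte_value_hex)       # 41
```

So the sentence would have to say 41 (hexadecimal) to be consistent with the note about notation, or state explicitly that 65 is decimal.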

“When you retrieve a document from a server, the server normally sends 
some additional information with the document. This is called the HTTP 
header.”


“When a server serves (ie. sends) a document to a browser (or user agent)…”

Browsers are a kind of user agent. Make it: browser (or other user agent)

“…it also sends some additional information with the document, called 
the HTTP header.”

Is the duplication of content (see above) necessary in this place?

“HTML is an SGML-based markup language.”

It could (should?) be mentioned here that HTML5 (in the HTML 
serialization) is not SGML-based.

“that you leave a space before the '' at the end of an empty tag”

'/' missing: that you leave a space before the '/' at the end of an 
empty tag

However, this recommendation is outdated; no current browser has 
problems with <foo/>.

“that you always use both id and name attributes for fragment identifiers”



“This is not an exhaustive list.” Fine. Is “etc.” worth a table row, then?

In the next table, it is “Etc…”

Make it the same in both tables, or remove it.

“Superscripted and subscripted characters | ¹ ² ³ ₁ ₂ ₃ | use <sup> or 
<sub> markup”

I tend to disagree here. The superscripted and subscripted characters 
carry information (x² is something different from x₂) that might get 
lost when <sup> or <sub> markup is used and text is copied without 
markup from a webpage (x<sup>2</sup> and x<sub>2</sub> both become x2; 
4<sup>2</sup> becomes 42).
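
The loss can be reproduced with a minimal sketch. The helper strip_markup below is hypothetical; it only imitates "copying text without markup" by discarding all tags with Python's standard html.parser module, which is an assumption about how such copying behaves.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text content of an HTML fragment, without any tags."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush text that is still buffered
    return "".join(parser.parts)


print(strip_markup("x<sup>2</sup>"))  # x2
print(strip_markup("x<sub>2</sub>"))  # x2
print(strip_markup("4<sup>2</sup>"))  # 42
print(strip_markup("x² and x₂"))      # x² and x₂
```

With markup, the superscript and subscript cases collapse into the same string; the dedicated characters survive unchanged.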

And there is a typography/readability issue: The superscripted and 
subscripted characters should be readable at reasonable font sizes, 
whereas scaled-down characters (e.g. sup, sub { font-size: 0.25em }) 
might not be readable and might not fit typographically.


As pointed out, UTF-32 is out of the game and not mentioned in “When a 
character is encoded in UTF-16, its 2 or 4 bytes can be ordered in two 
different ways ('little-endian' or 'big-endian').”

Since it’s all about UTF-16, it is confusing why UTF-16 is mentioned in 
the next sentence “The picture below illustrates this for UTF-16.”

Make it: The picture below illustrates this.
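
For reference, a small sketch of what the quoted sentence describes, using Python's "utf-16-le" and "utf-16-be" codecs. The two sample characters (U+00E1 and U+10400) are my own choice, not taken from the article.

```python
# A character that needs 2 bytes in UTF-16: á (U+00E1).
two_byte_char = "\u00e1"
two_le = two_byte_char.encode("utf-16-le").hex()
two_be = two_byte_char.encode("utf-16-be").hex()

# A character outside the BMP that needs 4 bytes in UTF-16: U+10400.
# It is encoded as the surrogate pair D801 DC00.
four_byte_char = "\U00010400"
four_le = four_byte_char.encode("utf-16-le").hex()
four_be = four_byte_char.encode("utf-16-be").hex()

print(two_le, two_be)    # e100 00e1
print(four_le, four_be)  # 01d800dc d801dc00
```

Note that in the 4-byte case the bytes are swapped within each 16-bit unit; the order of the two units stays the same.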

To be continued… ;-)

Received on Tuesday, 17 August 2010 10:18:38 UTC