RE: For review: 6 new and 2 updated articles about character encoding from Richard Ishida on 2010-08-25 (www-international@w3.org from July to September 2010)

From: Richard Ishida <ishida@w3.org>
Date: Wed, 25 Aug 2010 18:57:03 +0100
To: "'Gunnar Bittersmann'" <gunnar@bittersmann.de>
Cc: <www-international@w3.org>
Message-ID: <024701cb447e$ecd06700$c6713500$@org>
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Gunnar Bittersmann
> Sent: 17 August 2010 11:19
> To: www-international@w3.org
> Subject: Re: For review: 6 new and 2 updated articles about character
> encoding
> 
> Sorry for the cliffhangers. ;-) Some more proposals:
> 
> http://www.w3.org/International/questions/qa-escapes.en.php#bytheway
> 
> Typography: “ie. á could be represented as &#xE1;”
> 
> Use <span class="qchar">á</span> (displayed in bigger font, wrapped in
> ') as before in the paragraph and in the beginning of the document.
> 
> The same might apply to “single ampersand (&)” in the last paragraph.

Done.
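
For readers who want to check the escape quoted above, here is a minimal Python sketch. It assumes only the standard library; the variable names are illustrative and do not come from the article.

```python
import html

# U+00E1 LATIN SMALL LETTER A WITH ACUTE, written as an escape so the
# example does not depend on the encoding of this message.
char = "\u00e1"

code_point = ord(char)             # numeric code point of the character
escape = f"&#x{code_point:X};"     # hexadecimal numeric character reference
decoded = html.unescape(escape)    # turn the reference back into the character

print(escape)            # &#xE1;
print(decoded == char)   # True
```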


> 
> ***
> 
> http://www.w3.org/International/tutorials/tutorial-char-enc/#n11n
> 
> “text in a script that uses accents or diacritics.”
> 
> Accents are a kind of diacritic. Make it: text in a script that uses
> accents or other diacritics.

Done.

> 
> ***
> 
> http://www.w3.org/International/articles/definitions-characters/#unicode
> 
> It could be mentioned that 65,536 = 2^16.

Done.

> 
> 
> http://www.w3.org/International/articles/definitions-characters/#charsets
> 
> “(Note that hexadecimal notation is commonly used for referring to code
> points, and will be used here.)”
> 
> That’s fine.
> 
> “For example, the letter A  in the ISO 8859-1 coded character set is in
> the 65th character position (starting from zero), and is encoded for
> representation in the computer using a byte with the value of 65.”
> 
> Oops, decimal.

I think that's OK. I'm trying to make the link here between the character position and the stored byte, and the byte value is indeed 65. I'm not referring to a code point by name.
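
The decimal and hexadecimal views of that byte can be compared with a short Python sketch. This is an illustration only; the variable names are made up for the example.

```python
# Encode the letter A in ISO 8859-1 and look at the single resulting byte.
encoded = "A".encode("iso-8859-1")
byte_value = encoded[0]

print(len(encoded))        # 1 (one byte per character in ISO 8859-1)
print(byte_value)          # 65 (decimal)
print(hex(byte_value))     # 0x41 (the same value in hexadecimal)
print(ord("A"))            # 65 (the character position / code point)
```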

> 
> 
> http://www.w3.org/International/articles/definitions-characters/#httpheader
> 
> When you retrieve a document from a server, the server normally sends
> some additional information with the document. This is called the HTTP
> header.
> 
> Fine.
> 
> http://www.w3.org/International/articles/definitions-characters/#mimetypes

This section has been significantly reworked, and I think the comments are now moot.

> 
> “When a server serves (ie. sends) a document to a browser (or user agent)…”
> 
> Browsers are a kind of user agent. Make it: browser (or other user agent)
> 
> “…it also sends some additional information with the document, called
> the HTTP header.”
> 
> Is the duplication of content (see above) necessary in this place?
> 
> 
> “HTML is an SGML-based markup language.”
> 
> It could (should?) be mentioned here that HTML5 (in HTML serialization)
> is not SGML-based.
> 
> 
> “that you leave a space before the '' at the end of an empty tag”
> 
> '/' missing: that you leave a space before the '/' at the end of an
> empty tag
> 
> However, this recommendation is outdated; no current browser has
> problems with <foo/>.
> 
> “that you always use both id and name attributes for fragment identifiers”
> 
> Outdated.
> 
> ***
> 
> http://www.w3.org/International/questions/qa-chars-vs-markup#ok
> 
> “This is not an exhaustive list.” Fine. Is “etc.” worth a table row, then?

Removed.

> 
> http://www.w3.org/International/questions/qa-chars-vs-markup#compat
> 
> In the next table, it is “Etc…”
> 
> Make it the same in both tables, or remove it.

Removed.

> 
> 
> “Superscripted and subscripted characters | ¹ ² ³ ₁ ₂ ₃ | use <sup> or
> <sub> markup”
> 
> I tend to disagree here. The superscripted and subscripted characters
> carry information (x² is something different than x₂) that might get
> lost when <sup> or <sub> markup is used and text is copied without
> markup from a webpage (x<sup>2</sup> and x<sub>2</sub> both become x2;
> 4<sup>2</sup> becomes 42).
> 
> And there is a typography/readability issue: The superscripted and
> subscripted characters should be readable at reasonable font sizes,
> whereas scaled-down characters (e.g. sup, sub { font-size: 0.25em })
> might not be readable and might not fit typographically.

This is an issue that needs to be raised against the Unicode in XML document.
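
The copy-without-markup concern can be reproduced with a small Python sketch. The `strip_tags` helper is hypothetical and deliberately naive (a regular expression, not an HTML parser); it only stands in for whatever a browser does when copying plain text.

```python
import re

def strip_tags(markup: str) -> str:
    """Naively remove anything that looks like a tag, keeping the text."""
    return re.sub(r"<[^>]+>", "", markup)

# With markup, superscript and subscript collapse to the same plain text.
print(strip_tags("x<sup>2</sup>"))   # x2
print(strip_tags("x<sub>2</sub>"))   # x2
print(strip_tags("4<sup>2</sup>"))   # 42

# The dedicated characters survive because the distinction is in the text.
superscript = "x\u00b2"   # x followed by SUPERSCRIPT TWO
subscript = "x\u2082"     # x followed by SUBSCRIPT TWO
print(strip_tags(superscript) != strip_tags(subscript))   # True
```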

> 
> ***
> 
> http://www.w3.org/International/questions/qa-byte-order-mark#bomwhat
> 
> As pointed out, UTF-32 is out of the game and not mentioned in “When a
> character is encoded in UTF-16, its 2 or 4 bytes can be ordered in two
> different ways ('little-endian' or 'big-endian').”
> 
> Since it’s all about UTF-16, it is confusing why UTF-16 is mentioned in
> the next sentence “The picture below illustrates this for UTF-16.”
> 
> Make it: The picture below illustrates this.

Done.
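
As a rough illustration of the two byte orders discussed in that section, the following Python sketch uses the standard codecs. The sample characters are chosen for the example and are not taken from the article's picture.

```python
import codecs

# A character that needs 2 bytes in UTF-16: U+00E1.
two_byte = "\u00e1"
print(two_byte.encode("utf-16-be"))   # b'\x00\xe1' (big-endian)
print(two_byte.encode("utf-16-le"))   # b'\xe1\x00' (little-endian)

# A character that needs 4 bytes (a surrogate pair): U+1D11E.
four_byte = "\U0001D11E"
print(len(four_byte.encode("utf-16-be")))   # 4

# The byte order mark is U+FEFF serialized in the chosen byte order.
print(codecs.BOM_UTF16_BE)   # b'\xfe\xff'
print(codecs.BOM_UTF16_LE)   # b'\xff\xfe'
```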

Thanks.
RI
Received on Wednesday, 25 August 2010 17:57:37 UTC