- From: Richard Ishida <ishida@w3.org>
- Date: Wed, 25 Aug 2010 18:57:03 +0100
- To: "'Gunnar Bittersmann'" <gunnar@bittersmann.de>
- Cc: <www-international@w3.org>
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Gunnar Bittersmann
> Sent: 17 August 2010 11:19
> To: www-international@w3.org
> Subject: Re: For review: 6 new and 2 updated articles about character
> encoding
>
> Sorry for the cliffhangers. ;-) Some more proposals:
>
> http://www.w3.org/International/questions/qa-escapes.en.php#bytheway
>
> Typography: “ie. á could be represented as á”
>
> Use <span class="qchar">á</span> (displayed in bigger font, wrapped in
> ') as before in the paragraph and in the beginning of the document.
>
> The same might apply to “single ampersand (&)” in the last paragraph.
Done.
>
> ***
>
> http://www.w3.org/International/tutorials/tutorial-char-enc/#n11n
>
> “text in a script that uses accents or diacritics.”
>
> Accents are a kind of diacritic. Make it: text in a script that uses
> accents or other diacritics.
Done.
>
> ***
>
> http://www.w3.org/International/articles/definitions-characters/#unicode
>
> It could be mentioned that 65,536 = 2^16.
Done.
>
>
> http://www.w3.org/International/articles/definitions-characters/#charsets
>
> “(Note that hexadecimal notation is commonly used for referring to code
> points, and will be used here.)”
>
> That’s fine.
>
> “For example, the letter A in the ISO 8859-1 coded character set is in
> the 65th character position (starting from zero), and is encoded for
> representation in the computer using a byte with the value of 65.”
>
> Oops, decimal.
I think that's ok. I'm trying to make the link here, and the byte value is indeed 65. I'm not referring to a codepoint by name.
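
To illustrate the link being made here, a minimal Python sketch (the variable names are only for illustration and are not from the article): the position of the letter A in ISO 8859-1 and the value of the byte that encodes it are the same number, whether it is written in decimal or in hexadecimal.

```python
# Illustrative sketch: the letter A in the ISO 8859-1 coded character set.
letter = "A"

# Position of the character in the coded character set (its code point).
code_point = ord(letter)

# Bytes used to represent the character when encoded as ISO 8859-1.
encoded = letter.encode("iso-8859-1")
byte_value = encoded[0]

# The same number, shown in decimal and in hexadecimal notation.
print(code_point, hex(code_point))
print(byte_value, hex(byte_value))
```

So "a byte with the value of 65" and "code point 41 (hexadecimal)" describe the same thing from two angles.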
>
>
> http://www.w3.org/International/articles/definitions-characters/#httpheader
>
> When you retrieve a document from a server, the server normally sends
> some additional information with the document. This is called the HTTP
> header.
>
> Fine.
>
> http://www.w3.org/International/articles/definitions-characters/#mimetypes
This section has been significantly reworked, and I think the comments are now moot.
>
> “When a server serves (ie. sends) a document to a browser (or user agent)…”
>
> Browsers are a kind of user agent. Make it: browser (or other user agent)
>
> “…it also sends some additional information with the document, called
> the HTTP header.”
>
> Is the duplication of content (see above) necessary in this place?
>
>
> “HTML is an SGML-based markup language.”
>
> It could (should?) be mentioned here that HTML5 (in HTML serialization)
> is not SGML-based.
>
>
> “that you leave a space before the '' at the end of an empty tag”
>
> '/' missing: that you leave a space before the '/' at the end of an
> empty tag
>
> However, this recommendation is outdated; no current browser has
> problems with <foo/>.
>
> “that you always use both id and name attributes for fragment identifiers”
>
> Outdated.
>
> ***
>
> http://www.w3.org/International/questions/qa-chars-vs-markup#ok
>
> “This is not an exhaustive list.” Fine. Is “etc.” worth a table row, then?
Removed.
>
> http://www.w3.org/International/questions/qa-chars-vs-markup#compat
>
> In the next table, it is “Etc…”
>
> Make it the same in both tables, or remove it.
Removed.
>
>
> “Superscripted and subscripted characters | ¹ ² ³ ₁ ₂ ₃ | use <sup> or
> <sub> markup”
>
> I tend to disagree here. The superscripted and subscripted characters
> carry information (x² is something different than x₂) that might get
> lost when <sup> or <sub> markup is used and text is copied without
> markup from a webpage (x<sup>2</sup> and x<sub>2</sub> both become x2;
> 4<sup>2</sup> becomes 42).
>
> And there is a typography/readability issue: The superscripted and
> subscripted characters should be readable at reasonable font sizes,
> whereas scaled-down characters (e.g. sup, sub { font-size: 0.25em })
> might not be readable and might not fit typographically.
This is an issue that needs to be raised against the Unicode in XML document.
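
For reference, the copy-without-markup concern can be reproduced with a rough Python sketch. The `strip_tags` helper is hypothetical and only approximates what a plain-text copy does; real browsers may behave differently.

```python
import re


def strip_tags(html: str) -> str:
    """Crude stand-in for copying text without markup: drop anything tag-like."""
    return re.sub(r"<[^>]+>", "", html)


# Superscripts and subscripts expressed with markup.
marked_up = ["x<sup>2</sup>", "x<sub>2</sub>", "4<sup>2</sup>"]

# The same content expressed with Unicode superscript/subscript characters.
unicode_chars = ["x\u00b2", "x\u2082", "4\u00b2"]

copied_markup = [strip_tags(s) for s in marked_up]
copied_unicode = [strip_tags(s) for s in unicode_chars]

# With markup, both x-forms collapse to the same string; the characters survive.
print(copied_markup)
print(copied_unicode)
```

Under this assumption the markup versions lose the distinction between superscript and subscript, while the Unicode characters keep it.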
>
> ***
>
> http://www.w3.org/International/questions/qa-byte-order-mark#bomwhat
>
> As pointed out, UTF-32 is out of the game and not mentioned in “When a
> character is encoded in UTF-16, its 2 or 4 bytes can be ordered in two
> different ways ('little-endian' or 'big-endian').”
>
> Since it’s all about UTF-16, it is confusing why UTF-16 is mentioned in
> the next sentence “The picture below illustrates this for UTF-16.”
>
> Make it: The picture below illustrates this.
Done.
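
As a small aside, the two byte orders can be seen with a short Python sketch. The sample characters are chosen only for illustration and are not taken from the article.

```python
import codecs

# A character that needs 2 bytes in UTF-16, and one that needs 4 bytes.
two_byte_char = "\u00e1"        # LATIN SMALL LETTER A WITH ACUTE
four_byte_char = "\U0001d11e"   # MUSICAL SYMBOL G CLEF (a surrogate pair)

# Big-endian puts the most significant byte of each 16-bit unit first.
be_2 = two_byte_char.encode("utf-16-be")
be_4 = four_byte_char.encode("utf-16-be")

# Little-endian puts the least significant byte of each 16-bit unit first.
le_2 = two_byte_char.encode("utf-16-le")
le_4 = four_byte_char.encode("utf-16-le")

print(be_2.hex(), le_2.hex())
print(be_4.hex(), le_4.hex())

# The byte-order mark is U+FEFF serialized in the corresponding order.
print(codecs.BOM_UTF16_BE.hex(), codecs.BOM_UTF16_LE.hex())
```

Note that in the 4-byte case the bytes are swapped within each 16-bit unit, not across the whole sequence.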
Thanks.
RI
Received on Wednesday, 25 August 2010 17:57:37 UTC