Re: Changes to Essential definitions related to character encodings and Serving XHTML 1.0

Richard Ishida scripsit (2010-08-20 10:18+02:00):
> a new version of the document 'Serving XHTML 1.0' http://www.w3.org/International/articles/serving-xhtml/

Markup: It’s <b class="newterm"> for “HTTP header”, but <span 
class="newterm"> for the other terms. Make it 'b' for all, HTML5 style. 
(The 'dfn' element type might also be appropriate, though.)

In the previous version, “sends information” was linked to 
http://www.w3.org/International/questions/qa-headers-charset but is not 
any more. Was that intended?

Typography: “these MIME types - ie.” Use en dash: these MIME types – ie.

“They recommend, amongst other things, that you leave a space before the 
'/>' at the end of an empty tag (such as img, hr or br), that you use 
HTML's lang attribute as well as XML's xml:lang attribute, that you 
always use both id and name attributes for fragment identifiers, etc.”:
Yes, that’s what Appendix C has been saying for years. But as I’ve 
mentioned before, neither the first nor the last hint is still relevant 
to today’s browsers.

“This means that different rules are applied to the display of the file”:
Hm, it’s definitely not within the scope of this article to explain the 
distinction between files and resources. But shouldn’t the article use 
the right terminology?

You’ve added a new paragraph: “In Internet Explorer 6 nothing must 
precede the DOCTYPE declaration in a file. If any character appears 
before it, the document will be served in quirks mode.”
Just 3 paragraphs down: “With Internet Explorer 6, however, if anything 
appears before the DOCTYPE declaration the page is rendered in quirks mode.”
Hm, duplicate content. And does really anything trigger it? Even a BOM?
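A quick way to check what actually precedes the DOCTYPE in a given file, BOM included, is to look at its raw bytes. This is only a sketch: `preamble_report` is a hypothetical helper, it recognizes only a UTF-8 BOM, and it says nothing about how IE 6 treats what it finds.

```python
import codecs


def preamble_report(data: bytes) -> dict:
    """Describe what precedes the DOCTYPE in a document's raw bytes.

    Hypothetical helper: only the UTF-8 BOM is recognized; other encodings
    (UTF-16 etc.) are not handled.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # Case-insensitive search for the DOCTYPE declaration.
    idx = body.lower().find(b"<!doctype")
    return {
        "utf8_bom": has_bom,
        "doctype_found": idx >= 0,
        # Everything between the (optional) BOM and the DOCTYPE, e.g. an XML declaration.
        "other_preamble": body[:idx] if idx >= 0 else b"",
    }


doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>'
print(preamble_report(doc))
```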

“In browsers such as Internet Explorer 7, Firefox, Safari, Opera, and 
others”:
Should Chrome be explicitly mentioned?

“Since Internet Explorer 6 users may still count for a significant 
proportion of your intended audience”,
“on Internet Explorer 6 (and therefore for a potentially significant 
proportion of your audience).”:
Is there really any web site around these days whose intended audience 
has a significant proportion of IE 6 users? (If there is, I pity its web 
developer.)

“If you want to ensure that your pages are rendered in the same way on 
all standards-compliant browsers”:
This sounds as if IE 6 were a standards-compliant browser. Ehm – nah.


> and some substantial reductions to the text in the 'MIME types' section of 'Essential definitions related to character encodings' http://www.w3.org/International/articles/definitions-characters/#mimetypes

Typography: “these MIME types - ie.” Use en dash: these MIME types – ie.


Finally, some remarks regarding 
http://www.w3.org/International/questions/qa-html-css-normalization

“a script that uses accents or diacritics.” Make it: a script that uses 
accents or other diacritics.

“There are four Normalization Forms specified by the Unicode Standard: 
NFC, NFD, NFKC and NFKD. The 'C' stands for (pre-)composed, and the 'D' 
for decomposed.”:
This raises the question of what the 'K' stands for – and leaves it unanswered.
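For what it’s worth, the 'K' stands for compatibility: NFKC and NFKD additionally apply compatibility decompositions, which NFC and NFD leave alone. A minimal sketch with Python’s standard `unicodedata` module; `all_forms` is a hypothetical helper and the sample characters are arbitrary choices for illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(text: str) -> dict:
    """Return the text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


# U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition,
# so only the K forms replace it with plain "fi".
for form, result in all_forms("\ufb01").items():
    print(form, [f"U+{ord(c):04X}" for c in result])

# U+00E1 'á' has a canonical decomposition: the D forms split it into
# 'a' + U+0301 COMBINING ACUTE ACCENT, the C forms keep it precomposed.
for form, result in all_forms("\u00e1").items():
    print(form, [f"U+{ord(c):04X}" for c in result])
```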

“If the word 'világ' is used in precomposed form in the HTML (eg. <span 
class="világ">), but in decomposed form in the CSS (eg. .világ { 
font-style: italic; })”:
If there is any difference between the two 'á', it’s not visible. (I 
haven’t tried a hex editor.) Maybe use a CSS escape: .vila\301 g
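To make the difference visible without a hex editor, one can print the code points of both forms. The `css_escape` helper is hypothetical: it writes every non-ASCII character as a CSS hex escape followed by a space (CSS consumes that space as the escape’s terminator), which reproduces the `.vila\301 g` form.

```python
import unicodedata

word = "vil\u00e1g"  # 'világ'
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


def css_escape(s: str) -> str:
    """Escape non-ASCII characters as CSS hex escapes (hypothetical helper).

    A space always follows each escape; CSS treats it as the terminator.
    """
    return "".join(c if ord(c) < 128 else f"\\{ord(c):x} " for c in s)


print(code_points(nfc))  # U+0076 U+0069 U+006C U+00E1 U+0067
print(code_points(nfd))  # U+0076 U+0069 U+006C U+0061 U+0301 U+0067
print(nfc == nfd)        # False
print("." + css_escape(nfd))  # .vila\301 g
print("." + css_escape(nfc))  # .vil\e1 g
```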

“The best way to ensure this, especially if the HTML and the CSS files 
are authored by different people, is to use one particular Unicode 
normalization form for all authored content. As we said above, the W3C 
recommends NFC.
This is likely to be a particular issue if the markup and the CSS are 
being authored or maintained by different people.”
Duplicate content, remove the latter paragraph.
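Using one normalization form for all authored content can be automated. A minimal sketch, assuming the HTML and CSS files are UTF-8 and Python 3.8+ is available (for `unicodedata.is_normalized`); `normalize_file_to_nfc` and the file name `style.css` are hypothetical.

```python
import tempfile
import unicodedata
from pathlib import Path


def normalize_file_to_nfc(path: Path) -> bool:
    """Rewrite a UTF-8 text file in NFC. Returns True if the file changed."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    if unicodedata.is_normalized("NFC", text):
        return False
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(unicodedata.normalize("NFC", text))
    return True


# Usage: a stylesheet whose selector was saved with a decomposed 'á'.
with tempfile.TemporaryDirectory() as tmp:
    css = Path(tmp) / "style.css"
    css.write_bytes(".vila\u0301g { font-style: italic; }".encode("utf-8"))
    first_run = normalize_file_to_nfc(css)   # rewritten to NFC
    second_run = normalize_file_to_nfc(css)  # already NFC, left alone
    result = css.read_text(encoding="utf-8")

print(first_run, second_run, result)
```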


Apart from the technical POV of this article, I just happened to run into 
this trouble: Optima is a good-looking font on the Mac, but it does not 
have a glyph for 'ř' as in 'Dvořák'. The browser then takes the glyph from 
the next font in the font-family declaration that has one.
Now there are three options:
(1) Ignore that there’s a patchwork. Not a good solution.
(2) Use NFC characters and a font that provides the needed glyphs. From 
a technical POV the best solution, but should one really give up a 
good-looking font just because of a few occasional characters?
(3) Use NFD characters such as 'r&#x30C;'. Technically questionable, but 
best typography.
Which one to take?
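Option (3) can be applied selectively, decomposing only the characters the preferred font lacks. A sketch: `decompose_only` and `to_ncr` are hypothetical helpers, and whether the font renders 'r' plus U+030C COMBINING CARON acceptably depends on the font. The result is no longer NFC, so any later NFC normalization step would silently undo it.

```python
import unicodedata


def decompose_only(text: str, targets: set) -> str:
    """Replace each character in `targets` with its NFD sequence; keep the rest."""
    return "".join(unicodedata.normalize("NFD", c) if c in targets else c for c in text)


def to_ncr(text: str) -> str:
    """Write non-ASCII characters as hexadecimal numeric character references."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


name = "Dvo\u0159\u00e1k"  # 'Dvořák', precomposed (NFC)
mixed = decompose_only(name, {"\u0159"})  # decompose 'ř' only, keep 'á'

print(to_ncr(name))   # Dvo&#x159;&#xE1;k
print(to_ncr(mixed))  # Dvor&#x30C;&#xE1;k
print(unicodedata.is_normalized("NFC", mixed))      # False
print(unicodedata.normalize("NFC", mixed) == name)  # True
```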

Regards,
Gunnar

Received on Thursday, 26 August 2010 15:05:19 UTC