
RE: For review: Character encodings in HTML and CSS

From: Richard Ishida <ishida@w3.org>
Date: Wed, 10 Feb 2010 20:13:53 -0000
To: "'John Cowan'" <cowan@ccil.org>
Cc: <www-international@w3.org>
Message-ID: <000001caaa8d$9286d120$b7947360$@org>
Thanks for the comments, John.

> -----Original Message-----
> From: John Cowan [mailto:cowan@ccil.org]
> Sent: 09 February 2010 21:46
> To: Richard Ishida
> Cc: www-international@w3.org
> Subject: Re: For review: Character encodings in HTML and CSS
> 
> Richard Ishida scripsit:
> 
> > Comments are being sought on this article prior to final release.
> > Comments are being sought on this article prior to final release. Please
> > send any comments to this list (www-international@w3.org). We expect
> > to publish a final version in one to two weeks.
> 
> I'd avoid the term "character set" altogether in favor of "character
> repertoire".

I was tempted, but I wanted to use 'character set' to encourage better
understanding of what the term really means.

> 
> I'd add that character encodings are sometimes called "charsets".

Done.

> 
> Unfortunately we are stuck with the SGML term "document character set",
> though "document coded character set" would be more correct.
> 
> You could add that coded character sets are sometimes called "code
> pages".

Done.

> 
> Since this is a tutorial, I would leave out UTF-32 altogether.
> Nobody uses UTF-32 on the web.

I think I only mention it in passing.

> 
> Third graf of "The Document Character Set": for "and a subset" read
> "and represents a subset".

Done.

> 
> In the first sentence of "Character escapes", for "an way" read "a way",
> for "the the" read "the", and omit the comma.  In the second graf,
> for "representing" read "directly representing".  In the third graf,
> add comma after "then", or else remove comma after "CSS" (either is
> fine).

Done.

> 
> For "ie." read "i.e.", and for "eg." read "e.g." throughout.

ie. and eg. are my preferred style.  It's enough that I have to use American
spelling ;-)

> 
> In "Consider using a Unicode encoding", note that plain ASCII files are
> already UTF-8.

Done.
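A quick illustration of why plain ASCII files are already UTF-8. UTF-8 encodes U+0000 to U+007F as the same single bytes 0x00 to 0x7F, so any pure-ASCII byte sequence decodes to identical text under either label. This is a small Python sketch; the helper name ascii_is_utf8 is hypothetical and chosen only for illustration.

```python
def ascii_is_utf8(data: bytes) -> bool:
    """Return True if data is pure ASCII and decodes identically as UTF-8."""
    try:
        as_ascii = data.decode("ascii")
    except UnicodeDecodeError:
        return False  # contains a byte >= 0x80, so not a plain ASCII file
    # UTF-8 maps 0x00..0x7F to U+0000..U+007F, exactly as ASCII does.
    return data.decode("utf-8") == as_ascii


# Exhaustive check over all 128 ASCII byte values.
print(ascii_is_utf8(bytes(range(128))))        # True
# "é" needs two bytes in UTF-8 (0xC3 0xA9), which are outside ASCII.
print(ascii_is_utf8("café".encode("utf-8")))   # False
```

The converse does not hold: a UTF-8 file is only ASCII if it happens to use no characters above U+007F.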

> 
> "You may not have set the declarations that come with the HTTP header"
> doesn't make sense to me.

Changed to "You may not have control over the declarations ..."

> 
> In "Character encoding names", per above, for "not the character sets"
> read "not the character repertoires or coded character sets".
> 
> For "MIME type" read "media type", or on the first use "MIME media
> type".

Done at the point where it is defined.

> 
> For "as if it was HTML" read "as HTML".

Done.

> 
> For "W3C standards interpretation" read "interpretation according to
> W3 standards", to avoid the misreading "W3C standard interpretation"
> (meaning the standard interpretation of the W3C, whatever that is).

Done

> 
> For "you get quirks" read "you get quirks mode".

Done

> 
> For "a small number of encodings" read "a few encodings".

Done

> 
> In "The XML declaration", note that if anything (even whitespace)
> precedes the XML declaration, it will not be recognized as such.
> I don't know what "(or XML protocol)" means; is that an error for "(or
> XML processing instruction)"?  In any case, it should be left out.
> "XML declaration" is the only standardized term, and XML declarations are
> not processing instructions in XML.
> 
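A minimal sketch of the point about the XML declaration, using Python's expat-based xml.etree.ElementTree parser as a stand-in for an XML processor. Browser encoding detection follows its own rules, so treat this as illustrative only. The helper name parses is hypothetical. A single leading space or newline is enough for the parser to reject the document, because the declaration is no longer at the very start.

```python
import xml.etree.ElementTree as ET

DECL = b'<?xml version="1.0" encoding="UTF-8"?>'


def parses(doc: bytes) -> bool:
    """Return True if ElementTree accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses(DECL + b"<p/>"))          # True: declaration is the first thing
print(parses(b" " + DECL + b"<p/>"))   # False: leading space
print(parses(b"\n" + DECL + b"<p/>"))  # False: leading newline
```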
> In the first graf of "The HTML5 meta charset element", omit the comma.
> 
> Given the constraints on the charset attribute of a/link/script, I'd
> leave it out of a tutorial altogether.

I was tempted, but I've been asked about it, so I felt it needed to be
there (albeit briefly).

> 
> I'd warn against using character entity references in XHTML at all.
> They are not interoperable.

I do have a two-paragraph subsection specifically related to that.

RI

> 
> --
> John Cowan   cowan@ccil.org   http://ccil.org/~cowan
> I must confess that I have very little notion of what [s. 4 of the British
> Trade Marks Act, 1938] is intended to convey, and particularly the sentence
> of 253 words, as I make them, which constitutes sub-section 1.  I doubt if
> the entire statute book could be successfully searched for a sentence of
> equal length which is of more fuliginous obscurity. --MacKinnon LJ, 1940