
RE: For review: Character encodings in HTML and CSS

From: CE Whitehead <cewcathar@hotmail.com>
Date: Sat, 13 Feb 2010 16:56:17 -0500
Message-ID: <BLU109-W25BDAAD35F837606A0C16EB34C0@phx.gbl>
To: <www-international@w3.org>, <ishida@w3.org>



 


Hi!

Richard Ishida scripsit:

> Comments are being sought on this article prior to final release. Please
> send any comments to this list (www-international@w3.org). We expect
> to publish a final version in one to two weeks.

http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0100

 

 

Here are the rest of my comments on the draft, plus comments on John's!  I had a few replies to Leif, but I'll send those tomorrow or Monday!


From: John Cowan <cowan@ccil.org> 
Date: Tue, 9 Feb 2010 16:45:59 -0500

> Third graf of "The Document Character Set": for "and a subset" 
> read "and represents a subset".
??? { I don't believe I'm in agreement with John here. }
* * *
> In the first sentence of "Character escapes", for "an way" read "a way",

Agreed. 

> for "the the" read "the", and omit the comma.  
Again I agree.

> In the second graf,
> for "representing" read "directly representing".  
??? Why?
> In the third graf,
> add comma after "then", or else remove comma after "CSS" (either is fine).
Yes.
> For "ie." read "i.e.", and for "eg." read "e.g." throughout.
Agreed.


* * *
> In "Consider using a Unicode encoding", note that plain ASCII files are
> already UTF-8.
So would you say it is OK to save text containing only ASCII characters and escapes (Unicode hex numbers or whatever) as an ASCII file while declaring the encoding to be UTF-8? This was my discussion with Leif H. S.,
and I am now asking you.
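To illustrate John's note that plain ASCII files are already UTF-8, here is a minimal Python sketch, assuming a file whose content is only ASCII characters plus numeric escapes (the sample string is made up): the ASCII bytes and the UTF-8 bytes of such text are identical, and the escapes still expand to the intended characters.

```python
# Minimal sketch, assuming the file holds only ASCII characters plus numeric
# character references such as &#xE9; (the sample text is hypothetical).
import html

text = "Caf&#xE9; and r&#xE9;sum&#xE9;"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For U+0000..U+007F, UTF-8 uses the same single bytes as ASCII, so the two
# encodings of this text are byte-for-byte identical.
print(ascii_bytes == utf8_bytes)

# Decoding the file as UTF-8 gives back exactly the original text, and the
# escapes still resolve to the intended non-ASCII characters.
decoded = utf8_bytes.decode("utf-8")
print(html.unescape(decoded))
```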

> In "Character encoding names"
> . . .
> For "as if it was HTML" read "as HTML".
{This solution gets around the British preference for the indicative.}


> For "W3C standards interpretation" read "interpretation according 
> to W3 standards", to avoid the misreading "W3C standard interpretation"
> (meaning the standard interpretation of the W3C, whatever that is).
I did not have a problem with "standards interpretation"--either way would be fine with me.

> For "you get quirks" read "you get quirks mode".
Hmm.
> For "a small number of encodings" read "a few encodings".
Hmm. Picky, picky. Why?
{ My new comments below begin here }

> In "The XML declaration", note that if anything (even whitespace)
> precedes the XML declaration, it will not be recognized as such.
> . . .
Thanks for info.
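A minimal sketch of John's point, using Python's standard-library expat-based parser (other XML parsers may report the problem differently; here it surfaces as a parse error rather than a silently ignored declaration): the same declaration is accepted at the very start of the document and rejected after a single leading space.

```python
# Minimal sketch with Python's expat-based ElementTree parser: an XML
# declaration is only allowed as the very first thing in the document.
import xml.etree.ElementTree as ET

good = b'<?xml version="1.0" encoding="UTF-8"?><p>ok</p>'
bad = b' <?xml version="1.0" encoding="UTF-8"?><p>ok</p>'  # one leading space


def parses(doc: bytes) -> bool:
    """Return True if the document parses, False if the parser rejects it."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError as err:
        print("rejected:", err)
        return False


print(parses(good))
print(parses(bad))  # expat reports a misplaced XML declaration
```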

> In the first graf of "The HTML5 meta charset element", omit the comma.
Yes!

> Given the constraints on the charset attribute of a/link/script, 
> I'd leave it out of a tutorial altogether.
Hmm, this info was interesting for me; the other stuff I already know.
* * *
!!!
"Pros and cons of using the HTTP header for encoding declarations:  Advantages"  first bullet
"The HTTP header information has the highest priority in case of conflict, so this approach should be used by intermediate servers that transcode the data (ie. convert to a different encoding). This is sometimes done for small devices that only recognize a small number of encodings. Because the HTTP header information has precedence over any in-document declaration, it doesn't matter that transcoders typically do not change the internal encoding declarations, just the document encoding."
 
{ COMMENT: Confusing; the above paragraph is not that clear, at least at first read.

First, a minor problem: what is "this approach" referring to? 'Approach' should refer to the method, not to the header, but the immediate antecedent here is the header itself. However, I gather 'approach' refers back to the method identified in the heading above the bullet, so OK;
but all the same, you could change 'this approach' to 'the HTTP header'; redundant as that sounds, no one would have to look for the antecedent;
also, I'd like to start this sentence with the info about conflict;
but what is really important is:
all the information about transcoding, except the term itself, should be parenthetical, I think, because it is extraneous stuff . . . }
{ COMMENT: What do I do with "ie."? John says "i.e.", and I think so too; "ie" for me is Internet Explorer. }
=>
"In case of conflict between internal encoding declarations and the header, the HTTP header gets priority.  Thus the HTTP approach should be used by intermediate servers that transcode data (i.e. convert to a different encoding; transcoding is sometimes done for small devices that only recognize a small number of encodings). Because the HTTP header information has precedence over any in-document declaration, it doesn't matter that transcoders typically do not change the internal encoding declarations, just the document encoding."
* * *
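On the HTTP header precedence point in the item above, here is a simplified Python sketch, assuming (as the draft describes) that the HTTP header is checked before any in-document declaration. It ignores the byte order mark and browser sniffing; the regex meta lookup and the windows-1252 fallback are hypothetical simplifications. It models a transcoder that re-encodes the bytes but leaves the internal declaration unchanged.

```python
# Simplified sketch of the precedence described in the draft: a charset in the
# HTTP Content-Type header beats any in-document declaration. It ignores the
# byte order mark and browser sniffing. The regex meta lookup and the
# windows-1252 fallback are hypothetical simplifications, not browser behavior.
import re
from email.message import Message
from typing import Optional

# Hypothetical, deliberately naive pattern for an in-document charset declaration.
META_CHARSET = re.compile(rb'charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_encoding(body: bytes, content_type: Optional[str]) -> str:
    """HTTP header first, then the in-document declaration, then a fallback."""
    from_header = header_charset(content_type)
    if from_header:
        return from_header
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "windows-1252"  # assumed default, for illustration only


# A transcoder re-encoded this page from UTF-8 to ISO-8859-1 but left the
# internal declaration alone; only the HTTP header reflects the new encoding.
original = '<meta charset="utf-8"><p>caf\u00e9</p>'
transcoded = original.encode("iso-8859-1")

enc = effective_encoding(transcoded, "text/html; charset=ISO-8859-1")
print(enc, transcoded.decode(enc))

# Without the header, the stale internal declaration would be used instead.
print(effective_encoding(transcoded, None))
```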
!!!
"Pros and cons of using the HTTP header for encoding declarations:  Disadvantages" last par

"(Some people would argue that it is rarely appropriate to declare the encoding in the HTTP header if you are going to repeat it in the content of the document. In this case, they are proposing that the HTTP header say nothing about the document encoding. Note that this would usually mean taking action to disable any server defaults.)"

{ COMMENT:  subject-verb agreement
the HTTP header says; it does not say;  
again I might have a stylistic comment about the note format here;
some places it is in parens and some not; perhaps you should remove the parens here and all will be fine!}
=>

"(Some people would argue that it is rarely appropriate to declare the encoding in the HTTP header if you are going to repeat it in the content of the document. In this case, they are proposing that the HTTP header says nothing about the document encoding. Note that this would usually mean taking action to disable any server defaults.)"

{And as noted, maybe remove the parens}
* * *


{??? no

"The encoding of the document is specified just after charset=. In this case the specified encoding is the Unicode encoding, UTF-8."
{ COMMENT: The above is OK; no need to change it here. UTF-8 is a type of Unicode encoding; you are not saying Unicode is an encoding}
} 
* * *
!!!

"The XML declaration" par 7

"Using the XML declaration for XHTML served as HTML. XHTML served as HTML is parsed as HTML, even though it is based on XML syntax, and therefore any XML declaration is not recognized by the browser. It is for this reason that you should use a Content-Type meta element to specify the encoding when serving XHTML in this way*."

{ COMMENT: Sentence fragment; shouldn't "Using the XML declaration for XHTML served as HTML" be a heading???

If it is, however, then you need another heading at the top, "Documents Served as XML".}

* * *
!?!
"The XML declaration" par 7, 8, 9

" It is for this reason that you should use a Content-Type meta element to specify the encoding when serving XHTML in this way*.

"* Conversely, the Content-Type meta element is not recognized by XML parsers.

" On the other hand, the file may also be used at some point as input to other processes that do use XML parsers."


{COMMENT:  I actually don't think paragraph 8 needs to be a note; it can just be part of the text.
I'd combine it with paragraph 9}


=>

"On the other hand, the Content-Type meta element is not recognized by XML parsers.  Nevertheless, the file may also be used at some point as input to other processes that do use XML parsers. . . ."

 

* * *
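On the point in the item above that XML parsers do not recognize the Content-Type meta element, a minimal Python sketch, assuming a made-up XHTML document whose XML declaration and meta element disagree: the expat-based parser takes the encoding from the XML declaration and treats the meta element as ordinary markup.

```python
# Minimal sketch: Python's expat-based XML parser takes the encoding from the
# XML declaration and treats the Content-Type meta element as plain markup.
import xml.etree.ElementTree as ET

# Hypothetical XHTML bytes: the XML declaration says ISO-8859-1, while the
# meta element claims UTF-8. Byte 0xE9 is "é" in ISO-8859-1.
doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<html><head>"
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
    b"</head><body>caf\xe9</body></html>"
)

root = ET.fromstring(doc)
body_text = root.find("body").text
print(body_text)  # decoded using ISO-8859-1, per the XML declaration

# To the XML parser, the meta element is just an element with attributes.
print(root.find("head/meta").get("content"))
```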
???
"The charset attribute on a link" par 7 second to last par

"If the author still hasn't specified the encoding of their document, you will now be asking the browser to apply an incorrect encoding."

{ COMMENT: Get around this by changing "their" to "the". Of course, you don't know the gender of the author, so you use the gender-neutral pronoun there,
but it does not agree in number with 'author'; you can use an article instead.

I personally don't like sentences where the pronoun does not match in number what it refers to, even though these are common enough in speech
and politically correct (which it is good and safe to be, of course);
you may get other opinions on this; but
I always try to get around using the third-person singular generic pronoun,
or I use 'he or she'.}


=>  

"If the author still hasn't specified the encoding of the document, you will now be asking the browser to apply an incorrect encoding."

{John noted that:
> In "The XML declaration", note that if anything (even whitespace)
> precedes the XML declaration, it will not be recognized as such.
> . . .

John also had some comments about 'The HTML5 meta charset element'.}

 

* * *

"Also bear in mind... :  Hex vs. decimal."  last sentence

"You do not need to use leading zeros in escapes, ie.  could be represented as &#xE1;."

{ COMMENT: Again, you've used "ie.", and I do tend to think it is only 'i.e.'}
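A quick check of the hex vs. decimal point with Python's html.unescape (the list of forms is mine, chosen for illustration): hex and decimal numeric references, with or without leading zeros and in either letter case, all resolve to the same character, U+00E1.

```python
# Minimal sketch: hex and decimal numeric character references, with or
# without leading zeros, all resolve to U+00E1 (LATIN SMALL LETTER A WITH ACUTE).
import html

forms = ["&#xE1;", "&#x00E1;", "&#xe1;", "&#225;", "&#0225;"]
results = {form: html.unescape(form) for form in forms}

for form, char in results.items():
    print(f"{form:10} -> {char} (U+{ord(char):04X})")

# True when every form produced the same single character.
print(len(set(results.values())) == 1)
```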

Best,

C. E. Whitehead
cewcathar@hotmail.com


 


From: cewcathar@hotmail.com
To: www-international@w3.org; ishida@w3.org
Date: Thu, 11 Feb 2010 21:28:31 -0500
Subject: RE: For review: Character encodings in HTML and CSS



Hi--
I read a bit more of the draft; one proofreading comment I had seemed pretty important; the rest will have to wait until, hopefully, sometime tomorrow!

Richard Ishida scripsit:
> Comments are being sought on this article prior to final release. Please
> send any comments to this list (www-international@w3.org). We expect
> to publish a final version in one to two weeks.
http://www.w3.org/International/tutorials/tutorial-char-enc/temp#Slide0100

{I'm using the same key as before, so !!! means it needs fixing for sure, in my opinion.}



 
Best,
 
C. E. Whitehead
cewcathar@hotmail.com


 		 	   		  
Received on Saturday, 13 February 2010 21:56:50 GMT
