RE: For review: Character encodings in HTML and CSS

CE Whitehead, Wed, 10 Feb 2010 17:20:04 -0500:
> Also regarding the notepad BOM, is there anyway to get that thing out 
> with an escape sequence, has anyone discovered that--
> or maybe I could take it out by re-editing the file in word at the 
> very end???
> and then saving as a utf-8 text file??

The NCR for the BOM is '&#xFEFF;'. Whether it would work is another 
matter. Probably not, because an NCR, unlike the real BOM bytes, does 
not indicate any encoding.  But anyhow: if you try to validate such a 
document, you will see that it is not valid to type '&#xFEFF;' (or any 
other NCR) before the !DOCTYPE declaration. 
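
On the original question of getting the BOM out: rather than looking for 
an escape sequence, it may be simpler to strip the three BOM bytes from 
the saved file. A minimal sketch in Python, assuming the file is UTF-8; 
the function name strip_utf8_bom is made up for illustration:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        # Drop only the leading signature; the rest of the file is untouched.
        return data[len(codecs.BOM_UTF8):]
    return data
```

One would read the file in binary mode, pass the bytes through the 
function, and write the result back. Files without a BOM come out 
unchanged.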

> OUT of CURIOSITY
> 
> Can one declare all character sets used in a document in the http header?

Did you mean "any" and not "all"? Did you mean "charset" (singular) and 
not "character sets"? 

An HTML file can declare only one encoding - referred to in HTML code 
and HTTP headers as "charset".  When you use the META element to define 
the encoding/charset (or "encoding char(acter )set", as I would call 
it), you are in fact using HTTP vocabulary directly in HTML - note 
the term http-equiv:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> 

So, yes, HTTP can declare any encoding charset that HTML documents 
could possibly have - which is only one per document. (Note that HTML5 
proposes "<meta charset="utf-8">" as a less HTTP-ish way to define the 
encoding charset - see Richard's article ...)
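
To illustrate that the Content-Type value carries a single charset 
parameter, here is a small sketch in Python that reads it back out. It 
uses the standard library's email.message.Message only as a convenient 
header parser; the helper name charset_of is an assumption for this 
example, not anything from HTTP or HTML:

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header does not carry one.
    return msg.get_content_charset()
```

The same string that appears in the content attribute of the META 
element above can be passed in as-is.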

Richard, perhaps you should point out, if you haven't done so already, 
that an HTML/XML document has only one encoding.
-- 
leif halvard silli

Received on Thursday, 11 February 2010 09:53:48 UTC