
[whatwg] [WA1] Specifying Character Encoding

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 1 Mar 2007 01:58:31 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0703010153170.9514@dhalsim.dreamhost.com>
On Sat, 9 Apr 2005, Lachlan Hunt wrote:
>
> In the current draft, for specifying the character encoding [1], it is 
> stated:
> 
> | In XHTML, the XML declaration should be used for inline character
> | encoding information.
> |
> | Authors should avoid including inline character encoding information.
> | Character encoding information should instead be included at the
> | transport level (e.g. using the HTTP Content-Type header).
> 
> The second paragraph should only apply to HTML using the meta element, 
> not XHTML using the XML declaration.

I don't understand why it would be ok for one and not the other.


> For X(HT)ML, according to the Architecture of the World Wide Web, Volume 
> One - Media types for XML [2]:
> [2] http://www.w3.org/TR/2004/REC-webarch-20041215/#xml-media-types
> 
> | In general, a representation provider SHOULD NOT specify the character
> | encoding for XML data in protocol headers since the data is
> | self-describing.

I personally disagree with the arguments above (transcoding proxies mean 
that the content really can't know what its encoding will be when it 
arrives, and therefore it shouldn't be declaring one). I could see an argument for 
removing the advice from the HTML5 spec altogether, though. What do you 
think?
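To make the transcoding-proxy argument concrete, here is a minimal Python sketch. The proxy is hypothetical (`transcoding_proxy` is an illustrative name, not a real component): it re-encodes a UTF-8 body as ISO-8859-1 and rewrites the charset parameter, but does not touch the XML declaration, so the in-document declaration ends up wrong.

```python
from typing import Tuple


def transcoding_proxy(body: bytes) -> Tuple[str, bytes]:
    """Hypothetical proxy: decode as UTF-8, re-encode as ISO-8859-1,
    and rewrite the Content-Type charset to match. The XML declaration
    inside the body is left as-is."""
    text = body.decode("utf-8")
    return "application/xml; charset=ISO-8859-1", text.encode("iso-8859-1")


original = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")
ctype, proxied = transcoding_proxy(original)

# Trusting the rewritten transport-level header recovers the text.
print(ctype, proxied.decode("iso-8859-1"))

# Trusting the in-document declaration ("UTF-8") fails, because the
# bytes on the wire are no longer UTF-8.
try:
    proxied.decode("utf-8")
except UnicodeDecodeError:
    print("the XML declaration no longer matches the bytes")
```

A consumer that follows the header decodes the document correctly; one that believes the self-description does not.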


> I think it should also be noted that authors who omit the XML 
> declaration (or include it but don't specify the encoding attribute) 
> *must* use UTF-8 or UTF-16, as described in the XML recommendation.

If you specify the encoding in the HTTP headers, you could use anything, 
even, say, GSM 03.38 or UTF-EBCDIC.
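A simplified sketch of the resolution order under discussion, assuming the caller has already pulled the charset parameter (if any) out of Content-Type. It ignores the text/xml special case, only checks the UTF-8 and UTF-16 byte order marks, and only recognizes an encoding declaration in ASCII-compatible bytes, so it is not a full implementation of the XML spec's autodetection appendix. `effective_encoding` is a hypothetical helper name.

```python
import re
from typing import Optional

# An encoding declaration at the very start of an ASCII-compatible
# document, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_encoding(charset_param: Optional[str], body: bytes) -> str:
    """External charset first, then BOM, then the XML declaration,
    then the XML default of UTF-8."""
    # 1. Transport-level information (e.g. the HTTP charset parameter)
    #    wins; this is what lets a document use an encoding other than
    #    UTF-8/UTF-16 without any XML declaration.
    if charset_param:
        return charset_param
    # 2. A byte order mark identifies UTF-8 or UTF-16.
    if body.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "UTF-16"
    # 3. An encoding declaration in the XML declaration.
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii")
    # 4. No external info, no BOM, no declaration: must be UTF-8.
    return "UTF-8"


print(effective_encoding("UTF-EBCDIC", b"...bytes in UTF-EBCDIC..."))
print(effective_encoding(None, b'<?xml version="1.0"?><root/>'))
```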



On Sat, 9 Apr 2005, Anne van Kesteren wrote:
> 
> Why? If people are still using text/xml for example you really want them 
> to use the HTTP Content-Type header. Otherwise it's US-ASCII.

Right.
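The US-ASCII default comes from RFC 3023, the XML media types RFC in force at the time: text/xml with no charset parameter is US-ASCII, whatever the XML declaration says (its 2014 successor, RFC 7303, dropped that default). A minimal Python sketch using the standard library's `email.message.Message` to parse the header; `charset_for` is a hypothetical name.

```python
from email.message import Message
from typing import Optional


def charset_for(content_type: str) -> Optional[str]:
    """Charset an RFC 3023 consumer would take from the Content-Type
    header alone; None means the document itself decides."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        # An explicit charset parameter always wins.
        return str(charset)
    if msg.get_content_type() == "text/xml":
        # RFC 3023: bare text/xml defaults to US-ASCII, overriding any
        # encoding declaration inside the document.
        return "us-ascii"
    # e.g. application/xml: fall back to the BOM / XML declaration.
    return None


# A UTF-8 document that says so in its own XML declaration...
body = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")

# ...served as bare text/xml is read as US-ASCII, and the non-ASCII
# bytes make decoding fail.
try:
    body.decode(charset_for("text/xml"))
    print("decoded")
except UnicodeDecodeError:
    print("rejected: UTF-8 bytes are not valid US-ASCII")
```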


> > I think it should also be noted that authors who omit the XML 
> > declaration (or include it but don't specify the encoding attribute) 
> > *must* use UTF-8 or UTF-16, as described in the XML recommendation.
> 
> Where did you read that in the XML specification? You can always specify 
> encoding using the 'charset' parameter. That it is not recommended 
> because "webarch" thinks documents should be self-describing doesn't 
> matter. Also note that when the document is served using text/xml they 
> could use UTF-8 but it wouldn't work.

Exactly.


On Sat, 9 Apr 2005, Lachlan Hunt wrote:
> 
> I didn't consider text/xml because the current draft states in the 
> conformance requirements:
> 
> | XML documents [...] that are served over the wire (e.g. by HTTP) must
> | be sent using an XML MIME type such as application/xml or
> | application/xhtml+xml...
> 
> I had initially interpreted that as meaning authors must use 
> application/*+xml and must not use text/xml; however, that 
> interpretation may be incorrect. Perhaps it should be explicitly stated 
> that text/xml should not be used, with a reference to the webarch 
> recommendation.

I never did understand why people don't like text/*. It's nice and short 
and all these types are text, so...
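For reference, a rough check for "an XML MIME type". This assumes the later WHATWG definition (text/xml, application/xml, or any subtype ending in +xml) rather than anything the 2007 draft spelled out beyond its two examples; note that under that definition text/xml does qualify. `is_xml_mime_type` is a hypothetical helper.

```python
from email.message import Message


def is_xml_mime_type(content_type: str) -> bool:
    """True for text/xml, application/xml, or any */*+xml type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() lowercases type/subtype and drops parameters.
    essence = msg.get_content_type()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")


for ctype in ("application/xhtml+xml; charset=UTF-8", "text/xml", "text/html"):
    print(ctype, "->", is_xml_mime_type(ctype))
```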


I've made no changes to the spec, but let me know if you think something 
should change.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 28 February 2007 17:58:31 UTC
