RE: Localization of XML

The document served at http://www.w3.org/ is validly encoded in both 
iso-8859-1 and UTF-8. This is because it contains only US-ASCII 
characters, and US-ASCII is a subset of both encodings.
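
The overlap between the three encodings can be checked directly. What 
follows is only a minimal sketch in Python; the helper name 
decodes_identically and the sample byte strings are made up for 
illustration and are not taken from the w3.org page itself.

```python
def decodes_identically(data: bytes) -> bool:
    """Return True if data decodes to the same text as US-ASCII,
    iso-8859-1 and UTF-8 (hypothetical helper for illustration)."""
    texts = set()
    for name in ("us-ascii", "iso-8859-1", "utf-8"):
        try:
            texts.add(data.decode(name))
        except UnicodeDecodeError:
            # The bytes are not valid in at least one of the encodings.
            return False
    return len(texts) == 1


# Made-up sample markup: every byte is below 0x80, so it is pure US-ASCII.
ascii_sample = b"<html><head><title>W3C</title></head></html>"

# Made-up counterexample: U+00E9 is a single byte 0xE9 in iso-8859-1,
# which is neither valid US-ASCII nor valid UTF-8 in this position.
latin1_sample = "<p>caf\u00e9</p>".encode("iso-8859-1")

print(decodes_identically(ascii_sample))
print(decodes_identically(latin1_sample))
```

As long as a document stays within US-ASCII, any of the three labels 
describes its bytes correctly. The equivalence stops holding as soon as 
one character outside US-ASCII appears.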

XML requires an encoding declaration whenever the encoding
is neither UTF-8 nor UTF-16. But since the document at the URL 
in question is encoded in UTF-8, there is no problem.
This is true even if a charset parameter is provided 
by an HTTP header. Thus, the explanation below is wrong.
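
The rule for a parser that has only the bytes to go on can be 
illustrated with a rough sketch. It uses Python's standard 
xml.etree.ElementTree parser; the helper name parses and the three 
sample documents are assumptions made for illustration. It does not 
model HTTP headers at all.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the parser accepts data (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Made-up documents, not the actual w3.org page.
ascii_doc = b"<p>cafe</p>"                               # pure US-ASCII
latin1_doc = "<p>caf\u00e9</p>".encode("iso-8859-1")     # contains byte 0xE9
declared_doc = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_doc

# Without a declaration the parser assumes UTF-8, so only the ASCII
# document and the explicitly declared one are accepted.
print(parses(ascii_doc))
print(parses(latin1_doc))
print(parses(declared_doc))
```

A document that is pure US-ASCII is already valid UTF-8, so it needs no 
declaration under this rule. Only the iso-8859-1 document with a 
non-ASCII byte has to declare its encoding.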

Also, the charset parameter in the Content-Type header 
is correct. Serving the document over HTTP labeled as 
US-ASCII or UTF-8 may confuse some browsers, so serving it as 
iso-8859-1 is both correct and bugward compatible. 
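
For reference, the charset parameter can be read out of a Content-Type 
value as sketched below. This is only an illustration using the header 
parser from Python's standard email package; the function name 
media_type_and_charset is an assumption, and the header value is the 
one quoted further down in this message.

```python
from email.message import Message


def media_type_and_charset(content_type: str):
    """Split a Content-Type value into (media type, charset).

    Hypothetical helper: charset is None when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_content_charset()


# Header value as quoted in the message below.
print(media_type_and_charset("text/html; charset=iso-8859-1"))

# Same media type without a charset parameter.
print(media_type_and_charset("text/html"))
```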

From: erik@netscape.com [mailto:erik@netscape.com]
Sent: Tuesday, February 08, 2000 4:07 AM
Subject: Re: Localization of XML

> [snip]
> I noticed that the following w3.org page uses XHTML:
>
>   http://www.w3.org/
>
> However, it doesn't start with the characters "<?xm" even though the
> charset is iso-8859-1...

This page is (currently) sent with the following media type:
  
  Content-Type: text/html; charset=iso-8859-1

So there is no need for an XML declaration. (Even if the
media type were switched to text/xml, the charset parameter
of the Content-Type header would remain authoritative.)

===================================
Nir Dagan
Assistant Professor of Economics
Brown University 
Providence, RI
USA

http://www.nirdagan.com
mailto:nir@nirdagan.com
tel:+1-401-863-2145

Received on Tuesday, 8 February 2000 15:01:05 UTC