
Re: Encoding and validation

From: Leif H Silli <hyperlekken@lenk.no>
Date: Mon, 15 Feb 2010 13:50:25 +0100
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: Bill Braun <bbraun@hlthsys.com>, www-amaya <www-amaya@w3.org>
Message-ID: <20100215135025150744.e9a05534@lenk.no>
Most installations of Apache come with mod_mime and mod_negotiation, 
don't they?

Using AddCharset rules, it is simple to override/specify the 
encoding of a particular file - it is as simple as adding a certain 
suffix, which one may define oneself. E.g. to serve a page as 
ISO-8859-1, one may add the suffix 'iso8859-1' 
('filename.html.iso8859-1' - or, more compatible with editors and file 
systems: 'filename.html.iso8859-1.html').

http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addcharset
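
A minimal sketch of such a rule, assuming mod_mime is loaded; the 
'.utf8' suffix and the placement in httpd.conf or a .htaccess file are 
only illustrative assumptions, not taken from any particular setup:

```apache
# Illustrative excerpt for httpd.conf or a .htaccess file.
# In .htaccess this needs "AllowOverride FileInfo" to be in effect.

# Map a filename suffix to a charset: a file named
# filename.html.iso8859-1 (or filename.html.iso8859-1.html) is then
# served with charset=ISO-8859-1 in its Content-Type header.
AddCharset ISO-8859-1 .iso8859-1

# A self-defined suffix works the same way (".utf8" is a made-up example).
AddCharset UTF-8 .utf8
```

After a reload, the Content-Type response header of such a file 
should show whether the rule took effect.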


I think in most Apache installations, the suffixes are defined in the 
file 'httpd-languages.conf'.

Leif Halvard Silli

"Martin J. Dürst", Tue, 19 Jan 2010 18:49:50 +0900:
> Hello Bill,
> 
> As general advice, setting up your server so that it serves an HTTP 
> header with charset=utf-8, and then only uploading utf-8 content, and 
> streamlining all your production to utf-8, is considered a good thing 
> these days, in many if not most cases. (I have done the same for about 
> 5 years now with my own server.)
> 
> However, while such a setup is good for production, it's not good for 
> testing e.g. various different encodings. For that case, you have to 
> set up a separate server, or some specific directory of a server, and 
> mostly hand-tune the settings to make sure your tests aren't affected 
> by external factors.
> 
> Regards,    Martin.
> 
> On 2010/01/16 22:24, Bill Braun wrote:
>> Stanimir Stamenkov wrote:
>>> The XML declaration is optional but recommended:
>>> 
>>> http://www.w3.org/TR/xml/#dt-xmldecl

>>> 
>>> If your server is configured to declare UTF-8 encoding for all
>>> resources, then even if you omit the XML declaration, the browser
>>> could fail to decode a document that is encoded differently (e.g.
>>> using ISO-8859-1). It is a side effect of ISO-8859-1 and UTF-8
>>> sharing the common US-ASCII base that your document gets parsed
>>> o.k. - it just doesn't use non-ASCII characters.
>>> 
>>> If you can't change your server configuration, you had better save
>>> your document using UTF-8, which the server is configured to specify.
>>> The issue is not specific to XML documents - you may check whether
>>> your server is sending fixed UTF-8 for other documents, also. It is
>>> likely this problem will be most visible with XML documents because
>>> decoding errors are treated as fatal errors:
>>> 
>>> http://www.w3.org/TR/xml/#dt-fatal

>> 
>> Thank you, Stanimir. Very clear explanation, as a neophyte I was able to
>> understand the essence of it.
>> 
>> Bill Braun
>> 
>> 
>> 
>> 
> 
> -- 
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
> 
Received on Monday, 15 February 2010 12:51:04 UTC
