Re: character encoding

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 14 Jun 2004 11:15:35 +0300 (EEST)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.58.0406141034520.18008@korppi.cs.tut.fi>

On Mon, 14 Jun 2004, Juergen Kayser wrote:

> I used Netscape 7.1 for file upload and the file was validated.
> With IE 6.0 it did not work. The only way I found that it
> works with IE 6.0 is to use the manual override.

I checked with your document, stored in a file on my Windows 98 system
under a file name ending with .html, and submitted it to the validator
using IE 6. The validator complains about incorrect characters and
explains that there is a "strong default" of charset=us-ascii for
text/xml, and I guess this is what you got too.

Checking what IE 6 actually sends, I find that it really says
Content-Type: text/xml
for the file included in the form data. The consequences are then
inevitable, due to (questionable, IMHO) principles that say that US-ASCII
must then be implied, no matter what the document's content says (in the
XML prolog or in a <meta> tag).
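
As a rough illustration of that rule, here is a minimal Python sketch. It
is not the validator's actual code. The function names and the sample
document are made up for the example, and the rule itself is my reading
of the text/xml default (as in RFC 3023): an explicit charset parameter
wins, a bare text/xml implies us-ascii, and only for other types is the
document's own declaration consulted.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("charset")


def effective_charset(header_value, declared_in_document=None):
    """Charset a checker would assume under the rule sketched above (assumption)."""
    media_type, charset = parse_content_type(header_value)
    if charset:
        return charset.lower()       # explicit parameter always wins
    if media_type == "text/xml":
        return "us-ascii"            # strong default; the document is ignored
    return declared_in_document      # e.g. text/html: use prolog or <meta>


def decodes_cleanly(data, charset):
    """True if every byte of data is valid in the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical document: ISO-8859-1 encoded, with one non-ASCII letter.
sample = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n<p>\u00e4</p>'
).encode("iso-8859-1")

as_xml = effective_charset("text/xml", "iso-8859-1")
as_html = effective_charset("text/html", "iso-8859-1")
print(as_xml, decodes_cleanly(sample, as_xml))
print(as_html, decodes_cleanly(sample, as_html))
```

With the bare text/xml header the single non-ASCII byte is already enough
to make the data invalid, whatever the prolog declares.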

> Perhaps there may be a way to validate with IE by changing the
> source?

If I remove the XML prolog

<?xml version="1.0" encoding="iso-8859-1"?>

then IE 6 sends the file as text/html, and it passes validation.
Apparently IE looks at both the file name suffix and the (first few lines
of the) file content when guessing what Content-Type to include in
the form data.
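
The kind of guessing I have in mind can be sketched as follows, in
Python. This is purely hypothetical: I do not know IE's real algorithm,
and the function name, the suffix table and the fallback type are my own
assumptions. The sketch is merely consistent with what I observed, namely
that an XML prolog at the start overrides the .html suffix.

```python
import os

# Hypothetical suffix table; a real browser would consult system settings.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xml": "text/xml",
}


def guess_upload_type(filename, head):
    """Guess a Content-Type from the first bytes of a file and its suffix.

    `head` is the beginning of the file content as bytes. An XML prolog
    takes precedence over the suffix; this ordering is an assumption
    based on observed behaviour, not on any documentation.
    """
    if head.lstrip().startswith(b"<?xml"):
        return "text/xml"
    suffix = os.path.splitext(filename)[1].lower()
    return SUFFIX_TYPES.get(suffix, "application/octet-stream")


with_prolog = b'<?xml version="1.0" encoding="iso-8859-1"?>\n<html>'
without_prolog = b"<html>"

print(guess_upload_type("page.html", with_prolog))
print(guess_upload_type("page.html", without_prolog))
```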

Whether omitting the prolog is acceptable is a different matter.
I would not recommend doing so on actual Web pages that declare themselves
as ISO-8859-1 encoded XHTML documents - and using different versions of
the file for uploading to a server and for validation via the file upload
would be no easier than using the extended interface that lets you
override the charset information that the validator otherwise implies.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Monday, 14 June 2004 04:15:37 GMT
