Re: character encoding

On Mon, 14 Jun 2004, Juergen Kayser wrote:

> I used Netscape 7.1 for file upload and the file was validated.
> With IE 6.0 it did not work. The only way I found to make it
> work with IE 6.0 is to use the manual override.

I stored your document in a file on my Windows 98 system, using a file
name ending with .html, and submitted it to the validator using IE 6.
The validator complains about incorrect characters and explains that
there is a "strong default" of charset=us-ascii for text/xml, and I guess
this is what you got too.

Checking what IE 6 actually sends, I find that it really says
Content-Type: text/xml
for the file included in the form data. And the consequences are then
inevitable, due to (questionable, IMHO) principles that say that US-ASCII
must then be implied, no matter what the document's content says (in the
XML prolog or in a <meta> tag).
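
As a rough illustration of that rule, here is a minimal Python sketch. It
is not the validator's actual code; the function names and the "declared"
parameter (the encoding named inside the document itself) are hypothetical.
It assumes only what is described above: text/xml without a charset
parameter implies us-ascii, and that overrides the document's declaration.

```python
from email.message import Message
from typing import Optional


def implied_charset(content_type: str) -> Optional[str]:
    """Charset dictated by the Content-Type header alone, or None
    when the header leaves the decision to the document content."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lowercased, None if absent
    if explicit is not None:
        return explicit
    if msg.get_content_type() == "text/xml":
        # The "strong default": no charset parameter means us-ascii.
        return "us-ascii"
    return None


def decode_upload(data: bytes, content_type: str, declared: str) -> str:
    """Decode uploaded bytes; the header-level charset wins over the
    encoding declared in the document (hypothetical helper)."""
    charset = implied_charset(content_type) or declared
    return data.decode(charset)
```

With these definitions, a document containing the ISO-8859-1 byte 0xFC
decodes normally when sent as text/html, but raises UnicodeDecodeError
when sent as plain text/xml, since that byte is not valid US-ASCII.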

> Perhaps there may be a way to validate with IE by changing the
> source?

If I remove the XML prolog

<?xml version="1.0" encoding="iso-8859-1"?>

then IE 6 sends the file as text/html, and it passes validation.
Apparently IE looks at both the file name suffix and the (first few lines
of the) file content in guessing what Content-Type to include in the form
data.
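
The observed behavior could be mimicked by a heuristic like the following
Python sketch. This is only a guess that reproduces the two cases tested
above, not IE's actual algorithm; the function name, the suffix table and
the fallback type are all assumptions made for illustration.

```python
import os

# Hypothetical suffix table; only the .html case was actually observed.
SUFFIX_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xml": "text/xml",
}


def guess_upload_type(filename: str, head: bytes) -> str:
    """Guess a Content-Type from the start of the file, then the suffix."""
    # An XML prolog at the very start takes precedence over the suffix.
    if head.startswith(b"<?xml"):
        return "text/xml"
    suffix = os.path.splitext(filename)[1].lower()
    # The fallback for unknown suffixes is an assumption.
    return SUFFIX_TYPES.get(suffix, "application/octet-stream")
```

Under this heuristic, a file named page.html that begins with the XML
prolog is labelled text/xml, and the same file without the prolog is
labelled text/html, which matches what IE 6 was seen to send.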

Whether omitting the prolog is acceptable is a different matter.
I would not recommend doing so on actual Web pages that declare themselves
as ISO-8859-1 encoded XHTML documents. And maintaining one version of the
file for uploading to a server and another for validation via the file
upload would be no easier than using the extended interface, which lets
you override the charset information that the validator otherwise implies.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 14 June 2004 04:15:37 UTC