XML file upload issues for encoding="UTF-8"

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Thu, 13 Sep 2007 11:51:14 +0200
To: www-validator@w3.org
Message-ID: <fcb1a5$2s6$1@sea.gmane.org>

Hi, I haven't read this list for about three months, so this bug
report may not be new:

When I try to validate an XML file with encoding="UTF-8" using the
file upload interface I get an error for the first non-ASCII byte.

According to the validator, my browser sends Content-Type: text/xml
without a charset parameter.  The validator therefore expects
US-ASCII and ignores the encoding declared in the first input line:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

There are several problems with this:

1 - After reporting a spurious \xD3 in line 35119 the validator
    doesn't show the source, i.e. the option "show source" has no
    effect.  At the moment I'm forced to use an editor that doesn't
    show line numbers.
2 - The reported non-ASCII character in line 35119 is actually the
    last non-ASCII character in the file, not the first one, which
    is in line 626.
3 - Option "UTF-8 only if necessary" doesn't help.  Only a "hard"
    character encoding override gives me a "tentatively valid" 
    result showing the source with line numbers.
4 - Why is the encoding="UTF-8" completely ignored for text/xml?
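
A possible explanation for item 4, as a hedged sketch: RFC 3023 says
that text/xml without a charset parameter defaults to US-ASCII, and
that the encoding declaration inside the document is only consulted
for application/xml.  Whether the validator really follows this rule
is an assumption; the function name effective_encoding is made up for
illustration, and BOM or UTF-16 detection is left out.

```python
import re


def effective_encoding(content_type, body):
    """Return the encoding an RFC 3023 style consumer would likely apply.

    content_type: value of the Content-Type header (str)
    body: raw document bytes
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # An explicit charset parameter always wins.
    m = re.search(r'charset\s*=\s*"?([^";\s]+)', params, re.IGNORECASE)
    if m:
        return m.group(1).lower()

    # RFC 3023: text/xml without charset is US-ASCII, and the
    # encoding declaration in the document is not consulted.
    if media_type == "text/xml":
        return "us-ascii"

    # Otherwise (e.g. application/xml) look at the XML declaration.
    # Simplification: no BOM or UTF-16 sniffing is done here.
    decl = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body,
    )
    if decl:
        return decl.group(1).decode("ascii").lower()

    # XML default when nothing else is declared.
    return "utf-8"
```

With the first line quoted above, a bare text/xml header gives
"us-ascii", while application/xml or text/xml; charset=UTF-8 gives
"utf-8", which would match the behaviour I see.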

See http://xyzzy.webhop.info/home/ltru/4645bisU.xml (1175 KB) for
the tested file.  I used Firefox 2.x under Win XP to upload it.
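
To cross-check the line numbers from items 1 and 2 without the
validator's source listing, a small script can report the first and
last non-ASCII byte of a file.  This is only a sketch; the function
names are made up for illustration, lines are split on LF, and the
column is a byte offset, not a character position.

```python
def non_ascii_bytes(data):
    """Yield (line, column, byte) for every byte above 0x7F, 1-based."""
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                yield line_no, col, byte


def first_and_last(data):
    """Return ((line, col, byte), (line, col, byte)) or None if pure ASCII."""
    hits = list(non_ascii_bytes(data))
    if not hits:
        return None
    return hits[0], hits[-1]
```

Reading the file in binary mode and passing the bytes to
first_and_last() gives both positions in one pass, independent of
whatever encoding the validator assumed.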

Frank
Received on Thursday, 13 September 2007 09:53:30 GMT
