XML file upload issues for encoding="UTF-8"

Hi, I haven't read this list for about three months, so maybe this
bug report isn't new:

When I try to validate an XML file with encoding="UTF-8" using the
file upload interface, I get an error for a non-ASCII byte.

According to the validator's own report, my browser claims to send
Content-Type: text/xml without a charset parameter.  The validator
therefore expects US-ASCII and ignores the first input line:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

There are several problems with this:

1 - After reporting a spurious \xD3 in line 35119 the source isn't
    shown, i.e. the "show source" option has no effect.  At the
    moment I'm forced to use an editor that doesn't show line
    numbers.
2 - The non-ASCII character reported in line 35119 is actually the
    last one in the file, not the first one, which is in line 626.
3 - The "UTF-8 only if necessary" option doesn't help.  Only a "hard"
    character encoding override gives me a "tentatively valid"
    result showing the source with line numbers.
4 - Why is the encoding="UTF-8" completely ignored for text/xml?
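
Points 2 and 4 can be checked locally without the validator.  The
following is only a minimal Python sketch, not what the validator
actually does: the helper names non_ascii_lines and declared_encoding
are made up for illustration, and the sample bytes stand in for the
real file.  It lists the lines that contain non-ASCII bytes and reads
the encoding label from the XML declaration.

```python
import re

def non_ascii_lines(data: bytes):
    """Return the 1-based numbers of all lines containing a byte >= 0x80."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if any(b >= 0x80 for b in line):
            hits.append(lineno)
    return hits

def declared_encoding(data: bytes):
    """Return the encoding named in a leading XML declaration, or None.

    Simplified: assumes an ASCII-compatible encoding and no byte order mark.
    """
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        data,
    )
    return m.group(1).decode("ascii") if m else None

# Hypothetical sample standing in for the uploaded file.
sample = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n'
    b'<doc>\n'
    b'<a>caf\xc3\xa9</a>\n'
    b'<b>plain</b>\n'
    b'<c>\xc3\xbc</c>\n'
    b'</doc>\n'
)

lines = non_ascii_lines(sample)
print("first non-ASCII line:", lines[0])
print("last non-ASCII line:", lines[-1])
print("declared encoding:", declared_encoding(sample))
```

A strict US-ASCII reader would be expected to complain at the first
of these lines, which is why a report pointing at the last one looks
wrong.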

See http://xyzzy.webhop.info/home/ltru/4645bisU.xml (1175 KB) for
the tested file.  I used Firefox 2.x under Win XP to upload it.

Frank

Received on Thursday, 13 September 2007 09:53:30 UTC