Re: Public beta test of the W3C Markup Validator

On Wednesday, October 23, 2002, 9:53:34 AM, Terje wrote:

TB> Chris Lilley <chris@w3.org> wrote:

>>A quick test (of a well-formed but non-valid UTF-8 encoded SVG
>>document) revealed:
>>
>>    Note: The HTTP Content-Type field did not contain a "charset"
>>    attribute, but the Content-Type was one of the XML text/*
>>    sub-types. The relevant specification (RFC 3023) specifies a strong
>>    default of "us-ascii" for such documents so we will use this value
>>    regardless of any encoding you may have indicated elsewhere. If you
>>    would like to use a different encoding, you should arrange to have
>>    your server send this new encoding information.
>>
>>Firstly, that is neither desirable, nor an improvement.

TB> I think that is arguable. What's happening is that the Validator is being
TB> more strict about proper usage of the various ways Character Encoding can be
TB> specified.

I agree that it is arguable, and the TAG amongst others is arguing about
it. Strictness is fine; assuming things in the absence of evidence is
not, however.


>>Plus, it's arguably not true (the file was sent from local disk using
>>file upload, so it's a mystery where the 'HTTP Content-type' field came
>>from or how it figured out that a 'text/*' type had been sent).

TB> Since HTTP is the only protocol supported for uploading files to the
TB> Validator, I think it's safe to assume that your browser used HTTP to
TB> submit the file. No? :-)

True, good point. I had not thought of file upload as being a separate
HTTP transfer, but you are right. It would be interesting to know
exactly what the HTTP transfer looked like, what the headers were. Do
we have any test setup that I could upload a file to from various
browsers to see what they do?
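
One way to look at those headers would be to capture the raw request
body of an upload and list what each part declares. The following is a
minimal sketch, assuming the upload arrives as multipart/form-data. The
boundary, field name, filename, and in particular the text/xml part type
are invented sample data, not an observation of what any browser sends.

```python
from email.parser import BytesParser
from email.policy import default

# Hypothetical captured upload; every value here is invented for illustration.
content_type_header = 'multipart/form-data; boundary=XyZ123'
body = (
    b'--XyZ123\r\n'
    b'Content-Disposition: form-data; name="uploaded_file"; filename="test.svg"\r\n'
    b'Content-Type: text/xml\r\n'
    b'\r\n'
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<svg xmlns="http://www.w3.org/2000/svg"/>\r\n'
    b'--XyZ123--\r\n'
)

def describe_upload_parts(content_type_header, body):
    """Return (filename, media type, charset) for each part of the upload."""
    # Prepend the request's Content-Type so the parser sees a complete
    # MIME message and knows which boundary to split on.
    raw = (b'Content-Type: ' + content_type_header.encode('ascii')
           + b'\r\n\r\n' + body)
    message = BytesParser(policy=default).parsebytes(raw)
    return [
        (part.get_filename(), part.get_content_type(), part.get_param('charset'))
        for part in message.iter_parts()
    ]

for filename, media_type, charset in describe_upload_parts(content_type_header, body):
    print(filename, media_type, charset)
```

Running the same function over requests captured from several browsers
would show whether they label the part at all, and with which type.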

TB> IOW, your browser submitted the file with some text/* sub type (probably
TB> text/html or text/xml), which has a strong default for us-ascii in the
TB> absence of a specific character encoding indication.

But it's an SVG file, and the MIME type for SVG is image/svg+xml. It's
set up that way on my machine, too. So, where did the validator get
text/html or text/xml from?
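
The distinction matters because of how RFC 3023 treats the two families
of types. A rough sketch of the defaulting rule as I read it follows;
the function name is hypothetical, and it models only the XML cases
discussed here, not the Validator's actual code.

```python
# XML text/* types for which RFC 3023 specifies a us-ascii default.
TEXT_XML_TYPES = {'text/xml', 'text/xml-external-parsed-entity'}

def default_charset(content_type):
    """Charset implied by the Content-Type header alone.

    Returns None when the header does not decide the encoding, in which
    case the XML rules (BOM, encoding declaration) apply instead.
    """
    media_type, _, param_text = content_type.partition(';')
    media_type = media_type.strip().lower()

    # An explicit charset parameter always wins.
    for param in param_text.split(';'):
        name, _, value = param.partition('=')
        if name.strip().lower() == 'charset' and value.strip():
            return value.strip().strip('"').lower()

    # No charset parameter: the XML text/* types default to us-ascii.
    if media_type in TEXT_XML_TYPES:
        return 'us-ascii'

    # application/xml, image/svg+xml and the like carry no such default.
    return None

print(default_charset('text/xml'))        # us-ascii
print(default_charset('image/svg+xml'))   # None
```

So a UTF-8 SVG file labelled image/svg+xml would be read according to
its own encoding declaration, while the same bytes labelled text/xml
without a charset would be treated as us-ascii.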


TB> However, it may be that the weak support for file uploads in current
TB> browsers justifies special rules for files submitted via file upload.

Possibly. I would rather know more about what headers are currently
sent in a file upload before arguing either for or against special
rules.

TB> I'd rather avoid having more special case rules than necessary,

Agreed.

TB> but it's an avenue that could be explored if this turns out to be
TB> a problem.

TB> The best option is of course to ensure that all servers and browsers
TB> implement proper support for using HTTP Content-Type and the charset
TB> attribute correctly.

Yes; but in the case of an upload, there is no server on the
content-originating end of the transfer as it is client to server -
so, there is no server to be set up correctly.

Should the accepting server apply its own setup (e.g., filename
extension to MIME type mapping) to the received content?
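
To make the question concrete, here is a sketch of what that could look
like on the receiving side. The mapping table and the function name are
hypothetical, and whether overriding the client's label is the right
policy is exactly what is open.

```python
import os

# Hypothetical server-side table; a real server would use its own
# configuration rather than this hard-coded example.
EXTENSION_TYPES = {
    '.svg': 'image/svg+xml',
    '.xml': 'application/xml',
    '.html': 'text/html',
    '.xhtml': 'application/xhtml+xml',
}

def effective_media_type(filename, declared_type):
    """Prefer the server's extension mapping over the client's label.

    Falls back to the type the client declared when the extension is
    not in the server's table.
    """
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(extension, declared_type)

print(effective_media_type('test.svg', 'text/xml'))     # image/svg+xml
print(effective_media_type('notes.txt', 'text/plain'))  # text/plain
```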


-- 
 Chris                            mailto:chris@w3.org

Received on Wednesday, 23 October 2002 09:46:55 UTC