Re: Public beta test of the W3C Markup Validator

Martin Duerst <duerst@w3.org> wrote:

>The current message is confusing for somebody using file
>upload. There is probably a better way to explain what happens.

The current code in CVS now reads:

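  # Declare @_source outside the if/else; a "my" inside each branch
  # would go out of scope before the heredoc below interpolates it.
  my @_source;
  if ($File->{'Is Upload'}) {
    @_source = ('sent by your web browser', 'web browser send');
  } else {
    @_source = ('returned by your web server', 'web server return');
  }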
  &add_warning($File, <<".EOF.");
      <em>Note:</em>
      The HTTP Content-Type field $_source[0] did not contain
      a "charset" attribute, but the Content-Type was one of the XML text/*
      sub-types (<code>$File->{ContentType}</code>). The relevant
      specification (RFC 3023) specifies a strong default of "us-ascii" for
      such documents so we will use this value regardless of any encoding
      you may have indicated elsewhere. If you would like to use a
      different encoding, you should arrange to have your $_source[1] this
      new encoding information.
.EOF.

IOW, in the case you reported it would have returned:

  The HTTP Content-Type field sent by your web browser did not contain a
  "charset" attribute, but the Content-Type was one of the XML text/*
  sub-types (text/xml). The relevant specification (RFC 3023) specifies a
  strong default of "us-ascii" for such documents so we will use this
  value regardless of any encoding you may have indicated elsewhere. If
  you would like to use a different encoding, you should arrange to have
  your web browser send this new encoding information.

Is that a little clearer?
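For illustration, the rule the warning describes can be sketched roughly as follows. This is not the Validator's actual code. The helper name effective_charset is hypothetical. Treating text/xml, text/xml-external-parsed-entity and text/*+xml as the "XML text/* sub-types" is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Validator): return the charset
# implied by a Content-Type header value for an XML document.
# Returns undef when the header alone does not decide it
# (e.g. application/xml without a charset parameter), in which case the
# BOM or XML declaration would have to be consulted instead.
sub effective_charset {
    my ($content_type) = @_;
    my ($type, @params) = split /\s*;\s*/, $content_type;
    $type = lc $type;

    # An explicit charset parameter always wins.
    for my $param (@params) {
        if ($param =~ /^charset\s*=\s*"?([^";\s]+)"?$/i) {
            return lc $1;
        }
    }

    # No charset: the XML text/* sub-types (assumed set, see above) get
    # the strong "us-ascii" default, whatever the document itself declares.
    if (   $type eq 'text/xml'
        || $type eq 'text/xml-external-parsed-entity'
        || $type =~ m{^text/[^/]+\+xml$})
    {
        return 'us-ascii';
    }

    return undef;
}
```

So a bare "text/xml" yields "us-ascii" even if the file starts with an encoding="UTF-8" declaration, while "text/xml; charset=UTF-8" yields "utf-8". That is why the fix has to happen in the HTTP header rather than in the document.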


>>MS IE 6 sends an incorrect, sniffed MIME type of text/xml when
>>uploading the same file presumably because it sees the xml declaration.
>>I have not tried the exhaustive tests (removal of xml declaration,
>>inclusion of the string '<html' or ',HTML' in the first 256 bytes,
>>perhaps inside a comment) etc to try and describe the sniffing
>>algorithm correctly.

And this is precisely why sniffing and guessing should be avoided at all
costs. There is also the issue that as long as browsers accept sloppy
markup etc. there will never be an incentive for authors to clean up their
act. This may be debatable for the case of browsers, but for the Validator,
it's its _job_ to be strict about these things! The Validator isn't there
to give people a fuzzy good feeling; it's there to help people make
absolutely sure their documents satisfy the most basic level of quality.


-- 
When I decide that the situation is unacceptable for me, I'll simply fork
the tree.   I do _not_ appreciate being enlisted into anyone's holy wars,
so unless you _really_ want to go _way_ up in my  personal shitlist don't
play politics in my vicinity.                   -- Alexander Viro on lkml

Received on Friday, 25 October 2002 13:50:23 UTC