Re: Public beta test of the W3C Markup Validator from Martin Duerst on 2002-10-26 (www-validator@w3.org from October 2002)

From: Martin Duerst <duerst@w3.org>
Date: Sat, 26 Oct 2002 09:39:56 +0900
To: Terje Bless <link@pobox.com>, W3C Validator <www-validator@w3.org>
Cc: Chris Lilley <chris@w3.org>
Message-Id: <4.2.0.58.J.20021026093806.042268e8@localhost>

Hello Terje,

Many thanks for working out the details. Two suggestions:
- Print the actual content type (unless we already do that; I don't
   remember exactly)
- Say something about override (I guess this is particularly useful
   for uploads).

Regards,    Martin.

At 19:50 02/10/25 +0200, Terje Bless wrote:
>Martin Duerst <duerst@w3.org> wrote:
>
> >The current message is confusing for somebody using file
> >upload. There is probably a better way to explain what happens.
>
>The current code in CVS now reads:
>
>   # Declare @_source outside the if/else so the heredoc below can see it.
>   my @_source;
>   if ($File->{'Is Upload'}) {
>     @_source = ('sent by your web browser', 'browser send');
>   } else {
>     @_source = ('returned by your web server', 'server return');
>   }
>   &add_warning($File, <<".EOF.");
>       <em>Note:</em>
>       The HTTP Content-Type field $_source[0] did not contain
>       a "charset" attribute, but the Content-Type was one of the XML text/*
>       sub-types (<code>$File->{ContentType}</code>). The relevant
>       specification (RFC 3023) specifies a strong default of "us-ascii" for
>       such documents so we will use this value regardless of any encoding
>       you may have indicated elsewhere. If you would like to use a
>       different encoding, you should arrange to have your $_source[1] this
>       new encoding information.
>.EOF.
>
>IOW, in the case you reported it would have returned:
>
>   The HTTP Content-Type field sent by your web browser did not contain a
>   "charset" attribute, but the Content-Type was one of the XML text/*
>   sub-types (text/xml). The relevant specification (RFC 3023) specifies a
>   strong default of "us-ascii" for such documents so we will use this
>   value regardless of any encoding you may have indicated elsewhere. If
>   you would like to use a different encoding, you should arrange to have
>   your web browser send this new encoding information.
>
>Is that a little clearer?
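For readers unfamiliar with the rule this warning describes, here is a minimal Perl sketch of how the effective charset could be chosen. It assumes only the RFC 3023 behaviour stated in the warning (text/xml without a "charset" parameter means us-ascii, whatever the XML declaration says), plus RFC 3023's rule that application/xml without a charset falls back to the document's own encoding declaration. The subroutine name `effective_charset` is hypothetical and is not the Validator's actual code. BOM detection is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Validator's code): choose the charset for a
# document from its HTTP Content-Type and its XML declaration's encoding.
sub effective_charset {
    my ($content_type, $xml_encoding) = @_;

    # Split "type/subtype; param=value; ..." into media type and parameters.
    my ($media_type, @params) = split /\s*;\s*/, $content_type;
    $media_type = lc $media_type;

    # An explicit charset parameter always wins.
    for my $param (@params) {
        if ($param =~ /^charset\s*=\s*"?([^";\s]+)"?$/i) {
            return lc $1;
        }
    }

    # RFC 3023: text/xml (and text/xml-external-parsed-entity) with no
    # charset parameter defaults to us-ascii, ignoring the XML declaration.
    if ($media_type eq 'text/xml'
        || $media_type eq 'text/xml-external-parsed-entity') {
        return 'us-ascii';
    }

    # Otherwise (e.g. application/xml) use the document's own declaration.
    # Simplification: no BOM sniffing; assume UTF-8 if nothing is declared.
    return defined $xml_encoding ? lc $xml_encoding : 'utf-8';
}

print effective_charset('text/xml', 'iso-8859-1'), "\n";
print effective_charset('text/xml; charset=ISO-8859-1', 'utf-8'), "\n";
print effective_charset('application/xml', 'iso-8859-1'), "\n";
```

Running it prints `us-ascii`, `iso-8859-1` and `iso-8859-1`, one per line. With text/xml, only an explicit `charset` parameter can override the default, which is why the warning tells users to have their browser or server send that information.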
>
>
> >>MS IE 6 sends an incorrect, sniffed MIME type of text/xml when
> >>uploading the same file, presumably because it sees the XML declaration.
> >>I have not tried the exhaustive tests (removal of the XML declaration,
> >>inclusion of the string '<html' or '<HTML' in the first 256 bytes,
> >>perhaps inside a comment, etc.) to try to describe the sniffing
> >>algorithm correctly.
>
>And this is precisely why sniffing and guessing should be avoided at all
>costs. There is also the issue that as long as browsers accept sloppy
>markup etc. there will never be an incentive for authors to clean up their
>act. This may be debatable for the case of browsers, but for the Validator,
>it's its _job_ to be strict about these things! The Validator isn't there
>to give people a fuzzy good feeling; it's there to help people make
>absolutely sure their documents satisfy the most basic level of quality.
>
>
>--
>When I decide that the situation is unacceptable for me, I'll simply fork
>the tree.   I do _not_ appreciate being enlisted into anyone's holy wars,
>so unless you _really_ want to go _way_ up in my  personal shitlist don't
>play politics in my vicinity.                   -- Alexander Viro on lkml

Received on Friday, 25 October 2002 20:45:09 UTC