
UTF-8 "assumed" even though explicitly stated

From: James Haigh <james.r.haigh@gmail.com>
Date: Thu, 21 Feb 2013 08:05:21 +0000
Message-ID: <CA+yoODaLGA_xgzw7qDQrcQM1tkiO107bkuBHwA6mVUmAM77B8Q@mail.gmail.com>
To: www-validator@w3.org

When using direct input mode, I get a warning that UTF-8 is assumed,
despite explicitly stating '<meta charset="UTF-8"/>' in the head of
the document. The warning even appears when 'utf-8 (Unicode,
worldwide)' is selected in the encoding field.

I see no reason why this warning has to appear when UTF-8 is
explicitly stated (since it's not really an assumption in this case);
it should only be shown if the text is not known to be UTF-8. Although
this may not be considered a bug in the validator, there is definitely
room for improvement here.

This is the warning message:

"Using Direct Input mode: UTF-8 character encoding assumed

Unlike the “by URI” and “by File Upload” modes, the “Direct Input”
mode of the validator provides validated content in the form of
characters pasted or typed in the validator's form field. This will
automatically make the data UTF-8, and therefore the validator does
not need to determine the character encoding of your document, and
will ignore any charset information specified.

If you notice a discrepancy in detected character encoding between the
“Direct Input” mode and other validator modes, this is likely to be
the reason. It is neither a bug in the validator, nor in your

Please suppress this warning message when the document is known to be in UTF-8.
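
To illustrate the requested behaviour, here is a minimal sketch in Python. It is not the validator's actual code: the function names are hypothetical, the regular expression is only a rough stand-in for real charset detection, and it assumes the encoding field is passed in as a bare charset name such as 'utf-8'. The warning is shown only when neither the document nor the encoding field says UTF-8.

```python
import re

# Hypothetical helpers for illustration; not taken from the validator.
# Rough match for <meta charset="..."> and <meta ... content="...; charset=...">.
_META_CHARSET = re.compile(
    r"""<meta\s+[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def declared_charset(markup):
    """Return the charset named in the first matching <meta> tag, lowercased, or None."""
    match = _META_CHARSET.search(markup)
    return match.group(1).lower() if match else None


def should_warn_utf8_assumed(markup, selected_encoding=None):
    """Return True only if nothing states that the direct input is UTF-8."""
    selected = (selected_encoding or "").strip().lower() or None
    known = {declared_charset(markup), selected}
    return not (known & {"utf-8", "utf8"})
```

With this check, a document containing '<meta charset="UTF-8"/>' would not trigger the warning, while one declaring a different charset (which direct input ignores) still would.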

Also, if only UTF-8 is supported for direct input, please consider
locking the encoding field on http://validator.w3.org/check (results
page) to 'utf-8 (Unicode, worldwide)' when direct input was used.

James Haigh.
Received on Sunday, 24 February 2013 22:51:07 UTC
