Re: W3C Markup Validator vs Validome and such

From: David Dorward <david@dorward.me.uk>
Date: Thu, 23 Mar 2006 08:27:12 +0000
To: "tadeusz szewczyk I rebel:art" <rebelart@onreact.com>
Cc: www-validator@w3.org
Message-ID: <20060323082712.GA19831@us-lot.org>

On Wed, Mar 22, 2006 at 08:31:39PM +0100, tadeusz szewczyk I rebel:art wrote:

> While cleaning up my XHTML strict on my website http://onreact.com I
> wanted to be very accurate. So I tested it with several
> validators. I was delighted to find out the W3C Markup Validator did
> not return any errors after I was done. Then I wanted to make sure
> and retested with others like the one at Validome. It says "The
> Document is not valid XHTML 1.0 Strict" and that I have the
> following error: "Unexpected char in row 55 and column 111; this
> char is not allowed within charset (utf-8) that you use."
> 
> Which one is right?

Both, since what you have is not a validation issue: your page hits
some somewhat contradictory parts of various specs.

Your webserver fails to send a character encoding in your HTTP
headers. According to the rules for XML documents, if you do not have
an XML declaration (prolog) stating a different character encoding,
then you must use UTF-8 (or UTF-16 with a byte order mark).
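
As an illustration of why a checker applying the XML rule would
complain, here is a minimal Python sketch. The sample bytes are made up
for the example (the actual byte at row 55, column 111 of the page is
not known here), and the helper name is hypothetical. It only shows
that a character encoded as ISO-8859-1 is typically not a valid UTF-8
sequence.

```python
# Hypothetical sample: "café" encoded as ISO-8859-1, where é is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# The ISO-8859-1 bytes fail when read as UTF-8; the UTF-8 bytes do not.
print(first_invalid_utf8_offset(latin1_bytes))
print(first_invalid_utf8_offset("caf\u00e9".encode("utf-8")))
```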

Additionally, section 5.1 of the XHTML 1.0 spec doesn't allow you to
serve XHTML documents as text/html unless you "follow the guidelines
set forth in Appendix C" - however, Appendix C is informative, not
normative, so it is questionable whether you have to follow it or not,
and C.9 is couched in language which says "you may not want to" - if
you take the wording to mean that you must follow the advice, then such
an XML prolog is effectively forbidden and you are restricted to using
UTF-8 for your XHTML documents served as text/html under Appendix C.

However, the HTTP specification states that if you don't specify a
character encoding, and you are using a text/something content type,
then it defaults to ISO-8859-1.
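
The two defaults described above can be put side by side in a small
Python sketch. The function names are hypothetical, the XML side is
simplified (it ignores UTF-16 and byte order marks), and the HTTP side
encodes only the text/* default as this message describes it. It is
meant to show the conflict, not to reproduce what either validator
actually does.

```python
from email.message import Message
from typing import Optional


def xml_default_encoding(declared: Optional[str]) -> str:
    """XML rule as described above: with no declared encoding, assume UTF-8."""
    return declared if declared else "UTF-8"


def http_charset(content_type: str) -> Optional[str]:
    """Charset implied by a Content-Type header value under the HTTP rule above."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased value, or None if absent
    if charset is None and msg.get_content_maintype() == "text":
        # No charset parameter on a text/* type: the HTTP default applies.
        return "iso-8859-1"
    return charset


# A page served as plain "text/html" with no XML encoding declaration:
print(xml_default_encoding(None))   # what the XML rule assumes
print(http_charset("text/html"))    # what the HTTP rule assumes
```

The same bytes are therefore read under two different encodings
depending on which rule a checker applies, which is why both answers
can be defended.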

Also, you have: 

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

The http-equiv mechanism is supposed to give HTTP servers somewhere to
look for extra HTTP headers, but none (that I know of) do. Some
clients, however, do pay attention to it.

If you want to stick to XHTML and take the advice in Appendix C, then
you should convert your document to UTF-8 AND modify your server so it
outputs an HTTP Content-Type header that also states you are using UTF-8.
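
A minimal sketch of those two steps in Python, assuming the page is
currently stored as ISO-8859-1 and that the server side can be
expressed as a WSGI application. Both are assumptions for illustration;
the real server setup is not known here, and the names `SAMPLE_PAGE`,
`transcode_to_utf8` and `app` are hypothetical.

```python
# Hypothetical stand-in for the existing page, stored as ISO-8859-1 bytes.
SAMPLE_PAGE = "<p>caf\u00e9</p>".encode("iso-8859-1")


def transcode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Step 1: re-encode the document bytes as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def app(environ, start_response):
    """Step 2: serve the page with a Content-Type header that states UTF-8."""
    body = transcode_to_utf8(SAMPLE_PAGE)
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local check, this can be run with the standard library's
`wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Any
charset named in a meta element inside the document would need to agree
with the header as well.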

Personally, I'd switch to the better supported and less weird HTML 4.01
- and then modify the server so it declares the character encoding I
was already using.

-- 
David Dorward                                      http://dorward.me.uk
Received on Thursday, 23 March 2006 08:27:19 GMT
