Re: Validator Issue (Wierd Symbols)

From: Lachlan Hunt <lachlan.hunt@iinet.net.au>
Date: Wed, 09 Feb 2005 16:57:35 +1100
Message-ID: <4209A64F.7050308@iinet.net.au>
To: www-validator@w3.org

Everett, Alex wrote:
> Is there something wrong with the validator or the source code?

It is a character encoding problem.

> Also, sometimes the reported error changes to question marks instead of blocks.

That is because the characters are invalid, and the validator is
indicating the position of those erroneous characters by replacing them
with U+FFFD (REPLACEMENT CHARACTER).

> Website:
> https://security.okstate.edu/sso/index.php

The HTTP headers for this site indicate the character encoding as
ISO-8859-1:

Content-Type: text/html; charset=ISO-8859-1

The character being complained about has the code position 146 (0x92),
which is a control character within the ISO-8859-1 character repertoire.
Although popular user agents interpret it as U+2019 RIGHT SINGLE QUOTATION
MARK, that character does not exist in ISO-8859-1.  It does, however, exist
in Windows-1252 and is one of the differences between the two encodings [1].
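
The difference can be checked with a short Python sketch.  It is only an
illustration: the variable names are made up for this example, and it relies
on Python's built-in "iso-8859-1" and "windows-1252" codecs rather than on
anything the validator itself does.

```python
import unicodedata

# Byte value 146 (0x92), the one the validator is complaining about.
RAW = b"\x92"

# Decoded as ISO-8859-1, the byte maps to a C1 control character.
as_latin1 = RAW.decode("iso-8859-1")

# Decoded as Windows-1252, the same byte is a printable quotation mark.
as_cp1252 = RAW.decode("windows-1252")

for label, ch in (("ISO-8859-1", as_latin1), ("Windows-1252", as_cp1252)):
    # Category "Cc" means control character; "Pf" means final punctuation.
    print(f"{label}: U+{ord(ch):04X} category={unicodedata.category(ch)}")
```

The same byte therefore means two different things depending on which
encoding the server declares.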

Solutions:
Declare the character encoding as Windows-1252.  This works, but the use
of proprietary character encodings is not recommended on the WWW.

Replace the characters with numeric character references: &#x2019; or
&#8217; for that quotation mark.  This is the easiest recommended solution.

Convert the documents to UTF-8.  It's not the easiest solution, but it
is the most recommended.  There are several references to help you do
this, including my own three-part guide to Unicode [2] and Jukka Korpela's
excellent character-related material [3].
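
As a rough sketch of the second and third solutions, the Python below
assumes the existing files really are Windows-1252 bytes.  The function
names and the sample string are hypothetical, chosen only for illustration.

```python
def to_character_references(raw: bytes) -> bytes:
    """Keep ISO-8859-1 as the declared encoding.

    Characters that ISO-8859-1 cannot represent are written as decimal
    numeric character references, e.g. &#8217; for U+2019.
    """
    # Assumption: the source bytes are Windows-1252.  Python raises
    # UnicodeDecodeError for the few byte values that encoding leaves undefined.
    text = raw.decode("windows-1252")
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def to_utf8(raw: bytes) -> bytes:
    """Re-encode the document as UTF-8."""
    text = raw.decode("windows-1252")
    return text.encode("utf-8")


sample = b"It\x92s"  # "It", byte 146, "s"
print(to_character_references(sample))
print(to_utf8(sample))
```

If the documents are converted to UTF-8, the charset parameter in the
Content-Type header has to be changed to UTF-8 as well; otherwise the
server would still be declaring ISO-8859-1.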

[1] http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
[2] http://lachy.id.au/blogs/log/2004/12/guide-to-unicode-part-1
[3] http://www.cs.tut.fi/~jkorpela/chars/index.html

-- 
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/    Rediscover the Web
http://SpreadFirefox.com/   Igniting the Web
Received on Wednesday, 9 February 2005 05:57:41 GMT
