Re: Validation of apostrophe

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 14 Feb 2007 21:04:07 +0200 (EET)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.64.0702142054270.19238@mustatilhi.cs.tut.fi>

On Wed, 14 Feb 2007, richard Eskins wrote:

> Validation of apostrophe.
> Re: http://www.ico-stu.mmu.ac.uk/stu000/icoD-05154850/05154850.html
> This page fails to validate because of the apostrophes used (as expected).

It fails because it contains octets 145 and 146 (decimal), which denote 
control codes according to the ISO-8859-1 encoding, and these control 
codes (in the C1 Controls set) are not allowed in XML. The error messages 
about "non-SGML characters" are thus a bit misleading and reflect the 
nature of the validator as an SGML validator with some XML features hacked 
into it.
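
As an illustration only (this sketch is not part of the validator, and the
helper name describe is made up here), decoding the same two octets under
both encodings shows the difference. It assumes Python's standard codecs
for ISO-8859-1 and windows-1252:

```python
import unicodedata

# The two octets found in the page (decimal 145 and 146).
raw = bytes([145, 146])

# The same bytes read under the two encodings discussed in this message.
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("windows-1252")

def describe(text):
    """Return (code point, Unicode general category) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]

latin1_info = describe(as_latin1)  # C1 control characters, category "Cc"
cp1252_info = describe(as_cp1252)  # single quotation marks, "Pi" and "Pf"

print(latin1_info)
print(cp1252_info)
```

Under ISO-8859-1 the octets map to U+0091 and U+0092, which are control
characters; under windows-1252 they map to U+2018 and U+2019.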

I suppose you know how the problem can be fixed (in different ways, like 
declaring the encoding as windows-1252 or by using entities).
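
A minimal sketch of the second approach, assuming the file really was
saved as windows-1252: decode it as such and write the quotation marks as
numeric character references. The sample bytes are invented for the
example and are not taken from the page in question.

```python
# Hypothetical sample: bytes as a windows-1252 editor might have saved them.
raw = b"It\x92s a \x91test\x92"

# Interpret the octets as windows-1252, which maps 145/146 to U+2018/U+2019.
text = raw.decode("windows-1252")

# Re-encode as pure ASCII; non-ASCII characters become numeric references.
fixed = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(fixed)  # It&#8217;s a &#8216;test&#8217;
```

The result contains only ASCII, so it is the same under ISO-8859-1,
windows-1252 and UTF-8, and the declared encoding no longer matters for
these characters.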

> However, if the page is validated using the Direct
> Input facility it does validate.
> Is this an error in the way the Direct Input facility works?

I'd say so. The problem is that the Direct Input facility is on a UTF-8 
encoded page, and when you e.g. cut and paste your document there, your 
browser converts the representation of characters into UTF-8 and 
effectively treats the original data as windows-1252 encoded. Therefore 
145 and 146 become UTF-8 encoded forms of left and right single quotation 
marks, which are the intended characters, but the most important point is 
that they become acceptable character data. Then the validator somehow 
fails to pay attention to the ISO-8859-1 encoding declared in the <meta> 
tag.
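
The conversion described above can be modelled roughly as follows. This is
a sketch of the effect, not of any particular browser's code, and it
assumes the browser reads the pasted octets as windows-1252 before
submitting the form as UTF-8:

```python
# Octets 145 and 146 as they appear in the original file.
raw = bytes([145, 146])

# Step 1 (assumed browser behaviour): treat the data as windows-1252.
pasted = raw.decode("windows-1252")

# Step 2: the form page is UTF-8, so the text is submitted as UTF-8.
submitted = pasted.encode("utf-8")

# What the validator receives: ordinary characters, no C1 controls.
received = submitted.decode("utf-8")
has_c1 = any(0x80 <= ord(ch) <= 0x9F for ch in received)

print(submitted)  # b'\xe2\x80\x98\xe2\x80\x99'
print(has_c1)     # False
```

Each quotation mark arrives as a three-octet UTF-8 sequence, so the input
no longer contains the octets that made the original page fail.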

Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Wednesday, 14 February 2007 19:04:15 UTC
