- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Wed, 14 Feb 2007 21:04:07 +0200 (EET)
- To: www-validator@w3.org
On Wed, 14 Feb 2007, richard Eskins wrote:

> Validation of apostrophe.
>
> Re: http://www.ico-stu.mmu.ac.uk/stu000/icoD-05154850/05154850.html
>
> This page fails to validate because of the apostrophes used (as expected).

It fails because it contains octets 145 and 146 (decimal), which denote control codes according to the ISO-8859-1 encoding, and these control codes (in the C1 Controls set) are not allowed in XML. The error messages about "non-SGML characters" are thus a bit misleading and reflect the nature of the validator as an SGML validator with some XML features hacked into it.

I suppose you know how the problem can be fixed (in different ways, like declaring the encoding as windows-1252 or by using entities).

> However, if the page is validated using the Direct
> Input facility it does validate.
>
> Is this an error in the way the Direct Input facility works?

I'd say so. The problem is that the Direct Input facility is on a UTF-8 encoded page, and when you, e.g., cut and paste your document there, your browser converts the representation of characters into UTF-8 and effectively treats the original data as windows-1252 encoded. Therefore 145 and 146 become UTF-8 encoded forms of left and right single quotation marks, which are the intended characters, but the most important point is that they become acceptable character data. Then the validator somehow fails to pay attention to the ISO-8859-1 encoding declared in the <meta> tag.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
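
The difference between the two interpretations of octets 145 and 146 can be illustrated with a small Python sketch. It is not part of the validator or of any browser; the variable names are made up for illustration, and it only assumes Python's standard "iso-8859-1", "windows-1252" and "utf-8" codecs.

```python
# Octets 145 and 146 (0x91, 0x92) as they occur in the page's raw bytes.
raw = bytes([145, 146])

# Interpreted as ISO-8859-1 (the declared encoding), they are the
# C1 control characters U+0091 and U+0092.
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as windows-1252, they are the left and right single
# quotation marks U+2018 and U+2019, which is what the author intended.
as_cp1252 = raw.decode("windows-1252")

# Roughly what a browser might submit from a UTF-8 encoded form after
# treating the pasted data as windows-1252 (an assumption for illustration).
as_utf8 = as_cp1252.encode("utf-8")

print([hex(ord(c)) for c in as_latin1])   # ['0x91', '0x92']
print([hex(ord(c)) for c in as_cp1252])   # ['0x2018', '0x2019']
print(as_utf8)                            # b'\xe2\x80\x98\xe2\x80\x99'
```

The same two octets thus yield control characters under one encoding and ordinary punctuation under the other, which is consistent with the file upload failing while the pasted input passes.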
Received on Wednesday, 14 February 2007 19:04:15 UTC