- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 10 Aug 2004 10:49:28 +0300 (EEST)
- To: Julien ÉLIE <julien.elie@wanadoo.fr>
- cc: www-validator@w3.org
On Tue, 10 Aug 2004, [ISO-8859-15] Julien ÉLIE wrote:

> I try to validate the page which I enclose.

It is generally best to post the URL of a page; that way others can see the entire document as well as the HTTP headers, which are relevant especially when character encoding problems are involved.

> §
> Line 10, column 642: non SGML character number 128
>
> ....sente l'avantage de prévoir la ligature « œ » et le symbole de l'e
> §
>
> with the « u » of « ligature » underlined.
> However, this is not here, but after : the euro symbol €.

The validator sometimes seems to highlight the wrong spot, so that the error appears to be in a location quite different from the real one. If you modify the HTML source by adding some line breaks, you will probably see how the highlighted location or segment changes. The validator at http://www.htmlhelp.com/validator/ seems to indicate the problem location better.

> By the way, my encoding is ISO-8859-15 so the symbol « € » should be valid
> in such an encoding.

Apparently your source contains the octet 128 decimal (80 hexadecimal). In ISO-8859-15 it is _undefined_, i.e. a data stream that contains that octet and is declared to be ISO-8859-15 is malformed. The validator's message is misleading: in this context octet 128 decimal means _nothing_, so it cannot be identified with the ISO 10646 code position 128 decimal (which is not permitted in HTML). But this is really splitting hairs; in any case the document is in error.

If browsers render octet 128 decimal as the euro sign, they do so as error recovery: the octet means nothing in the declared encoding, so a browser might do something based on a guess at what the author intended. In practice, browsers just interpret the undefined octets as if the encoding were declared as Windows Latin 1 (windows-1252), where octet 128 decimal is the euro sign.

The correct representation of the euro sign in ISO-8859-15 is octet 164 decimal (A4 hexadecimal), the same octet that means the CURRENCY SIGN (¤) in ISO-8859-1.
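The octet interpretations described above can be checked with Python's standard codecs. The following is only a sketch: the function names `describe`, `find_c1_octets` and `cp1252_to_latin9` are hypothetical, and the repair step assumes the page was really typed in windows-1252 (which is what browsers guess). Note that Python's `iso8859_15` codec does not reject octets 0x80 to 0x9F; it maps them to the C1 control characters U+0080 to U+009F, which mirrors the validator's "character number 128" message. The scanner reports line and column in octets, which equal characters in a single-octet encoding such as ISO-8859-15.

```python
import unicodedata
from typing import Iterator, Tuple


def describe(octet: int, encoding: str) -> str:
    """Decode a single octet and name the resulting character, or report it as undefined."""
    try:
        ch = bytes([octet]).decode(encoding)
    except UnicodeDecodeError:
        return f"0x{octet:02X} in {encoding}: undefined"
    # C1 control characters have no Unicode name, so fall back to "<control>".
    name = unicodedata.name(ch, "<control>")
    return f"0x{octet:02X} in {encoding}: U+{ord(ch):04X} {name}"


def find_c1_octets(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (line, column, octet) for each octet in 0x80..0x9F; line and column are 1-based."""
    line, col = 1, 0
    for b in data:
        if b == 0x0A:  # LF starts a new line
            line, col = line + 1, 0
            continue
        col += 1
        if 0x80 <= b <= 0x9F:
            yield line, col, b


def cp1252_to_latin9(data: bytes) -> bytes:
    """Assuming the source was actually windows-1252, re-encode it as real ISO-8859-15."""
    return data.decode("cp1252").encode("iso8859_15")


if __name__ == "__main__":
    print(describe(0x80, "cp1252"))      # the browsers' guess
    print(describe(0x80, "iso8859_15"))  # what the validator complained about
    print(describe(0xA4, "iso8859_15"))  # the correct euro octet
    print(describe(0xA4, "iso8859_1"))   # the same octet in ISO-8859-1

    page = b"<p>ligature\n<p>prix : 10 \x80</p>\n"
    for line, col, octet in find_c1_octets(page):
        print(f"line {line}, column {col}: octet 0x{octet:02X}")
    print(cp1252_to_latin9(page))
```

Running it shows the stray octet at line 2, column 14 of the sample, independently of where the validator's underline happens to fall, and the repaired bytes contain 0xA4 in its place.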
In practical terms, ISO-8859-15 is useless on the Web. Most browsers don't support it, and anything you can do in ISO-8859-15 can easily be done using ISO-8859-1 and a few character references (and, if desired, the entity reference &euro;).

> There is no warning if I put « &euro; » instead.

That's right, because &euro; is defined independently of any character encoding issues.

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
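The ISO-8859-1 plus character references approach can be sketched with Python's standard library. This is an illustration under assumptions: the function names are hypothetical, and the input is assumed to be HTML markup whose own "&" characters are already escaped. The first variant uses numeric references (&#8364; for the euro sign); the second prefers named HTML 4 entities such as &euro; and &oelig; where `html.entities.codepoint2name` has one, falling back to numeric references otherwise.

```python
from html.entities import codepoint2name


def to_latin1_with_refs(text: str) -> bytes:
    """Encode as ISO-8859-1; characters outside it become numeric references like &#8364;."""
    return text.encode("iso8859_1", errors="xmlcharrefreplace")


def to_latin1_named(text: str) -> bytes:
    """Like to_latin1_with_refs, but prefer named HTML 4 entities (e.g. &euro;) when one exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            out.append(ch)  # representable directly in ISO-8859-1
        else:
            name = codepoint2name.get(cp)
            out.append(f"&{name};" if name else f"&#{cp};")
    return "".join(out).encode("iso8859_1")


if __name__ == "__main__":
    sample = "prévoir la ligature « œ » et le symbole de l'euro €"
    print(to_latin1_with_refs(sample))
    print(to_latin1_named(sample))
```

Either output is valid when declared as ISO-8859-1, because the non-Latin-1 characters are now expressed as references rather than as encoding-dependent octets.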
Received on Tuesday, 10 August 2004 10:21:57 UTC