Re: Validation error

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 10 Aug 2004 10:49:28 +0300 (EEST)
To: Julien ÉLIE <julien.elie@wanadoo.fr>
cc: www-validator@w3.org
Message-ID: <Pine.GSO.4.58.0408101036510.19561@korppi.cs.tut.fi>

On Tue, 10 Aug 2004, [ISO-8859-15] Julien ÉLIE wrote:

> I try to validate the page which I enclose.

It is generally best to post the URL of a page; that way others can see
the entire document as well as the HTTP headers, which are relevant
especially when character encoding problems are involved.

> §
> Line 10, column 642: non SGML character number 128
>
> ....sente l'avantage de prévoir la ligature « &oelig; » et le symbole de l'e
> §
>
> with the « u » of « ligature » underlined.
> However, this is not here, but after : the euro symbol €.

The validator sometimes seems to highlight the wrong place, so that the
error appears to be in a location quite different from the real one.
If you modify the HTML source by adding some line breaks, you will
probably see how the highlighted location or segment changes.

The validator http://www.htmlhelp.com/validator/ seems to indicate the
problem location better.
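
Independently of any validator's highlighting, you can locate such octets
by scanning the raw bytes for values in the range 80..9F hexadecimal. The
following Python sketch illustrates the idea. The function name
find_c1_octets and the sample data are hypothetical, and counting columns
in octets from 1 is an assumption made for convenience, not a description
of how either validator counts.

```python
def find_c1_octets(data: bytes):
    """Return (line, column, octet) for every octet in the range 0x80..0x9F.

    Lines and columns are 1-based and counted in octets, not characters.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, octet in enumerate(line, start=1):
            if 0x80 <= octet <= 0x9F:
                hits.append((line_no, col_no, octet))
    return hits


# Hypothetical sample: the second line contains octet 0x80 at column 7.
sample = b"<p>le symbole\nde l'e\x80</p>\n"
print(find_c1_octets(sample))
```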

> By the way, my encoding is ISO-8859-15 so the symbol « € » should be valid
> in such an encoding.

Apparently your source contains the octet 128 decimal (80 hexadecimal).
In ISO-8859-15, it is _undefined_, i.e. a data stream with
that octet, declared to be ISO-8859-15, is malformed. The validator's
message is misleading, since in this context octet 128 decimal means
_nothing_ so it cannot be identified with the ISO 10646 code position 128
decimal (which is not permitted in HTML). But this is really splitting
hairs; in any case the document is in error. If browsers render octet 128
as the euro sign, they do so as error recovery: the octet means nothing in
the declared encoding, so a browser might do something based on a guess at
what the author's intention might have been. In practice, browsers just
interpret the undefined octets as if the encoding were declared as
Windows Latin 1 (windows-1252).
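
The two interpretations can be compared with Python's codecs, as a rough
sketch only. One caveat: Python's iso8859_15 codec follows the Unicode
mapping tables, which assign octets 80..9F to C1 control characters
instead of leaving them undefined. It therefore returns U+0080 rather than
raising an error, which happens to match the "character number 128" in
the validator's message. The cp1252 decoding shows the browser-style
recovery.

```python
octet = bytes([0x80])

# Python's ISO-8859-15 table maps 0x80 to the control character U+0080.
as_latin9 = octet.decode("iso8859_15")

# windows-1252 assigns the same octet to the euro sign, U+20AC.
as_cp1252 = octet.decode("cp1252")

# Print code points only, so the output is safe on any terminal.
print(f"ISO-8859-15:  U+{ord(as_latin9):04X}")
print(f"windows-1252: U+{ord(as_cp1252):04X}")
```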

The correct representation of the euro sign in ISO-8859-15 is
octet 164 decimal (A4 hexadecimal) - the same octet that means the
CURRENCY SIGN (¤) in ISO-8859-1.
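
This is easy to check, for example with Python's standard codecs. The
variable names below are only illustrative.

```python
EURO = "\u20ac"        # EURO SIGN
CURRENCY = "\u00a4"    # CURRENCY SIGN

# Encoding the euro sign as ISO-8859-15 yields the single octet A4.
euro_latin9 = EURO.encode("iso8859_15")

# The same octet, read as ISO-8859-1, is the currency sign.
same_octet_latin1 = euro_latin9.decode("iso8859_1")

print(euro_latin9.hex())
print(same_octet_latin1 == CURRENCY)
```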

In practical terms, ISO-8859-15 is useless on the Web. Most browsers don't
support it, and anything you can do in ISO-8859-15 can be done easily
using ISO-8859-1 and a few character references (and, if desired, the
entity reference &euro;).
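
As a sketch of that approach, Python's "xmlcharrefreplace" error handler
turns every character that ISO-8859-1 cannot represent into a decimal
character reference. The sample text is made up for illustration, and a
real page would of course also need its markup handled.

```python
# Sample text with U+0153 (oe ligature) and U+20AC (euro sign),
# neither of which exists in ISO-8859-1.
text = "la ligature \u0153 et le symbole \u20ac"

# Characters outside ISO-8859-1 become decimal character references.
encoded = text.encode("iso8859_1", errors="xmlcharrefreplace")
print(encoded)
```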

> There is no warning if I put « &euro; » instead.

That's right, because &euro; is defined independently of any character
encoding issues.
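
A small Python sketch of why: the reference consists of ASCII characters
only, so its octets are identical in ISO-8859-1, ISO-8859-15 and
windows-1252, and it resolves to the same character in each. Here
html.unescape from the standard library stands in for a browser's entity
resolution, which is an assumption of the example (it follows HTML5
rules), and the sample string is invented.

```python
import html

document = "Prix : 10 &euro;"
encodings = ["iso8859_1", "iso8859_15", "cp1252"]

# The reference is plain ASCII, so every encoding yields the same octets.
octet_forms = {document.encode(enc) for enc in encodings}

# After decoding, the reference resolves to U+20AC in every case.
resolved = {html.unescape(document.encode(enc).decode(enc)) for enc in encodings}

print(len(octet_forms), len(resolved))
```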

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Tuesday, 10 August 2004 10:21:57 UTC
