Re: Validation problem

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 18 Dec 2007 17:36:18 +0200
Message-ID: <01f501c8418b$c02e2b90$0500000a@DOCENDO>
To: "Magni Hovgaard" <m_hovgaard@hotmail.com>, <www-validator@w3.org>

Magni Hovgaard wrote:

> http://www.clarecoco.ie/services/gaeilge/gaeilge.html
> The online says the page is valid, while my local installation
> returns these errors.
- -
> Line 87, column 18: character "�" is not allowed in the value of
> attribute "name" <h2><a name="Réamhrá" id="Réamhrá"></a>

Looks like an encoding problem, as you suspect. The offending character 
seems to be U+FFFD, REPLACEMENT CHARACTER, which is an indicator of a 
character data error. Oddly enough, the markup quoted seems to contain 
the character properly, as "á".

> Encoding:
> iso-8859-1

Even more puzzling.
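
For illustration only: one common way for U+FFFD to appear is that bytes 
written as ISO-8859-1 get decoded as UTF-8 somewhere along the way. 
Whether that is what happens in the local installation is an assumption 
on my part, not something established here. A minimal Python sketch of 
the effect:

```python
# Hypothetical reproduction, not a diagnosis of the local validator.
# "é" and "á" are single bytes (0xE9, 0xE1) in ISO-8859-1.
raw = "Réamhrá".encode("iso-8859-1")

# Read as UTF-8, those bytes start multi-byte sequences that are never
# completed, so a lenient decoder substitutes U+FFFD for each of them.
decoded = raw.decode("utf-8", errors="replace")

for ch in decoded:
    if ord(ch) > 0x7F:
        print(f"U+{ord(ch):04X}")
```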

As a workaround, you could represent "á" as "&aacute;" (and "é" as 
"&eacute;", as elsewhere on the page). There is no reason why such an 
entity reference could not be used in an attribute value, too. But of 
course you _should_ be able to write the character as such as well.
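
As a rough sketch of that workaround, the following Python snippet turns 
non-ASCII characters into named entity references where HTML defines one. 
The function name to_entities is just something made up for this 
example; the lookup table comes from the standard html.entities module.

```python
import html
from html.entities import codepoint2name

def to_entities(text):
    """Replace non-ASCII characters with named entity references.

    Hypothetical helper for illustration; characters without a named
    entity fall back to a numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

escaped = to_entities("Réamhrá")
print(escaped)

# A parser should resolve the references back to the same characters.
roundtrip = html.unescape(escaped)
```

The resulting attribute value contains only ASCII characters, so it 
should survive regardless of which encoding the file is read in.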

On the other hand, non-ASCII characters are risky in ID values. It would 
be safer to omit the diacritics, i.e. use just "Reamhra", since this is 
mostly just an internal code rather than something visible to users. It 
becomes visible as part of a URL if someone uses it in a fragment 
identifier in a link, but this in turn implies problems, since not all 
browsers can handle non-ASCII characters in URLs properly.
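
To make the difference concrete, here is a small Python sketch. The 
helper name ascii_id is invented for this example, and percent-encoding 
the fragment as UTF-8 is only one of the ways a browser might treat it, 
which is exactly the problem.

```python
import unicodedata
from urllib.parse import quote

def ascii_id(name):
    """Drop combining diacritic marks from a name (hypothetical helper).

    This only handles letters that decompose into a base letter plus
    combining marks; it does not guarantee pure ASCII for every input.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

safe = ascii_id("Réamhrá")
print(safe)

# What the fragment looks like when percent-encoded as UTF-8.
fragment = quote("Réamhrá")
print("gaeilge.html#" + fragment)
```

The plain form needs no escaping at all in a link, whereas the original 
form depends on the browser choosing the same encoding as the author.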

(Actually I was somewhat astonished at noticing that XHTML indeed allows 
e.g. "á" in an identifier. I should have remembered that - my book on 
Unicode has a longish discussion of the identifier concept in XML - but 
the concept is fairly confusing and complex and rarely applied in web 
authoring. People just tend to stick to ASCII letters there.)

Jukka K. Korpela ("Yucca")
Received on Tuesday, 18 December 2007 15:36:04 UTC
