Re: Outdated link for character entity list in validator error message (non SGML character number)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sun, 29 Jul 2012 22:52:28 +0300
Message-ID: <5015947C.2070103@cs.tut.fi>
To: Josh Hillman <hillman@joshhillman.com>
CC: www-validator@w3.org

2012-07-25 19:56, Josh Hillman wrote:

> When using http://validator.w3.org to validate HTML 4.01 Strict (and
> possibly others) by using direct input, the "non SGML character number" (133
> in this particular case) is reported appropriately when a character outside
> the accepted range is encountered, however the "character entity" link
> referenced in the error description appears to be outdated.  The "character
> entity" link references character entity documentation for HTML 3:
>    http://www.w3.org/MarkUp/html3/latin1.html

You are quite right. The link was wrong from the beginning, since the 
HTML 3 draft should never have been cited except as work in progress, 
and it expired in 1995.

> Shouldn't the link reference character entity documentation for HTML
> 4(.01)?:

That would be better. But the entire error description is outdated and 
was wrong from the beginning. A reference to entities is irrelevant 
when the issue is plain character data.

> You have used an illegal character in your text. HTML uses the standard
> UNICODE Consortium character repertoire, and it leaves undefined (among
> others) 65 character codes (0 to 31 inclusive and 127 to 159 inclusive) that
> are sometimes used for typographical quote marks and similar in proprietary
> character sets.

That’s not correct at all. Unicode defines those code positions as 
allocated to control characters, not undefined. They are disallowed in 
HTML, but that’s a different issue.
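The point about control characters can be checked with a short sketch. It uses Python's standard unicodedata module; the choice of Python and the helper name is_control are illustrative assumptions, not anything the validator itself uses.

```python
import unicodedata


def is_control(code_point: int) -> bool:
    """Return True if Unicode assigns this code point to general category Cc (control)."""
    return unicodedata.category(chr(code_point)) == "Cc"


# The code positions named in the error message: 0 to 31 and 127 to 159 inclusive.
positions = list(range(0, 32)) + list(range(127, 160))

# Number of positions covered by the two ranges.
print(len(positions))

# Whether every one of them is an assigned control character.
print(all(is_control(cp) for cp in positions))

# General category of the character reported in this thread (number 133).
print(unicodedata.category(chr(133)))
```

A code position in category Cc is assigned, not undefined. Whether a given document type permits it is a separate question that the HTML specification answers, not Unicode.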

> Your best bet is to replace the character with the nearest equivalent ASCII
> character,

That was hardly good advice in the last ten years or so.
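As an illustration of an alternative to falling back on ASCII, the sketch below assumes that byte 133 came from a document saved as windows-1252 but read as ISO-8859-1. That is a common cause, but it is only an assumption here; the original report does not say what the character was meant to be. The sample bytes are hypothetical.

```python
# Hypothetical input: byte 0x85 (decimal 133) in the middle of ASCII text.
raw = b"Wait\x85 what?"

# Read as ISO-8859-1, the byte maps to the control character U+0085.
misread = raw.decode("latin-1")
print(ord(misread[4]))

# Read as windows-1252, the same byte is the horizontal ellipsis U+2026.
fixed = raw.decode("cp1252")
print(hex(ord(fixed[4])))

# Option 1: keep the character and serve the document as UTF-8.
as_utf8 = fixed.encode("utf-8")
print(as_utf8)

# Option 2: stay in ASCII by writing a numeric character reference.
as_reference = fixed.encode("ascii", "xmlcharrefreplace")
print(as_reference)
```

Either option preserves the intended character. Which one fits depends on the encoding the document actually declares.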

> or to use an appropriate character entity.

“Character entity” is a misnomer.

> For more information
> on Character Encoding on the web, see Alan Flavell's excellent HTML
> Character Set Issues reference.

It was truly excellent in the old days, but I’m sure Alan would prefer 
references to newer resources. Besides, there’s now flavell.org that 
hosts Alan’s material, so pointing to archive.org is outdated.

> This error can also be triggered by formatting characters embedded in
> documents by some word processors. If you use a word processor to edit your
> HTML documents, be sure to use the "Save as ASCII" or similar command to
> save the document without formatting information.

In 2012, such advice is more likely to cause confusion than to help anyone.

Received on Sunday, 29 July 2012 19:53:09 UTC
