- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 20 Aug 2002 11:34:46 +0900
- To: maherb@brimworks.com, www-validator@w3.org
At 12:29 02/08/19 -0400, maherb@brimworks.com wrote:
>In the HTML 4.01 specification on this page,
>
>http://www.w3.org/TR/html4/charset.html#h-5.3.1
>
>describes numeric character references, which are perfectly legal.
>However, when validating a page with such a numeric character reference, I
>receive an error:
>
> * Line 106, column 5:
>
> —foo bar<br>
> ^
The validator is correct. Please have a look at
http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html
This says:
CHARSET
         BASESET  "ISO Registration Number 177//CHARSET
                   ISO/IEC 10646-1:1993 UCS-4 with
                   implementation level 3//ESC 2/5 2/15 4/6"
         DESCSET  0       9       UNUSED
                  9       2       9
                  11      2       UNUSED
                  13      1       13
                  14      18      UNUSED
                  32      95      32
                  127     1       UNUSED
                  128     32      UNUSED
                  160     55136   160
                  55296   2048    UNUSED  -- SURROGATES --
                  57344   1056768 57344
Please note the line
                  128     32      UNUSED
This says that the 32 characters starting from character number
128 (that is, 128 through 159) are unused. The next usable character is 160.
This is because the numbers in numeric character references
are taken from Unicode, and in Unicode, the characters from
128 to 159 are control characters, which don't belong
in an HTML document.
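To make the effect of the DESCSET table concrete, here is a small Python sketch. It is not how the validator is implemented; the names DESCSET, is_sgml_character and non_sgml_ncrs are hypothetical. It copies the ranges from the declaration above and flags numeric character references that land in an UNUSED range. It assumes every reference ends with ";", which SGML does not strictly require.

```python
import re

# DESCSET from the HTML 4.01 SGML declaration, as (first code point, count, usable).
# Ranges marked UNUSED in the declaration are stored with usable=False.
DESCSET = [
    (0, 9, False),
    (9, 2, True),          # TAB, LF
    (11, 2, False),
    (13, 1, True),         # CR
    (14, 18, False),
    (32, 95, True),        # printable ASCII
    (127, 1, False),       # DEL
    (128, 32, False),      # C1 controls 128..159
    (160, 55136, True),
    (55296, 2048, False),  # surrogates
    (57344, 1056768, True),
]

# Decimal (&#151;) or hexadecimal (&#x2014;) NCRs; assumes a closing ";".
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def is_sgml_character(code_point: int) -> bool:
    """True if code_point lies in a DESCSET range that is not UNUSED."""
    for start, count, usable in DESCSET:
        if start <= code_point < start + count:
            return usable
    return False  # beyond U+10FFFF: not declared at all


def non_sgml_ncrs(text: str) -> list[str]:
    """Return every NCR in text that refers to a non-SGML character."""
    bad = []
    for m in NCR.finditer(text):
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_sgml_character(code_point):
            bad.append(m.group(0))
    return bad


print(non_sgml_ncrs("&#151;foo bar<br>"))    # ['&#151;']
print(non_sgml_ncrs("&#x2014;foo bar<br>"))  # []
```

A reference such as &#151; is reported because 151 falls inside the "128 32 UNUSED" range, while the Unicode EM DASH at 0x2014 falls inside "160 55136 160" and passes.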
What you probably wanted was the character EM DASH, represented
with byte 0x97 in windows-1252. For this, please use the
hexadecimal NCR &#x2014;, or its decimal equivalent, &#8212;.
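References in the 128 to 159 range usually come from text that was written in windows-1252. The following Python sketch (the helper name fix_cp1252_ncrs is hypothetical; it relies only on Python's built-in "cp1252" codec) reinterprets such references as windows-1252 bytes and rewrites them as the matching Unicode hexadecimal NCR. As above, it assumes references end with ";".

```python
import re

# Decimal or hexadecimal NCR with a closing ";" (assumption: no bare "&#151").
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def fix_cp1252_ncrs(text: str) -> str:
    """Reinterpret NCRs 128..159 as windows-1252 bytes and emit Unicode hex NCRs."""
    def replace(m):
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 128 <= code_point <= 159:
            return m.group(0)  # outside the C1 range: leave untouched
        try:
            char = bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            return m.group(0)  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in cp1252
        return "&#x%04X;" % ord(char)

    return NCR.sub(replace, text)


print(fix_cp1252_ncrs("&#151;foo bar<br>"))  # &#x2014;foo bar<br>
```

Byte 0x97 decodes to U+2014 EM DASH, so &#151; becomes &#x2014;. The five byte values that windows-1252 leaves undefined are kept as-is, since there is no sensible character to map them to.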
Regards, Martin.
>Error: reference to non-SGML character
>
>Thanks,
>-Brian
>
>
>--
> Brian Maher CS Major WWU
> BrimWorks.com >> Glory to God <<
Received on Monday, 19 August 2002 23:30:42 UTC