
Re: Non-SGML Char Refs

From: Martin Duerst <duerst@w3.org>
Date: Tue, 05 Jun 2001 17:02:27 +0900
Message-Id: <4.2.0.58.J.20010605165925.02c29ac0@sh.w3.mag.keio.ac.jp>
To: Bjoern Hoehrmann <derhoermi@gmx.net>, tkinias@optimalco.com
Cc: "'www-validator@w3.org'" <www-validator@w3.org>
At 04:32 01/06/05 +0200, Bjoern Hoehrmann wrote:
>* Thanasis Kinias wrote:
> >The validator complains about "non-SGML character" references (e.g., &#147;
> >instead of the correct &#8220;) only when validating as XHTML.  That implies
> >that &#147; and the other Microsoft characters from decimal 128-159 (hex
> >80-9f) _are_ valid in HTML.
>
>They are, they just refer to non-printing control characters.

No, sorry, they are not. See
http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html

  CHARSET
           BASESET  "ISO Registration Number 177//CHARSET
                     ISO/IEC 10646-1:1993 UCS-4 with
                     implementation level 3//ESC 2/5 2/15 4/6"
          DESCSET 0       9       UNUSED
                  9       2       9
                  11      2       UNUSED
                  13      1       13
                  14      18      UNUSED
                  32      95      32
                  127     1       UNUSED
                  128     32      UNUSED
                  160     55136   160
                  55296   2048    UNUSED  -- SURROGATES --
                  57344   1056768 57344

The line "128     32      UNUSED" marks the 32 code positions starting
at 128 (that is, 128-159) as unused, which excludes them, doesn't it?
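
To make the reading of the DESCSET concrete, here is a rough sketch in
Python. It only illustrates how the rows quoted above can be treated as a
lookup table; it is not how the validator or any SGML parser implements
this. The names DESCSET and is_sgml_char are made up for the example, and
None stands in for UNUSED.

```python
# Rows copied from the DESCSET quoted above:
# (first code position, number of positions, base-set position or None for UNUSED)
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),      # surrogates
    (57344, 1056768, 57344),
]


def is_sgml_char(code):
    """Return True if the code position is mapped (not UNUSED) in the DESCSET."""
    for start, count, base in DESCSET:
        if start <= code < start + count:
            return base is not None
    # Positions outside every row are not part of the document character set.
    return False


if __name__ == "__main__":
    for ref in (147, 8220):
        status = "allowed" if is_sgml_char(ref) else "non-SGML character"
        print(f"&#{ref}; -> {status}")
```

Under this reading, &#147; falls in the row starting at 128 and is rejected,
while &#8220; falls in the row starting at 160 and is accepted.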

Actually, these code positions are valid (though rather useless)
in XML, but they are invalid in HTML. So I'm not sure what the
result is for XHTML.
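
For the XML side, a similar sketch, assuming the Char production of
XML 1.0 (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]). The name is_xml_char is hypothetical, and this says
nothing about what the validator does with XHTML; it only shows why
positions 128-159 pass the XML rule.

```python
# Ranges taken from the Char production of XML 1.0 (inclusive bounds).
XML_CHAR_RANGES = [
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(code):
    """Return True if the code position matches the XML 1.0 Char production."""
    return any(low <= code <= high for low, high in XML_CHAR_RANGES)


if __name__ == "__main__":
    # &#147; is inside [#x20-#xD7FF], so XML accepts the reference
    # even though the character is a C1 control and rather useless.
    print(is_xml_char(147))
```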

Regards,   Martin.
Received on Tuesday, 5 June 2001 04:03:05 GMT
