Re: Non-SGML Char Refs

From: Martin Duerst (duerst@w3.org)
Date: Sun, Jul 15 2001

    Message-Id: <4.2.0.58.J.20010716094402.03ef0aa0@sh.w3.mag.keio.ac.jp>
    Date: Mon, 16 Jul 2001 09:45:37 +0900
    To: Bjoern Hoehrmann <derhoermi@gmx.net>
    From: Martin Duerst <duerst@w3.org>
    Cc: tkinias@optimalco.com, "'www-validator@w3.org'" <www-validator@w3.org>, www-html@w3.org
    Subject: Re: Non-SGML Char Refs
    
    The state as described by Bjoern is what I understand, too.
    There is a proposal to use the HTML restrictions even for
    XHTML, because using something like &#128; and thinking that
    it stands for the Euro is a very frequent error.
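
Editorial illustration of the error described above, as a minimal Python sketch. It assumes the usual source of the confusion, which the message does not spell out: in the Windows-1252 byte encoding, byte 0x80 is the Euro sign and byte 0x93 is a left curly quote, whereas a numeric character reference always names a code point of the document character set (ISO 10646). The variable names are illustrative only.

```python
import unicodedata

# &#128; refers to code point 128 (U+0080), not to a byte in some encoding.
ref_char = chr(128)
ref_category = unicodedata.category(ref_char)  # general category of U+0080

# Assumption: the author was thinking of Windows-1252, where byte 0x80
# happens to be the Euro sign.
euro_char = b"\x80".decode("cp1252")
euro_code_point = ord(euro_char)

# The same mix-up for &#147; versus the correct &#8220;.
quote_code_point = ord(b"\x93".decode("cp1252"))

print(f"&#128; names U+{ord(ref_char):04X}, category {ref_category}")
print(f"cp1252 byte 0x80 is U+{euro_code_point:04X}, so write &#{euro_code_point};")
print(f"cp1252 byte 0x93 is U+{quote_code_point:04X}, so write &#{quote_code_point};")
```

The category reported for U+0080 is "Cc" (control), which is why the reference is useless even where it is permitted.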
    
    Regards,   Martin.
    
    At 01:07 01/07/16 +0200, Bjoern Hoehrmann wrote:
    >* Martin Duerst wrote:
    > >At 04:32 01/06/05 +0200, Bjoern Hoehrmann wrote:
    > >>* Thanasis Kinias wrote:
    > >> >The validator complains about "non-SGML character" references (e.g., &#147;
    > >> >instead of the correct &#8220;) only when validating as XHTML.  That implies
    > >> >that &#147; and the other Microsoft characters from decimal 128-159 (hex
    > >> >80-9f) _are_ valid in HTML.
    > >>
    > >>They are, they just refer to non-printing control characters.
    >
    >The other way round, valid XML, invalid HTML.
    >
    > >  CHARSET
    > >           BASESET  "ISO Registration Number 177//CHARSET
    > >                     ISO/IEC 10646-1:1993 UCS-4 with
    > >                     implementation level 3//ESC 2/5 2/15 4/6"
    > >          DESCSET 0       9       UNUSED
    > >                  9       2       9
    > >                  11      2       UNUSED
    > >                  13      1       13
    > >                  14      18      UNUSED
    > >                  32      95      32
    > >                  127     1       UNUSED
    > >                  128     32      UNUSED
    > >                  160     55136   160
    > >                  55296   2048    UNUSED  -- SURROGATES --
    > >                  57344   1056768 57344
    >
    > >Actually, these code positions are valid (though rather useless)
    > >in XML, but they are invalid in HTML. So I'm not sure what the
    > >result is for XHTML.
    >
    >Cross-posted to www-html@w3.org. Are the characters that HTML 4.0
    >declares as unused valid in XHTML 1.0?
    >--
    >Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
    >am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
    >25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
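
Editorial illustration of the point debated in the quoted thread, as a rough Python sketch rather than anything the validator actually runs. The DESCSET rows are transcribed from the SGML declaration quoted above, with `None` standing for UNUSED. The XML side uses the `Char` production from XML 1.0; that range list and both function names are additions that do not appear in the original messages.

```python
# (start, count, base) rows from the quoted DESCSET; base None means UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),  # surrogates
    (57344, 1056768, 57344),
]


def is_declared_in_html4(code_point: int) -> bool:
    """True if the quoted declaration maps this code point to a character."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return base is not None
    return False  # outside every row of the table


def is_xml10_char(code_point: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


for cp in (128, 147, 8220, 8364):
    print(cp, "HTML 4:", is_declared_in_html4(cp), "XML 1.0:", is_xml10_char(cp))
```

For 128 and 147 the two checks disagree (unused in HTML 4, allowed by XML 1.0), which is exactly the gap the thread asks about for XHTML.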