Re: Non-SGML Char Refs

From: Thanasis Kinias <tkinias@optimalco.com>
Date: Fri, 08 Jun 2001 09:21:58 -0700
To: Martin Duerst <duerst@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: "'www-validator@w3.org'" <www-validator@w3.org>
Message-id: <01060809215801.05845@localhost.localdomain>
On Tuesday 05 June 2001 01:02, Martin Duerst wrote:
> At 04:32 01/06/05 +0200, Bjoern Hoehrmann wrote:
> >* Thanasis Kinias wrote:
> > >The validator complains about "non-SGML character" references (e.g.,
> > > &#147; instead of the correct &#8220;) only when validating as XHTML. 
> > > That implies that &#147; and the other Microsoft characters from
> > > decimal 128-159 (hex 80-9f) _are_ valid in HTML.
> >
> >They are, they just refer to non-printing control characters.
>
> No, sorry, they are not. See
> http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html
>
>  CHARSET
>           BASESET  "ISO Registration Number 177//CHARSET
[...]

Funny, I quoted this exactly in my original post.  Great minds must think 
alike, eh Martin?

> Actually, these code positions are valid (though rather useless)
> in XML, but they are invalid in HTML. So I'm not sure what the
> result is for XHTML.

The intent of my original post (which was admittedly not entirely clear) was 
to find out why the validator does exactly the opposite of this: it accepts 
these references in HTML 4 but complains about them in XHTML. (The WDG 
validator, BTW, complains about them under the HTML 4 DTDs, too.)

I don't think these can be valid code positions in XML, because an XML 
document is also an SGML document, so if SGML disallows them, XML must as 
well, no?

At any rate, the validator is producing erroneous output for HTML4, and maybe 
for XHTML as well.
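
For anyone wondering where references like &#147; come from: the number is 
the Windows-1252 byte value rather than the Unicode code point. Here is a 
rough Python sketch of the mapping. The function name is my own invention 
for illustration, and it assumes Python's built-in "cp1252" codec matches 
what the authoring tool used.

```python
import unicodedata

def cp1252_to_code_point(n):
    """Return the Unicode code point that Windows-1252 assigns to byte n,
    or None if Windows-1252 leaves that byte undefined.
    (Hypothetical helper, not part of any validator.)"""
    try:
        return ord(bytes([n]).decode("cp1252"))
    except UnicodeDecodeError:
        return None

# Taken literally, &#147; names code point 147 (hex 93), a C1 control.
print(unicodedata.category(chr(147)))

# Read as a Windows-1252 byte, 147 is the left double quotation mark.
print(cp1252_to_code_point(147))
```

So the reference an author almost certainly meant is the one built from the 
second number, not from 147 itself.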

Regards,
-- 
Thanasis Kinias
Vice President & Manager of Information Systems
Optimal LLC
Scottsdale, Arizona, USA
Received on Friday, 8 June 2001 12:22:10 GMT
