W3C home > Mailing lists > Public > html-tidy@w3.org > October to December 2001

Re: Invalid escapes... [Was: Re: Error in HTML Tidy Beta

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 17 Oct 2001 14:55:52 +0200
To: "Larry W. Virden" <lvirden@cas.org>
Cc: <html-tidy@w3.org>
Message-ID: <nhvqstcvpo2qilj5nfagloaulp3ga729je@4ax.com>
* Larry W. Virden wrote:
>Does anyone know of a technical document that might discuss the appropriate
>behavior by a program parsing html that indicates appropriate alternatives
>for handling invalid escapes?  For instance, if a program hits the html
>string
><A HREF="http://www.somestory.com/story1.html">hit&run accident</a>
>
>what are the recommended (or perhaps required) behaviors in interpreting
>&run?

"&run " is canonically equivalent to "&run; " in SGML with the HTML 4
SGML declaration, thus it would be treated as entity reference.

>  Some applications seem to leave things alone, some delete the invalid
>escapes, and some replace the escape with an 'error' character...  Are all
>these 'correct' behaviors?

Section B.1 of HTML 4 recommends "If it encounters an undeclared entity,
the entity should be treated as character data.", i.e. rendered as the
string "&run ". In general it is an error in the document and HTML 4
does not define how to deal with general error conditions.
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Received on Wednesday, 17 October 2001 08:56:59 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:46 GMT