Invalid escapes... [Was: Re: Error in HTML Tidy Beta] from Larry W. Virden on 2001-10-17 (html-tidy@w3.org from October to December 2001)

From: Larry W. Virden <lvirden@cas.org>
Date: Wed, 17 Oct 2001 07:28:20 -0400 (EDT)
To: <html-tidy@w3.org>
Message-Id: <20011017072819.AAB27434@cas.org>

From: Bjoern Hoehrmann <derhoermi@gmx.net>

>  * Live with the recommended UTF-8/URI escaping
>    (see e.g. http://www.w3.org/International/O-URL-and-ident.html)

:

> Tidy is required to escape URIs like it does by various specifications,
> especially HTML 4 and http://www.w3.org/TR/charmod/ I am sorry if this
> causes any trouble (I haven't checked this for mailto:-URIs), but
> non-ASCII characters are invalid in URIs and you shouldn't have used
> them.
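A minimal Python sketch of the escaping described above. It assumes the procedure recommended in HTML 4.01, Appendix B.2.1: encode each non-ASCII character as UTF-8, then write each resulting byte as %HH. The function name escape_non_ascii is hypothetical. ASCII characters, including any existing %HH escapes, pass through unchanged, and the rest of the URI is not validated.

```python
def escape_non_ascii(uri: str) -> str:
    """Percent-encode the non-ASCII characters of a URI via their UTF-8 bytes.

    Hypothetical helper: a sketch of the HTML 4.01 Appendix B.2.1 procedure.
    ASCII characters (and existing %HH escapes) are left untouched.
    """
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One %HH escape per byte of the character's UTF-8 encoding,
            # using uppercase hexadecimal digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(escape_non_ascii("http://example.org/Dürst"))
```

Here "ü" (U+00FC) is two bytes in UTF-8, so it becomes %C3%BC. The sketch does not cover mailto: URIs specially, which Bjoern notes he had not checked.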



Does anyone know of a technical document that discusses how a program
parsing HTML should behave when it meets invalid escapes (references to
undefined entities), and what the acceptable alternatives are?  For
instance, if a program hits the HTML string
<A HREF="http://www.somestory.com/story1.html">hit&run accident</a>

what are the recommended (or perhaps required) ways to interpret
&run?  Some applications seem to leave the text alone, some delete the
invalid escape, and some replace it with an 'error' character.  Are all
of these behaviors 'correct'?
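The three behaviors above can be sketched as policies in a small Python decoder. This is an illustration under assumptions, not what any particular application does: the names decode_refs, REF and REPLACEMENT are hypothetical, U+FFFD REPLACEMENT CHARACTER is assumed as the 'error' character, and "defined" means the HTML 4 entity names in Python's html.entities.name2codepoint. The trailing ';' is treated as optional, so "&run" in "hit&run accident" is read as a reference to the undefined entity "run".

```python
import re
from html.entities import name2codepoint

# A named reference (&name) or numeric reference (&#NNN, &#xHH). The ';' is
# optional: the reference ends at the first character that cannot continue
# it, so "hit&run accident" contains the reference "&run".
REF = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);?")

REPLACEMENT = "\ufffd"  # assumption: U+FFFD stands in for the 'error' character


def decode_refs(text, policy="keep"):
    """Expand references in text; handle undefined ones according to policy.

    'keep'    -> leave the undefined reference as literal text
    'drop'    -> delete it
    'replace' -> substitute REPLACEMENT
    """
    if policy not in ("keep", "drop", "replace"):
        raise ValueError(f"unknown policy: {policy!r}")

    def expand(m):
        body = m.group(1)
        if body.startswith("#"):
            digits = body[1:]
            cp = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Reject NUL, surrogates and values beyond the Unicode range.
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                cp = None
        else:
            # HTML 4 entity names as shipped with Python's standard library.
            cp = name2codepoint.get(body)
        if cp is not None:
            return chr(cp)
        if policy == "keep":
            return m.group(0)
        if policy == "drop":
            return ""
        return REPLACEMENT

    return REF.sub(expand, text)


if __name__ == "__main__":
    for policy in ("keep", "drop", "replace"):
        print(policy, repr(decode_refs("hit&run accident", policy)))
```

Defined references such as "&amp;" or "&#233;" decode the same way under every policy; only the treatment of undefined ones like "&run" differs, which is exactly where the applications described above disagree.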

-- 
Never apply a Star Trek solution to a Babylon 5 problem.
Larry W. Virden <mailto:lvirden@cas.org> <URL: http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting should 
be construed as representing my employer's opinions.
-><-

Received on Wednesday, 17 October 2001 07:28:52 UTC