Re: Backslashes in URLs

Paul Gilmartin wrote:

> http://www.hlasm.com/english/opl_cobm.htm
>
> contains numerous links with backslashes in URLs, such as:
>
>     http://www.hlasm.com/english/opcd\A.htm
>
> (although the author may be in the process of fixing them).

It seems that such problems have been fixed in the document; it no longer
contains any "\" characters.

> RFC 1738 says backslashes are unsafe and must be escaped:

The generic definitions in RFC 1738 were replaced by RFC 2396 in 1998.
The current RFC on generic URL syntax is RFC 3986 (published in 2005),
which is an Internet Standard (STD 66).

When citing an RFC, use http://www.rfc-editor.org to check its current
status, e.g. whether it has been obsoleted.

On this particular issue, the generic URL syntax hasn't changed: "\" is
allowed in a URL only in %-encoded form (%5C). This follows from rules that
specify the allowed characters and require all other characters to be
%-encoded. The backslash "\" is no longer mentioned separately; it simply
doesn't appear in the set of characters that may appear unencoded.
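To make the "allowed set" point concrete, here is a minimal Python sketch, assuming the character classes of RFC 3986 (unreserved, gen-delims, sub-delims, plus "%" for escapes). It is only a character-level check, not a full parse against the RFC 3986 grammar (for instance, it does not verify that "%" is followed by two hex digits), and the function names are illustrative, not from any existing tool.

```python
import string
from urllib.parse import quote

# Characters RFC 3986 lets appear literally in a URI: unreserved,
# gen-delims, sub-delims, and "%" (which starts a %-escape).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS | {"%"}


def disallowed_chars(url: str) -> list[str]:
    """Return the characters in url that are not allowed unencoded."""
    return [c for c in url if c not in ALLOWED]


def percent_encode_disallowed(url: str) -> str:
    """%-encode every character outside the allowed set (e.g. "\\" -> "%5C")."""
    return "".join(c if c in ALLOWED else quote(c) for c in url)


if __name__ == "__main__":
    bad = "http://www.hlasm.com/english/opcd\\A.htm"
    print(disallowed_chars(bad))           # the lone backslash
    print(percent_encode_disallowed(bad))  # .../opcd%5CA.htm
```

Note that %-encoding preserves the backslash as data; it does not turn it into "/", which is usually what the author actually meant.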

> w3's HTML validation fails to report this RFC violation.

Validation in the SGML or XML sense does not include any checks on what 
characters may appear in a URL. The URL-valued attributes are declared as 
CDATA, and DTDs cannot express things like URL syntax.

> It should
> because browser treatment of this invalid usage is inconsistent:
> MSIE treats '\' as if it were '/'; Firefox simply passes it on as '\',

That's an important problem indeed.

> and the w3's own link validator encodes it as '%5C'.

The link checker is a useful tool which will, en passant, usually detect
problems of this type: authors who write "\" have usually meant "/", so it
is good that the link checker treats "\" as "%5C", since the resulting URL
typically does not resolve. It would be even better to report "\" as an
error. A link checker (rather than a markup validator) could and should do
such things.
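As a rough illustration of what "report "\" as an error" could look like, here is a hypothetical Python sketch built on the standard library's html.parser. It scans a few URL-valued attributes (the attribute list is an assumption, not exhaustive) and reports any value containing a backslash, suggesting "/" as a guess at the intended URL. It is not how the W3C link checker works internally.

```python
from html.parser import HTMLParser


class BackslashLinkFinder(HTMLParser):
    """Collect (line, attribute, value) for URL attributes containing "\\"."""

    # Hypothetical subset of URL-valued attributes to inspect.
    URL_ATTRS = {"href", "src", "action", "cite"}

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[tuple[int, str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value and "\\" in value:
                line, _offset = self.getpos()  # position of the start tag
                self.problems.append((line, name, value))


def report_backslash_links(html: str) -> list[str]:
    """Return one error message per URL attribute that contains a backslash."""
    finder = BackslashLinkFinder()
    finder.feed(html)
    finder.close()
    messages = []
    for line, attr, value in finder.problems:
        guess = value.replace("\\", "/")  # a guess at the intended URL
        messages.append(
            f'line {line}: {attr}="{value}" contains a backslash; '
            f'perhaps "{guess}" was meant'
        )
    return messages


if __name__ == "__main__":
    page = '<p>\n<a href="opcd\\A.htm">A</a>\n<a href="opcd/B.htm">B</a>\n</p>'
    for msg in report_backslash_links(page):
        print(msg)
```

Reporting an error with a suggested fix, rather than silently encoding "\" as "%5C", tells the author directly what went wrong.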

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 

Received on Saturday, 19 February 2011 13:54:51 UTC