
Re: Backslashes in URLs

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 19 Feb 2011 15:54:05 +0200
Message-ID: <B6771158B182404B968EFAED43602183@JukanPC>
To: "Paul Gilmartin" <PaulGBoulder@aim.com>, <www-validator@w3.org>

Paul Gilmartin wrote:

> http://www.hlasm.com/english/opl_cobm.htm
>
> contains numerous links with backslashes in URLs, such as:
>
>     http://www.hlasm.com/english/opcd\A.htm
>
> (although the author may be in the process of fixing them).

It seems that such problems have been fixed in the document; it no longer
contains any "\" character.

> RFC 1738 says backslashes are unsafe and must be escaped:

The generic definitions in RFC 1738 were replaced by RFC 2396 in 1998.
The current RFC on generic URL syntax is RFC 3986 (from 2005),
which is an Internet Standard (STD 66).

When considering a reference to an RFC, use http://www.rfc-editor.org to 
check the status of the RFC.

In this particular issue, the generic URL syntax hasn't changed: "\" is
allowed in a URL only in %-encoded form (as "%5C"). This is a consequence
of rules that specify the allowed characters and require that all other
characters be %-encoded. The backslash "\" is no longer mentioned
separately; it simply doesn't appear in the set of characters that may
appear as such.
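
As a rough illustration of that rule, here is a minimal Python sketch
(standard library only). The names ALLOWED, disallowed_chars and
encode_disallowed are my own, made up for this example; the character
sets are the "unreserved", "gen-delims" and "sub-delims" of RFC 3986,
plus "%" as the escape introducer. It checks characters only, not the
full URL grammar.

```python
import string
from urllib.parse import quote

# Characters that RFC 3986 lets appear literally somewhere in a URI.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = set(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")


def disallowed_chars(url):
    """Return the characters in url that may not appear unencoded."""
    return sorted({ch for ch in url if ch not in ALLOWED})


def encode_disallowed(url):
    """Percent-encode only the characters outside the allowed set."""
    # quote() leaves unreserved characters alone by default; the
    # delimiters and "%" are added to "safe" so they are kept as well.
    return quote(url, safe=GEN_DELIMS + SUB_DELIMS + "%")


url = "http://www.hlasm.com/english/opcd\\A.htm"
print(disallowed_chars(url))   # ['\\']
print(encode_disallowed(url))  # http://www.hlasm.com/english/opcd%5CA.htm
```

Note that the backslash needs no special case here: it is rejected
simply because it is missing from the allowed set.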

> w3's HTML validation fails to report this RFC violation.

Validation in the SGML or XML sense does not include any checks on what 
characters may appear in a URL. The URL-valued attributes are declared as 
CDATA, and DTDs cannot express things like URL syntax.

> It should
> because browser treatment of this invalid usage is inconsistent:
> MSIE treats '\' as if it were '/'; Firefox simply passes it on as '\',

That's an important problem indeed.

> and the w3's own link validator encodes it as '%5C'.

The link checker is a useful tool, which will, en passant, detect most
problems of this type: authors have usually meant "/" when they have
written "\", so it is good that the link checker treats it as "%5C",
since the request for the encoded URL will then normally fail and the
broken link gets noticed. It would be even better to report "\" as an
error. A link checker (rather than a markup validator) could and should
do such things.
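
To show what such a check might look like, here is a small Python
sketch using the standard html.parser module. It is not how the W3C
link checker works; the class BackslashLinkFinder and the function
find_backslash_links are hypothetical names, and limiting the check to
the href and src attributes is a simplifying assumption.

```python
from html.parser import HTMLParser


class BackslashLinkFinder(HTMLParser):
    """Collect URL-valued attributes that contain a literal backslash."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        for name, value in attrs:
            if name in ("href", "src") and value and "\\" in value:
                self.problems.append((tag, name, value))


def find_backslash_links(html):
    """Return (tag, attribute, value) for each suspicious link."""
    finder = BackslashLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


page = '<a href="opcd\\A.htm">A</a> <a href="english/opcd_b.htm">B</a>'
for tag, attr, value in find_backslash_links(page):
    print(f'<{tag}> {attr}="{value}": "\\" must be written as "/" or "%5C"')
```

A real checker would report the line number as well, and would still
go on to request the "%5C" form of the URL to see whether it resolves.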

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/ 
Received on Saturday, 19 February 2011 13:54:51 GMT
