- From: MiSsInGnO <missingno@ifrance.com>
- Date: Mon, 5 Mar 2007 21:30:16 +0100
- To: <www-validator@w3.org>
As for the '+' being replaced by %20, I think the problem is due to improper (un)escaping of the string. IIRC, one RFC suggested that spaces in URLs be replaced by a plus sign. Then another RFC suggested that spaces be replaced by a percent sequence (%20). I think the validator first unescapes the URL using the first format (replacing each '+' by a space) and then escapes it again using the second format (thus replacing the newly inserted spaces by the percent sequence %20).

In fact, my question was more like: why does it trigger a software error dealing with regular expressions? I believe this is not the intended behaviour.

I filed a bug regarding this issue at http://www.w3.org/Bugs/Public/show_bug.cgi?id=4365
It seems somebody is already working on it.

The same bug can be triggered with different URLs. So far, I could reproduce it using URLs containing "++", "**", "|*" and "|+". The first two display a "Nested quantifiers in regex" error. The last two display a "Quantifier follows nothing in regex" error.

Anyway, thank you very much for this tool. Please keep up the good work!

Sincerely,
François Poirotte.

----- Original Message -----
From: "Frank Ellermann" <nobody@xyzzy.claranet.de>
To: <www-validator@w3.org>
Sent: Monday, March 05, 2007 8:17 PM
Subject: Re: Software error when validating URL with '+' signs

> MiSsInGnO wrote:
>
>> When trying to validate an URL with '+' signs in it,
>> I get the following error:
>
>> Software error:
> [...]
>> For help, please send mail to the webmaster ([no address given]
>> <mailto:%5Bno%20address%20given%5D>), giving this error message
>> and the time and date of the error.
>
> Apparently a double fault on the side of validator.w3.org, it has
> issues with the C++ in your URL, and the mailto doesn't help.
>
>> The URL I was trying to validate was:
>> http://missingno.ifrance.com/C++.php
> [...]
>> I was unable to reproduce this bug using a local copy of the
>> validator (v 0.7.2)
>
> I tried to bypass this validator bug by replacing C%2B%2B by C++
> http://validator.w3.org/check?uri=http%3A%2F%2Fmissingno.ifrance.com%2FC++.php
> and got a slightly more convincing error page:
>
> | Sorry! This document can not be checked.
>
> | I got the following unexpected response when trying to retrieve
> | <http://missingno.ifrance.com/C%20%20.php>:
>
> | 404 Not Found
>
> For unknown reason C++ ended up as C%20%20. Really odd, why does
> the validator try to percent-escape an ordinary "+" in the query,
> and while it does this why doesn't it work as expected, and last
> but not least, why does it munge C++ into C%20%20 ?
>
> Frank
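
For illustration only, here is a minimal Python sketch of the two effects discussed in this thread: the guessed unescape-then-escape round trip that would turn C++ into C%20%20, and the way an unescaped string containing quantifier characters fails to compile as a regular expression. The validator itself is written in Perl, so this is an analogy and not its actual code; the helper names round_trip and compiles_as_pattern are hypothetical, and the idea that the URL gets interpolated into a pattern is an assumption drawn from the error messages.

```python
import re
from urllib.parse import quote, unquote_plus


def round_trip(path: str) -> str:
    """Decode with form-style rules ('+' means space), then re-encode with %XX rules."""
    decoded = unquote_plus(path)  # "/C++.php" becomes "/C  .php"
    return quote(decoded)         # each space becomes "%20"; "/" is left alone


def compiles_as_pattern(text: str) -> bool:
    """Return True if `text` can be used verbatim as a regular expression."""
    try:
        re.compile(text)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(round_trip("/C++.php"))

    # Python's regex syntax differs from Perl's: since Python 3.11, "++" is a
    # valid possessive quantifier, so only "**", "|*" and "|+" are checked here.
    for fragment in ("**", "|*", "|+"):
        raw_ok = compiles_as_pattern(fragment)
        escaped_ok = compiles_as_pattern(re.escape(fragment))
        print(repr(fragment), "raw:", raw_ok, "escaped:", escaped_ok)
```

If the assumption holds, quoting the string before using it in a pattern (re.escape in Python, \Q...\E or quotemeta in Perl) would avoid the regex error, while the %20 issue would need the '+' to be left alone when the path is decoded.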
Received on Monday, 5 March 2007 20:31:23 UTC