Re: Software error when validating URL with '+' signs from MiSsInGnO on 2007-03-05 (www-validator@w3.org from March 2007)

From: MiSsInGnO <missingno@ifrance.com>
Date: Mon, 5 Mar 2007 21:30:16 +0100
To: <www-validator@w3.org>
Message-ID: <005101c75f65$387ab550$b800a8c0@Looksup>

As for the '+' being replaced by %20, I think the problem is due to 
improper (un)escaping of the string.
IIRC, one RFC suggested that spaces in URLs be replaced by a plus sign.
Then another RFC suggested that spaces be replaced by a percent sequence 
(%20).

I think the validator first tries to unescape the URL using the first format 
(replacing each '+' by a space)
and then escapes it back using the second format (thus replacing the newly 
inserted spaces by the percent sequence %20).
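
To illustrate the round trip I have in mind, here is a minimal Python sketch. 
It is only an analogy: the validator is a Perl program and I have not checked 
its actual code, so the function name roundtrip() and the choice of library 
calls are my own assumptions.

```python
from urllib.parse import quote, unquote_plus

def roundtrip(path: str) -> str:
    """Decode with the '+'-means-space convention, then percent-encode again."""
    # Step 1: form-style decoding turns every '+' into a space.
    decoded = unquote_plus(path)
    # Step 2: plain percent-encoding turns every space into %20.
    return quote(decoded)

print(roundtrip("/C++.php"))      # /C%20%20.php
print(roundtrip("/C%2B%2B.php"))  # /C%2B%2B.php
```

The second call suggests why a percent-encoded plus sign (%2B) would survive 
such a round trip while a literal '+' would not.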

In fact, my actual question was more like:
why does this trigger a software error involving regular expressions?

I believe this is not the intended behaviour.
I filed a bug regarding this issue at 
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4365
It seems somebody is already working on it.

The same bug can be triggered with different URLs.
So far, I have been able to reproduce it with URLs containing "++", "**", 
"|*" and "|+".
The first two display a "Nested quantifiers in regex" error.
The last two display a "Quantifier follows nothing in regex" error.
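
My guess (an assumption, I have not read the validator source) is that part 
of the URL ends up inside a regular expression without being escaped first. 
The Python sketch below shows the same class of failure and the usual remedy; 
in Perl the equivalent would be quotemeta() or \Q...\E. The helper names and 
the example.org URL are hypothetical, and Python's error messages differ from 
the Perl ones quoted above.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the string is accepted as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def literal_match(needle: str, haystack: str) -> bool:
    """Search for needle as plain text by escaping its metacharacters."""
    return re.search(re.escape(needle), haystack) is not None

for fragment in ("C**.php", "a|*.php", "a|+.php"):
    url = "http://example.org/" + fragment  # hypothetical URL
    # The raw fragment is rejected as a pattern, the escaped one matches.
    print(fragment, compiles(fragment), literal_match(fragment, url))
```

I left "++" out of the loop on purpose: recent Python versions (3.11 and 
later) read it as a possessive quantifier, so it no longer raises an error 
there, whereas older versions reject it.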

Anyway, thank you very much for this tool.
Please keep up your good work!

Sincerely,

François Poirotte.

----- Original Message ----- 
From: "Frank Ellermann" <nobody@xyzzy.claranet.de>
To: <www-validator@w3.org>
Sent: Monday, March 05, 2007 8:17 PM
Subject: Re: Software error when validating URL with '+' signs

>
> MiSsInGnO wrote:
>
>> When trying to validate an URL with '+' signs in it,
>> I get the following error:
>
>>   Software error:
> [...]
>> For help, please send mail to the webmaster ([no address given]
>> <mailto:%5Bno%20address%20given%5D>), giving this error message
>> and the time and date of the error.
>
> Apparently a double fault on the side of validator.w3.org, it has
> issues with the C++ in your URL, and the mailto doesn't help.
>
>> The URL I was trying to validate was:
>> http://missingno.ifrance.com/C++.php
> [...]
>> I was unable to reproduce this bug using a local copy of the
>> validator (v 0.7.2)
>
> I tried to bypass this validator bug by replacing C%2B%2B by C++
> http://validator.w3.org/check?uri=http%3A%2F%2Fmissingno.ifrance.com%2FC++.php
> and got a slightly more convincing error page:
>
> | Sorry! This document can not be checked.
>
> | I got the following unexpected response when trying to retrieve
> | <http://missingno.ifrance.com/C%20%20.php>:
>
> |    404 Not Found
>
> For unknown reason C++ ended up as C%20%20.  Really odd, why does
> the validator try to percent-escape an ordinary "+" in the query,
> and while it does this why doesn't it work as expected, and last
> but not least, why does it munge C++ into C%20%20 ?
>
> Frank
>
>
>
>

Received on Monday, 5 March 2007 20:31:23 UTC