
CGI.pm and private_tempfiles (Was: File Upload script dropping uploaded_file)

From: olivier Thereaux <ot@w3.org>
Date: Wed, 8 Oct 2008 14:49:57 -0400
To: www-validator@w3.org
Message-Id: <15D8F4C0-A687-44F2-BDBD-E16ABB8A0A11@w3.org>
Cc: dev list <public-qa-dev@w3.org>

Hi all,

[Bcc to the people who reported the issue in the past weeks]

More on the issue of validator.w3.org "dropping" content when using  
file upload.

I followed a hunch that there may be some issue with how the validator  
handles files when uploaded. I found very little in the code of the  
validator itself, since it follows the API given by Perl's CGI module  
fairly closely. A look at the documentation for that library yields:
http://search.cpan.org/dist/CGI.pm/CGI.pm#private_tempfiles

in particular [[
CGI.pm can process uploaded file. Ordinarily it spools the uploaded  
file to a temporary directory, then deletes the file when done. [...]  
On Unix systems, the -private_tempfiles pragma will cause the  
temporary file to be unlinked as soon as it is opened and before any  
data is written into it, reducing, but not eliminating the risk of  
eavesdropping (there is still a potential race condition). To make  
life harder for the attacker, the program chooses tempfile names by  
calculating a 32 bit checksum of the incoming HTTP headers.
]] -- http://search.cpan.org/dist/CGI.pm/CGI.pm#private_tempfiles
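
For readers unfamiliar with the unlink-after-open trick that the quoted
documentation describes, here is a minimal sketch in Python. It is not
CGI.pm's actual code; the function name and the "upload-" prefix are made
up for illustration. On Unix, a file whose name has been removed stays
readable and writable through the handle that is already open, but other
processes can no longer find it by name.

```python
import os
import tempfile


def private_tempfile_roundtrip(payload):
    """Write payload to an already-unlinked temp file and read it back.

    Returns (bytes read back, whether the name still exists on disk).
    Unix only: Windows refuses to unlink a file that is still open.
    """
    # mkstemp creates and opens the file in one step. The "upload-"
    # prefix is hypothetical, not the one CGI.pm uses.
    fd, path = tempfile.mkstemp(prefix="upload-")
    with os.fdopen(fd, "w+b") as handle:
        os.unlink(path)          # remove the name before any data is written
        handle.write(payload)    # the open handle keeps working
        handle.seek(0)
        data = handle.read()
        name_still_exists = os.path.exists(path)
    return data, name_still_exists
```

The short window between creating the file and unlinking it is the kind
of gap the documentation refers to when it mentions a remaining race
condition.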

My hunch was: what if the absolutely batty amount of traffic our  
validator servers get pushed the number of temp files needed high  
enough to increase the likelihood of a name collision, or of some other  
race condition, that would prevent a process from 1) writing to the  
temp file (because it is already locked by another process) or 2)  
reading it back (because it has been removed by another process)? That  
could well cause the validator problem we were experiencing: the  
inability of the validator to access the "uploaded file" content. Naming  
the file based on HTTP headers would also explain why some people got  
hit by the issue and not others. And indeed, on one of the  
validator servers, I found the /tmp directory holding a few thousand  
temp files that should not have been there.
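
As a rough sanity check on the collision idea, the usual birthday-bound
approximation can be sketched as follows. This assumes names are drawn
uniformly from the 2**32 possible checksum values, which is an
assumption for illustration only; the function name is hypothetical.

```python
import math


def collision_probability(n_files, bits=32):
    """Approximate chance that at least two of n_files names collide,
    assuming each name is uniform over 2**bits values (birthday bound)."""
    space = 2 ** bits
    exponent = n_files * (n_files - 1) / (2.0 * space)
    # 1 - exp(-x), computed with expm1 for accuracy when x is small
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n in (100, 3000, 50000):
        print(n, collision_probability(n))
```

Under that uniform assumption, 3000 files give a collision chance of
roughly 0.1%. A checksum of HTTP headers is not uniform, though: if the
name depended only on the headers, two requests with identical headers
would get the same name, so the real collision rate could be much
higher than this estimate.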

I am still unsure whether this is a serious bug in CGI.pm (either with  
naming or with private_tempfiles not working as it should), or indeed  
whether this is the cause of the issue we were seeing. But I removed the  
offending files, and will be monitoring the temp directory for a  
while. Let's see how that goes.
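
One way such monitoring could be done is sketched below. The "CGItemp"
prefix and the one-hour threshold are assumptions for illustration, not
values taken from the validator setup, and the function name is
hypothetical.

```python
import os
import time


def stale_tempfiles(directory, prefix="CGItemp", max_age_seconds=3600, now=None):
    """Return sorted paths of regular files in directory whose name starts
    with prefix and whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                age = now - entry.stat(follow_symlinks=False).st_mtime
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if age > max_age_seconds:
                stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    leftovers = stale_tempfiles("/tmp")
    print(len(leftovers), "stale temp files")
```

Run from cron, a steadily growing count would suggest that temp files
are again being left behind.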


On 30-Sep-08, at 1:38 PM, olivier Thereaux wrote:

>
>
> On 30-Sep-08, at 9:49 AM, Kathryn Van Stone wrote:
>> I'm not Larry, but I had the same problem last night.  The online  
>> validator worked, but not the one requiring a file load.
>> This was true of Firefox, Safari, and python httplib.
>
> When this happens, can you try and connect to:
> http://128.30.52.13/
> and
> http://128.30.52.49/
> and
> http://qa-dev.w3.org/wmvs/HEAD/
> ?
>
> These are the current IP addresses for the validator servers  
> (including the development server at the end), and I would like to  
> know whether only one is misbehaving or if all are. It will help  
> figure out if the issue is with the machine, the code, or...
>
> I've also seen reports from Win and Mac users, and no particular  
> browser seems to be the cause. I don't really believe in the  
> possibility of an ISP issue, but I'm not brushing it off yet.
>
> Thanks all for your patience.
>
> -- 
> olivier
>
>
Received on Wednesday, 8 October 2008 18:50:33 GMT
