Re: Percent encoding

On Tue, 02 Mar 2010 18:39:32 +0800, Raphaël Troncy <raphael.troncy@cwi.nl>  
wrote:

> Dear Philip,
>
>> Perhaps YouTube decodes first and splits last, or perhaps they just use
>> a regexp to find v=XXXXX anywhere. Whatever is the case with YouTube, I
>> assume we want to match as closely as possible how query strings work
>> in e.g. ASP, PHP, JSP and Perl CGI, or there is no benefit in using
>> something that resembles query strings.
>>
>> We can never be 100% compatible, for reasons listed in a note after
>> http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/#decode-a-percent-encoded-string
>
> Thanks, the note is indeed really useful. For all the following  
> statements, do you think it is possible to indicate a suitable reference?

Do you mean a reference in the spec? The results came simply from
testing the various server environments, so for most of them there is
no other spec or document to reference, but here are the
environments/languages that each point is about:

>      *  "&" is the only primary separator for name-value pairs, but some  
> server-side languages also treat ";" as a separator.

Perl CGI accepts ";" as a separator in addition to "&".

>      * name-value pairs with invalid percent-encoding should be ignored,  
> but some server-side languages silently mask such errors.

Of the environments tested, only JSP rejected invalid input outright, as
our spec does. Of ASP, PHP and Perl CGI, some removed the invalid part and
some simply left it intact, but I can't remember exactly which did which.
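
For illustration only, here is a minimal Python sketch of the two
behaviors. Python was not one of the tested environments, and the helper
name decode_strict is made up for this example. The lenient path leaves a
malformed escape in place; the strict path rejects the whole value, which
is roughly what the spec note asks for.

```python
import re
from urllib.parse import unquote

# A "%" that is not followed by two hex digits is an invalid escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

def decode_strict(value):
    """Hypothetical helper: return the decoded string, or None if invalid."""
    if _BAD_ESCAPE.search(value):
        return None
    return unquote(value)

# Lenient: Python's unquote() keeps the malformed escape as-is.
lenient = unquote("a%zzb")
# Strict: the whole value is rejected.
strict = decode_strict("a%zzb")
# A well-formed value decodes normally.
ok = decode_strict("a%20b")
```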

>      * The "+" character should not be treated specially, but some  
> server-side languages replace it with a space (" ") character.

All of the tested languages do this; it's just how query strings are
handled in practice. We could duplicate this behavior, but since we
actually have syntax that requires using "+" (timezones), it would be
quite annoying.
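
A small Python sketch of why this matters, using the standard library's
two decoders. The timestamp is a made-up value with a "+" timezone offset,
not necessarily the exact syntax from the spec.

```python
from urllib.parse import unquote, unquote_plus

# Made-up value containing a "+" timezone offset.
value = "2010-03-02T18:39:32+08:00"

# Plain percent-decoding leaves "+" alone.
plain = unquote(value)

# Form-style decoding turns "+" into a space, which corrupts the offset.
form = unquote_plus(value)
```

With form-style decoding, the offset would have to be written as "%2B08:00"
to survive.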

>      * Multiple occurrences of the same name must be preserved, but some  
> server-side languages only preserve the last occurrence.

Here again I can't remember the exact behavior of each language. I'm
pretty sure PHP and Perl CGI only preserve the last occurrence, while JSP
keeps a list. I'm not sure about ASP and ASP.NET.
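
The difference between the two models can be sketched in Python (again,
not one of the tested environments; the query string is a made-up
example). A list of pairs preserves every occurrence, while a plain
mapping keeps only the last one.

```python
from urllib.parse import parse_qsl

# Made-up query with a repeated name.
query = "track=audio&track=video"

# A list of (name, value) pairs preserves every occurrence, in order.
pairs = parse_qsl(query)

# Collapsing into a dict keeps only the last value for each name.
last_only = dict(pairs)
```

Note that parse_qsl applies form-decoding rules, so it would also turn "+"
into a space; it is used here only to show the list-versus-mapping
difference.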

If it is essential that the exact results be available, I could try
digging out the scripts from the VirtualBox images I set up for testing.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 2 March 2010 11:42:12 UTC