Re: Percent encoding from Philip Jägenstedt on 2010-03-01 (public-media-fragment@w3.org from March 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Mon, 01 Mar 2010 10:20:46 +0800
To: "Jack Jansen" <Jack.Jansen@cwi.nl>
Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <op.u8t1lgxcatwj1d@philip-pc>

On Sun, 28 Feb 2010 05:52:06 +0800, Jack Jansen <Jack.Jansen@cwi.nl> wrote:

> I think we need to prioritize our needs, and then based on that  
> prioritization decide whether to include quotes for id/track, when to do  
> percent decoding, etc.
>
> Here's a list of issues that I can come up with (unprioritized):
>
> a. The MF syntax for queries and fragments should be identical
> b. The MF syntax should be unambiguous
> c.  The MF syntax should allow any UTF-8 character in track or id names
> d. The MF syntax should adhere to applicable formal standards
> e. The MF syntax should adhere to de-facto usage of queries and fragments
> f. The MF syntax should be as concise as possible, with no unneeded  
> grammatical fluff
>
> Are there any issues I miss?
>
> I think my current prioritizing would have b/c/d highest priority, then  
> a, then e, then f.

I'm not sure what priority order I would choose (maybe b-a-d-c-e-f), but I
think we only need to discuss it if we actually disagree on some concrete
issue.

> But: this still leaves the question "what is de-facto usage". I typed in  
> the youtube URL on a whim last week, but tonight I've tried a couple of  
> other sites, and so far it seems that YouTube is the only major site I  
> have come across that seems to do percent-decoding very early in the  
> process. And even here it is very weird: using %26 as the argument  
> separator *only* works if you also specify %3f as the query separator.  
> If you use '?' then the %26 becomes an ampersand inside the search  
> string.

What is currently in the spec is reverse-engineered from how PHP, ASP, JSP
and Perl CGI behave, with the differences noted in that section (handling
of invalid percent encoding, etc). I did this simply by writing a script
in each language that outputs the name-value pairs it receives and trying
it with different inputs. The assumption here is of course that these
languages combined represent a large majority of web servers and thus
define the de-facto usage.
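
As an illustration of why the order of splitting and percent-decoding
matters (the %26 case quoted above), here is a minimal Python sketch. It is
not the algorithm from the spec and not one of the test scripts mentioned
above; the function names are hypothetical, and details such as '+'
handling are left out.

```python
from urllib.parse import unquote


def split_pairs(query):
    """Split a query or fragment string into raw (name, value) tuples."""
    # partition() returns (name, "=", value); [::2] keeps name and value.
    return [item.partition("=")[::2] for item in query.split("&") if item]


def split_then_decode(query):
    """Split on '&' and '=' first, then percent-decode each piece."""
    return [(unquote(name), unquote(value)) for name, value in split_pairs(query)]


def decode_then_split(query):
    """Percent-decode the whole string first, then split (early decoding)."""
    return split_pairs(unquote(query))


if __name__ == "__main__":
    example = "track=a%26b&t=10"
    # %26 stays inside the value when decoding happens after splitting.
    print(split_then_decode(example))  # [('track', 'a&b'), ('t', '10')]
    # With early decoding, %26 turns into an extra separator.
    print(decode_then_split(example))  # [('track', 'a'), ('b', ''), ('t', '10')]
```

Under these assumptions, only the split-then-decode order lets a track or
id name contain a literal ampersand.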

There are no formal standards here unfortunately, or we wouldn't have had
to define name-value parsing ourselves to begin with. I guess the only
limitation is which characters are allowed in the fragment component, but
I think we already have that covered in the BNF.
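
For reference, the character restriction on the fragment component can be
sketched as a small Python check. It follows the generic RFC 3986 grammar
(fragment = *( pchar / "/" / "?" )), not the Media Fragments BNF itself, and
the function name is hypothetical.

```python
import re

# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
#           fragment = *( pchar / "/" / "?" )
_FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~"      # unreserved
    r"!$&'()*+,;="             # sub-delims
    r":@/?]"                   # extra characters allowed in a fragment
    r"|%[0-9A-Fa-f]{2})*"      # pct-encoded
)


def is_valid_fragment(fragment):
    """Return True if the string is a syntactically valid RFC 3986 fragment."""
    return _FRAGMENT_RE.fullmatch(fragment) is not None
```

Under this grammar a raw non-ASCII character or a double quote has to be
percent-encoded, so is_valid_fragment('id="x"') is False while
is_valid_fragment("id=%22x%22") is True.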

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Monday, 1 March 2010 02:21:28 UTC