Re: Media Fragments URI parsing: pseudo algorithm code from Philip Jägenstedt on 2010-06-30 (public-media-fragment@w3.org from June 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Wed, 30 Jun 2010 13:52:13 +0200
To: "Jack Jansen" <Jack.Jansen@cwi.nl>, "Yves Lafon" <ylafon@w3.org>
Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, Raphaël Troncy <raphael.troncy@eurecom.fr>, "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <op.ve329bneatwj1d@philip-pc.linkoping.osa>

On Wed, 30 Jun 2010 10:39:29 +0200, Yves Lafon <ylafon@w3.org> wrote:

> On Tue, 29 Jun 2010, Jack Jansen wrote:
>
>>
>> On 29 jun 2010, at 22:30, Yves Lafon wrote:
>>
>>> The ABNF describes the whole syntax, and then the different parts.  
>>> There is no need for a multi-step parsing scheme that requires re-reading  
>>> the same bytes multiple times.
>>> To me "%74=%6ept%3A%310" is not a media fragment. %-escaped values are  
>>> allowed only where they are allowed (see grammar).

No, the ABNF doesn't define the whole syntax. If I am mistaken, please  
point to the production which in some way includes "&" and "=" to separate  
name-value pairs. The only such production is segment, which is  
non-normative. Since it is also wrong, simply making it normative is not  
the solution.

>> Interesting...
>>
>> Unlike Yves, I think the sketched example _is_ a media fragment, but  
>> unlike Philip I don't think we need to specify it in our ABNF.
>
> the URI RFC makes it quite clear where percent encoding is allowed and  
> where it is not. For example, h%74%54p://www.example.com/ is _not_  
> htTp://www.example.com/

Of course, but simply knowing where it is allowed isn't enough. I don't  
think this is disputed, but for the record we cannot completely delegate  
the issue of percent encoding to URI, because:

1. URI doesn't define the syntax of name-value pairs delimited by "&" and  
"=", so MF must.

2. If we want to allow & in track names and ids, then percent-decoding  
must happen *after* splitting the name-value pairs. For example, in  
#track=A%20%26%20B&t=10 the track name is "A & B".
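
As a minimal sketch of why the order matters (not code from the spec; the  
helper name parse_pairs is hypothetical and only Python's standard  
urllib.parse.unquote is assumed), splitting on "&" and "=" first and  
percent-decoding afterwards keeps the track name intact, while decoding  
first breaks it apart:

```python
from urllib.parse import unquote


def parse_pairs(fragment):
    """Split on '&' and '=' first, then percent-decode each name and value.

    Hypothetical helper for illustration; not the normative algorithm.
    """
    pairs = []
    for part in fragment.split("&"):
        if not part:
            continue  # skip empty segments such as those produced by "a=1&&b=2"
        name, _sep, value = part.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


fragment = "track=A%20%26%20B&t=10"

# Split, then decode: the encoded "&" stays inside the track name.
print(parse_pairs(fragment))         # [('track', 'A & B'), ('t', '10')]

# Decode, then split: the decoded "&" is mistaken for a pair delimiter.
print(unquote(fragment).split("&"))  # ['track=A ', ' B', 't=10']
```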

If we agree, then the question is where to perform percent-decoding.

Performing percent-decoding only for track and id is certainly possible,  
but I object to it because:

1. It is more complicated than simply always performing percent-decoding.

2. Deployed server software doesn't parse query strings like this, so it  
wouldn't be possible to use those existing tools to build server-side  
Media Fragment parsers.

To point 2 one could reply "but it only matters for invalid input", but this  
isn't acceptable. The same things should work (and not work) in all  
implementations. Ignoring what happens for invalid input is a sure recipe  
for incompatibilities.
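
To illustrate point 2 with one existing tool (Python's standard  
urllib.parse, chosen here only as an example of a deployed query string  
parser; the function name parse_fragment and the example URLs are  
hypothetical), a generic parser decodes names and values alike, so the  
fully escaped example from earlier in this thread comes out as an ordinary  
"t" pair:

```python
from urllib.parse import parse_qsl, urlsplit


def parse_fragment(uri):
    """Parse the fragment of a URI with a stock query string parser.

    Hypothetical helper: it reuses parse_qsl, which splits on '&' and '='
    and then percent-decodes both names and values.
    """
    fragment = urlsplit(uri).fragment  # the fragment is returned undecoded
    return parse_qsl(fragment)


# The name and the value are both decoded, not just track and id.
print(parse_fragment("http://example.com/video.ogv#%74=%6ept%3A%310"))
# [('t', 'npt:10')]

print(parse_fragment("http://example.com/video.ogv#track=A%20%26%20B&t=10"))
# [('track', 'A & B'), ('t', '10')]
```

Note that parse_qsl also turns "+" into a space, which is a form-encoding  
convention; whether Media Fragments should do the same is a separate  
question that this sketch does not settle.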

> Also, do you want 'NpT' to be equivalent to 'nPT' and 'npt' ?

No, because no existing software I tested handles query strings  
case-insensitively, and it makes a parser more complicated, not less.

> To me, if you are escaping something, there is good reason for that, if  
> you do it in 'npt' you probably mean that you don't want it to be  
> processed as 'npt' directly. The grammar allows pct-encoding in track  
> names and ids.

As it stands, there is no normative syntax or processing defined for  
name-value pairs. These two things don't have to be perfectly in sync, so  
it's possible to have the syntax only allow percent-encoding of certain  
values and let the processing handle it everywhere, but I can't see any  
benefit to that. Should this WG decide that both the syntax and processing  
differ significantly from query string syntax, then my processing sections  
must be removed altogether, as they would be incorrect. Of course, I would  
object to defining processing like that, for the reasons given above.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 30 June 2010 11:52:55 UTC