Re: Media Fragments URI parsing: pseudo algorithm code from Yves Lafon on 2010-06-30 (public-media-fragment@w3.org from June 2010)

From: Yves Lafon <ylafon@w3.org>
Date: Wed, 30 Jun 2010 08:13:57 -0400 (EDT)
To: Philip Jägenstedt <philipj@opera.com>
cc: Jack Jansen <Jack.Jansen@cwi.nl>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Raphaël Troncy <raphael.troncy@eurecom.fr>, Media Fragment <public-media-fragment@w3.org>
Message-ID: <alpine.DEB.1.10.1006300806010.27524@wnl.j3.bet>

On Wed, 30 Jun 2010, Philip Jägenstedt wrote:

> On Wed, 30 Jun 2010 10:39:29 +0200, Yves Lafon <ylafon@w3.org> wrote:
>
>> On Tue, 29 Jun 2010, Jack Jansen wrote:
>> 
>>> 
>>> On 29 jun 2010, at 22:30, Yves Lafon wrote:
>>> 
>>>> The ABNF describes the whole syntax, and then the different parts. There 
>>>> is no need for a multi-step parsing scheme that reads the same bytes 
>>>> several times.
>>>> To me "%74=%6ept%3A%310" is not a media fragment. %-escaped values are 
>>>> allowed only where the grammar allows them.
>
> No, the ABNF doesn't define the whole syntax. If I am mistaken, please point 
> to the production which in some way includes "&" and "=" to separate 
> name-value pairs. That production is segment, but is non-normative. Since it 
> is also wrong, the solution is not to make it normative.

mediasegment     = namesegment / axissegment
axissegment      = ( timesegment / spacesegment / tracksegment )
                *( "&" ( timesegment / spacesegment / tracksegment ) )
timesegment      = timeprefix "=" timeparam
...
It should be normative; if it is not, that is a mistake.


>>> Interesting...
>>> 
>>> Unlike Yves, I think the sketched example _is_ a media fragment, but 
>>> unlike Philip I don't think we need to specify it in our ABNF.
>> 
>> The URI RFC makes it quite clear where percent-encoding is allowed and 
>> where it is not. For example, h%74%54p://www.example.com/ is _not_ 
>> htTp://www.example.com/
>
> Of course, but simply knowing where it is allowed isn't enough. I don't think 
> this is disputed, but for the record we cannot completely delegate the issue 
> of percent encoding to URI, because:
>
> 1. URI doesn't define the syntax of name-value pairs delimited by "&" and 
> "=", so MF must.

See http://www.ietf.org/rfc/rfc3986.txt, section 2.4.
You parse your URI into components (which are identified by our grammar), 
then you percent-decode what is needed.
With the current grammar, percent-encoding is allowed only in the track and 
id productions. So it is perfectly compatible with the processing defined 
in RFC 3986, and it does allow #track=A%20%26%20B&t=10
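A minimal Python sketch of that order of operations, under stated assumptions: the fragment is passed without its leading "#", it is split into components on "&" and "=", and only the values of track and id are then percent-decoded. The names parse_fragment and DECODED_NAMES are hypothetical, and no other values are checked against the grammar.

```python
from urllib.parse import unquote

# Hypothetical: the productions in which the current grammar allows pct-encoding.
DECODED_NAMES = {"track", "id"}


def parse_fragment(fragment: str) -> list[tuple[str, str]]:
    """Split a media fragment (without "#") into (name, value) pairs.

    Components are identified first; percent-decoding is applied afterwards,
    and only to the values of the track and id productions.
    """
    pairs = []
    for component in fragment.split("&"):
        name, sep, value = component.partition("=")
        if not sep:
            continue  # no "=": not a name-value pair, so skip it
        if name in DECODED_NAMES:
            value = unquote(value)  # decode only where the grammar allows it
        pairs.append((name, value))
    return pairs


print(parse_fragment("track=A%20%26%20B&t=10"))
# [('track', 'A & B'), ('t', '10')]
```

With this sketch, "%74=%6ept%3A%310" comes out as [('%74', '%6ept%3A%310')]: the escaped name does not match t, so it is not treated as a temporal fragment, which matches the reading given earlier in the thread.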

> 2. If we want to allow & in track names and ids, then percent-decoding must 
> happen *after* splitting the name-value pairs. For example, in 
> #track=A%20%26%20B&t=10 the track name is "A & B".
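A short Python sketch of why the order matters, using only the standard urllib.parse.unquote; the helper split_pairs is hypothetical and does no validation:

```python
from urllib.parse import unquote

fragment = "track=A%20%26%20B&t=10"


def split_pairs(s: str) -> list[tuple[str, str]]:
    """Split on "&" into components, then each component on its first "="."""
    pairs = []
    for component in s.split("&"):
        name, _, value = component.partition("=")
        pairs.append((name, value))
    return pairs


# Wrong order: decoding first turns %26 into a literal "&",
# which then splits the track name in two.
decoded_first = split_pairs(unquote(fragment))

# Right order: split first, then decode each name and value.
split_first = [(unquote(n), unquote(v)) for n, v in split_pairs(fragment)]

print(decoded_first)  # [('track', 'A '), (' B', ''), ('t', '10')]
print(split_first)    # [('track', 'A & B'), ('t', '10')]
```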
>
> If we agree, then the question is where to perform percent-decoding.
>
> Only performing percent-decoding for track and id is certainly possible, but 
> something I object to because:
>
> 1. It is more complicated than simply always performing percent-decoding.

Not if you have a parser based on the grammar. But you are not required to 
build your parser that way to get an efficient one; see below.

> 2. Deployed server software doesn't parse query strings like this, so it 
> wouldn't be possible to use those existing tools to build server-side Media 
> Fragment parsers.
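For illustration, Python's standard urllib.parse.parse_qsl is one example of general-purpose query-string tooling (not necessarily among the software tested here). It splits on "&" and the first "=", and only then percent-decodes every name and value (it also maps "+" to a space):

```python
from urllib.parse import parse_qsl

# Splitting happens before decoding, so %26 stays inside the track name.
print(parse_qsl("track=A%20%26%20B&t=10"))
# [('track', 'A & B'), ('t', '10')]

# Every name and value is decoded, so escaped names are not special.
print(parse_qsl("%74=%6ept%3A%310"))
# [('t', 'npt:10')]
```

So this kind of tool handles the track name as in point 2 above, but it also decodes the escaped name and value in "%74=%6ept%3A%310".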

If you (or we) define a parsing algorithm that matches what is in the 
grammar, I am all for it (in fact, we put the algorithm-based definition in 
the appendix for that reason). Implementers will use different trade-offs 
in parsing, and that's perfectly OK.

> To point 2, one could reply "but it only matters for invalid input", but this isn't 
> acceptable. The same things should work (and not work) in all 
> implementations. Ignoring what happens for invalid input is a sure recipe for 
> incompatibilities.
>
>> Also, do you want 'NpT' to be equivalent to 'nPT' and 'npt'?
>
> No, because no existing software I tested handles query strings 
> case-insensitively, and it makes a parser more complicated, not less.
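A small Python illustration of case-sensitive handling, using the standard urllib.parse.parse_qs for the names; has_npt_prefix is a hypothetical helper, not part of any spec, that compares the time prefix exactly as written:

```python
from urllib.parse import parse_qs

# Names are matched exactly: "T" and "t" end up as different keys.
params = parse_qs("T=10&t=npt:20")
print(params)  # {'T': ['10'], 't': ['npt:20']}


def has_npt_prefix(value: str) -> bool:
    """Hypothetical check: accept the time prefix only in lowercase."""
    return value.startswith("npt:")


print(has_npt_prefix("npt:20"), has_npt_prefix("NpT:20"))  # True False
```

An exact comparison needs no case folding step, which is the simplicity argument made above.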
>
>> To me, if you are escaping something, there is a good reason for it: if you 
>> do it in 'npt', you probably mean that you don't want it to be processed as 
>> 'npt' directly. The grammar allows pct-encoding in track names and ids.
>
> As it stands, there is no normative syntax or processing defined for 
> name-value pairs. These two things don't have to be perfectly in sync, so 
> it's possible to have the syntax only allow percent-encoding of certain 
> values and let the processing handle it everywhere, but I can't see any 
> benefit to that. Should this WG decide that both the syntax and processing 
> differ significantly from query string syntax, then my processing sections 
> must be removed altogether, as they would be incorrect. Of course, I would 
> object to defining processing like that, for the reasons given above.
>
>

-- 
Baroula que barouleras, au tiéu toujou t'entourneras. (Roam as you will, you will always return home.)

         ~~Yves

Received on Wednesday, 30 June 2010 12:14:08 UTC