Spec layering: name-value pairs and beyond

Hi WG,

Some issues from yesterday:

Axiom: We want the syntax and processing of name-value pairs to be  
unambiguously defined and reasonably close to how query strings are used  
in popular server-side languages such as PHP, JSP, ASP and Perl CGI.

We need to actually define the generic name-value pair syntax, something  
we haven't done yet. But to start with processing, this is how it  
currently works:

1. let input (a byte string) be the URI fragment component (or query
component for server-side implementations)
2. split it on '&'
3. for each resulting string:
3.1. split it into name/value on the first occurrence of '=' (value is the
empty string if there is no '=')
3.2. for each of the two resulting strings (both of which can be the empty
string):
3.3. apply percent decoding. if that fails (i.e. there exists a '%' NOT
followed by two hexadecimal digits) discard the name/value pair
3.4. apply UTF-8 decoding. if that fails (complicated, refer to the
ECMAScript spec) discard the name/value pair
3.5. append the resulting name/value unicode string pair to the output
list

Again, input is an arbitrary byte string and the output is a list of  
name-value pairs of unicode strings, where each name may appear more than  
once. An encoding error in one pair does not leak over into any other.  
Processing doesn't have to work *exactly* like this, but for each change  
we make we also need to add another bullet point to the list of  
incompatibilities with how query strings work in PHP and friends.
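
To make the steps above concrete, here is a rough Python sketch of the
processing. It is only an illustration, not spec text: the function names
are made up, and Python's strict UTF-8 decoder stands in for the
ECMAScript rules, which may differ in edge cases.

```python
import re

# A '%' is only valid when followed by exactly two hexadecimal digits.
_BAD_PERCENT = re.compile(rb"%(?![0-9A-Fa-f]{2})")
_PERCENT = re.compile(rb"%([0-9A-Fa-f]{2})")


def _decode(raw: bytes):
    """Percent-decode, then UTF-8 decode; return None if either fails."""
    if _BAD_PERCENT.search(raw):
        return None  # step 3.3: invalid percent-encoding
    decoded = _PERCENT.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    try:
        return decoded.decode("utf-8")  # step 3.4 (assumption: strict UTF-8)
    except UnicodeDecodeError:
        return None


def parse_name_value_pairs(component: bytes):
    """Turn a fragment or query component into a list of (name, value)."""
    pairs = []
    for item in component.split(b"&"):  # step 2
        # step 3.1: split on the first '='; value is b"" if there is none
        name, _, value = item.partition(b"=")
        decoded_name = _decode(name)
        decoded_value = _decode(value)
        if decoded_name is None or decoded_value is None:
            continue  # discard this pair only; other pairs are unaffected
        pairs.append((decoded_name, decoded_value))  # step 3.5
    return pairs
```

For example, parse_name_value_pairs(b"a=1&a=%32") keeps both pairs and
decodes the second value to "2", so a name can appear more than once in
the output.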

Here's my suggestion for ABNF syntax, adapted from what I wrote last time
I edited the spec [1]:

mediafragment  = namevalue *( "&" namevalue )
namevalue      = name [ "=" value ]
name           = fragment - "&" - "="
value          = fragment - "&"

This covers steps 1-3.1 of the processing. It doesn't say anything about
percent-encoding or UTF-8. Anyone who knows how to add that as a validity
constraint, please do so.

You will notice that the mediafragment production is very liberal; in fact
*any* production of fragment (or query, it is the same) is also a valid
production of mediafragment. This is as it should be in my opinion, to
make the processing of name-value pairs independent of the other syntax,  
so that conforming implementations don't break as soon as we introduce new  
names or values.

Note that processing rules in the spec are *not* a clarification of the  
ABNF or a "hint" to implementors, as they specify exactly what to do in  
the face of invalid input in a way that the ABNF does not. For example,
the following inputs should not be valid (if we can express the
constraints in the declarative syntax), but the result of processing them
is still well defined:
#t=%&t=1 (invalid percent-encoding)
#id=J%E4genstedt&t=1 (not UTF-8)

In both cases, the pair with invalid data is simply discarded and the  
result is the same as if the input was #t=1

If it's possible to achieve the same effect declaratively, that's probably  
fine by me unless it makes it much less readable.


I suggest that we define the semantics of media fragments in terms of a  
name-value list. In other words, instead of trying to define what this  


... we should define what the resulting list of name-value pairs means:

[("t", "npt:10"), ("t", "4")]

E.g. the timeprefix and timeparam syntaxes should be matched against these  
strings, *not* any part of the URI. The timesegment syntax is simply  
removed as it (and all other foosegment productions) violates the layering  
of media fragments on top of name-value lists.
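
As a small illustration of that layering (a sketch only; the helper name
is hypothetical and nothing here decides what repeated names should
mean), the dimension-specific code would see only the decoded strings
from the list and never the URI itself:

```python
def values_for(pairs, name):
    """Return every value recorded for `name`, in input order."""
    return [value for pair_name, value in pairs if pair_name == name]


# The name-value list is the only input to the higher layer.
pairs = [("t", "npt:10"), ("t", "4")]
t_values = values_for(pairs, "t")
# Each string in t_values is what the time syntax would be matched against.
```

How several values for the same name are combined or prioritised is left
to the definition of each dimension.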

I am attaching overview.html edited to fix the ABNF part of the name-value  
pairs as per above. I will commit this to CVS for further editing unless  
there are objections during the day.

[1] It was then reverted by Yves with the commit message "first step,  
changed back to ABNF, added a collected syntax appendix, question about  
impoting time defs from rfc3339" on 2010-02-24. I assume it was discussed  
and I was just too busy to notice.

Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 9 March 2010 08:18:56 UTC