- From: Philip Jägenstedt <philipj@opera.com>
- Date: Tue, 09 Mar 2010 09:18:03 +0100
- To: "Media Fragment" <public-media-fragment@w3.org>
- Message-ID: <op.u9ajz7x0sr6mfa@nog>
Hi WG,

Some issues from yesterday:

Axiom: We want the syntax and processing of name-value pairs to be unambiguously defined and reasonably close to how query strings are used in popular server-side languages such as PHP, JSP, ASP and Perl CGI.

We need to actually define the generic name-value pair syntax, something we haven't done yet. But to start with processing, this is how it currently works:

1. let input (a byte string) be the URI fragment component (or query component for server-side implementations)
2. split it on '&'.
3. for each resulting string:
3.1. split it into name/value on the first occurrence of '=' (value is the empty string if there is no '=')
3.2. for the two resulting strings (both of which can be the empty string):
3.3. apply percent decoding. if that fails (i.e. there exists a '%' NOT followed by two hex digits) discard the name/value
3.4. apply UTF-8 decoding. if that fails (complicated, refer to the ECMAScript spec) discard the name/value
3.5. append the resulting name/value unicode string pair to the output list.

Again, input is an arbitrary byte string and the output is a list of name-value pairs of unicode strings, where each name may appear more than once. An encoding error in one pair does not leak over into any other.

Processing doesn't have to work *exactly* like this, but for each change we make we also need to add another bullet point to the list of incompatibilities with how query strings work in PHP and friends.

Here's my suggestion for ABNF syntax, adapted from what I wrote last time I edited the spec [1]:

    mediafragment = namevalue *( "&" namevalue )
    namevalue     = name [ "=" value ]
    name          = fragment - "&" - "="
    value         = fragment - "&"

This covers steps 1-3.1 of the processing. It doesn't say anything about percent-encoding or UTF-8. Anyone who knows how to add that as a validity constraint, please do so.
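To make the processing steps concrete, here is a minimal Python sketch of them. It is an illustration only, not spec text: the names `parse_name_value_pairs` and `_decode_component` are hypothetical, the input is assumed to be the fragment (or query) component as bytes without the leading '#', and Python's strict UTF-8 decoder is assumed to be an acceptable stand-in for the ECMAScript decoding rules referred to in step 3.4.

```python
import re

# A '%' that is NOT followed by two hex digits makes percent-decoding fail.
_BAD_PERCENT = re.compile(rb"%(?![0-9A-Fa-f]{2})")
_PERCENT_TRIPLET = re.compile(rb"%([0-9A-Fa-f]{2})")


def _decode_component(raw: bytes):
    """Percent-decode, then UTF-8 decode; return None if either step fails."""
    if _BAD_PERCENT.search(raw):
        return None  # step 3.3 failed
    unescaped = _PERCENT_TRIPLET.sub(
        lambda m: bytes([int(m.group(1), 16)]), raw
    )
    try:
        # Assumption: strict UTF-8 decoding approximates the ECMAScript rules.
        return unescaped.decode("utf-8")
    except UnicodeDecodeError:
        return None  # step 3.4 failed


def parse_name_value_pairs(component: bytes):
    """Return a list of (name, value) unicode string pairs, in input order."""
    pairs = []
    for item in component.split(b"&"):                    # step 2
        raw_name, _, raw_value = item.partition(b"=")     # step 3.1
        name = _decode_component(raw_name)                # steps 3.3-3.4
        value = _decode_component(raw_value)
        if name is None or value is None:
            continue  # discard only this pair; other pairs are unaffected
        pairs.append((name, value))                       # step 3.5
    return pairs
```

For instance, `parse_name_value_pairs(b"%74=%6ept%3A%310&t=4")` yields the two pairs `("t", "npt:10")` and `("t", "4")`; names may repeat, so the result is kept as a list rather than a dictionary.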
You will notice that the mediafragment production is very liberal; in fact *any* production of fragment (or query, it is the same) is also a valid production of mediafragment. This is as it should be in my opinion, to make the processing of name-value pairs independent of the other syntax, so that conforming implementations don't break as soon as we introduce new names or values.

Note that processing rules in the spec are *not* a clarification of the ABNF or a "hint" to implementors, as they specify exactly what to do in the face of invalid input in a way that the ABNF does not. For example, the following should not be valid (if we can express the constraints in the declarative syntax), but the result of processing them is still well-defined:

    #t=%&t=1 (invalid percent-encoding)
    #id=J%E4genstedt&t=1 (not UTF-8)

In both cases, the pair with invalid data is simply discarded and the result is the same as if the input was

    #t=1

If it's possible to achieve the same effect declaratively, that's probably fine by me unless it makes it much less readable.

Beyond...

I suggest that we define the semantics of media fragments in terms of a name-value list. In other words, instead of trying to define what this means...

    #%74=%6ept%3A%310&t=4

... we should define what the resulting list of name-value pairs means:

    [("t", "npt:10"), ("t", "4")]

E.g. the timeprefix and timeparam syntaxes should be matched against these strings, *not* any part of the URI. The timesegment syntax is simply removed as it (and all other foosegment productions) violates the layering of media fragments on top of name-value lists.

I am attaching overview.html edited to fix the ABNF part of the name-value pairs as per above. I will commit this to CVS for further editing unless there are objections during the day.

[1] It was then reverted by Yves with the commit message "first step, changed back to ABNF, added a collected syntax appendix, question about impoting time defs from rfc3339" on 2010-02-24.
I assume it was discussed and I was just too busy to notice.

-- 
Philip Jägenstedt
Core Developer
Opera Software
Attachments
- text/html attachment: overview.html
Received on Tuesday, 9 March 2010 08:18:56 UTC