- From: Philip Jägenstedt <philipj@opera.com>
- Date: Tue, 09 Mar 2010 09:18:03 +0100
- To: "Media Fragment" <public-media-fragment@w3.org>
- Message-ID: <op.u9ajz7x0sr6mfa@nog>
Hi WG,
Some issues from yesterday:
Axiom: We want the syntax and processing of name-value pairs to be
unambiguously defined and reasonably close to how query strings are used
in popular server-side languages such as PHP, JSP, ASP and Perl CGI.
We need to actually define the generic name-value pair syntax, something
we haven't done yet. But to start with processing, this is how it
currently works:
1. let input (a byte string) be the URI fragment component (or query
component for server-side implementations)
2. split it on '&'.
3. for each resulting string:
3.1. split it into name/value on the first occurrence of '=' (value is the
empty string if there is no '=')
3.2. for each of the two resulting strings (either of which can be the
empty string):
3.3. apply percent decoding. if that fails (i.e. there exists a '%' NOT
followed by two hex digits) discard the name/value
3.4. apply UTF-8 decoding. if that fails (complicated, refer to the
ECMAScript spec) discard the name/value
3.5. append the resulting name/value unicode string pair to the output
list.
Again, input is an arbitrary byte string and the output is a list of
name-value pairs of unicode strings, where each name may appear more than
once. An encoding error in one pair does not leak over into any other.
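To make the steps concrete, here is a minimal Python sketch of the processing as listed. The function names are mine and not from the spec, and Python's strict UTF-8 decoder stands in for the ECMAScript rules, so corner cases could differ.

```python
import re
from typing import List, Tuple

# One valid percent-escape: '%' followed by exactly two hex digits.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_component(raw: bytes) -> str:
    """Steps 3.3-3.4: strict percent decoding, then strict UTF-8 decoding.

    Raises ValueError if either step fails.
    """
    # Any '%' left over after removing the valid escapes is malformed.
    if b"%" in _ESCAPE.sub(b"", raw):
        raise ValueError("invalid percent-encoding")
    unescaped = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # UnicodeDecodeError is a subclass of ValueError.
    return unescaped.decode("utf-8", errors="strict")


def parse_name_value_pairs(data: bytes) -> List[Tuple[str, str]]:
    """Steps 2-3.5: split on '&', then on the first '=', then decode."""
    pairs = []
    for item in data.split(b"&"):
        # partition() yields an empty value when there is no '='.
        name, _, value = item.partition(b"=")
        try:
            pairs.append((decode_component(name), decode_component(value)))
        except ValueError:
            # A bad pair is dropped without affecting the others.
            continue
    return pairs
```

The input is the fragment (or query) component without the leading '#' (or '?'), so parse_name_value_pairs(b"a&b=c=d") gives [("a", ""), ("b", "c=d")].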
Processing doesn't have to work *exactly* like this, but for each change
we make we also need to add another bullet point to the list of
incompatibilities with how query strings work in PHP and friends.
Here's my suggestion for ABNF syntax, adapted from what I wrote last time
I edited the spec [1]:
mediafragment = namevalue *( "&" namevalue )
namevalue = name [ "=" value ]
name = fragment - "&" - "="
value = fragment - "&"
This covers steps 1-3.1 of the processing. It doesn't say anything about
percent-encoding or UTF-8. Anyone who knows how to add that as a validity
constraint, please do so.
You will notice that the mediafragment production is very liberal, in fact
*any* production of fragment (or query, it is the same) is also a valid
production of mediafragment. This is as it should be in my opinion, to
make the processing of name-value pairs independent of the other syntax,
so that conforming implementations don't break as soon as we introduce new
names or values.
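One way to see how liberal the production is: the split in steps 2-3.1 never fails, and joining the pieces back together reproduces the input exactly. A small Python sketch, with helper names that are purely illustrative:

```python
from typing import List, Tuple


def split_pairs(fragment: str) -> List[Tuple[str, str, str]]:
    """Steps 2-3.1 only: split on '&', then on the first '='. No decoding.

    Each item is (name, separator, value); separator is '' when there is no '='.
    """
    return [item.partition("=") for item in fragment.split("&")]


def join_pairs(parts: List[Tuple[str, str, str]]) -> str:
    """Inverse of split_pairs: rebuild the original string."""
    return "&".join(name + sep + value for name, sep, value in parts)


# Every string splits, including empty and odd-looking ones,
# and the round trip gives back the input unchanged.
for sample in ["", "t=1", "a&&b==c", "=&=", "x=%zz"]:
    assert join_pairs(split_pairs(sample)) == sample
```

Whether a given name or value is meaningful is then a separate question from whether the string parses.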
Note that processing rules in the spec are *not* a clarification of the
ABNF or a "hint" to implementors, as they specify exactly what to do in
the face of invalid input in a way that the ABNF does not. For example,
the following should not be valid (if we can express the constraints in
the declarative syntax), but the result of processing it is still
well-defined:
#t=%&t=1 (invalid percent-encoding)
#id=J%E4genstedt&t=1 (not UTF-8)
In both cases, the pair with invalid data is simply discarded and the
result is the same as if the input was #t=1
If it's possible to achieve the same effect declaratively, that's probably
fine by me unless it makes it much less readable.
Beyond...
I suggest that we define the semantics of media fragments in terms of a
name-value list. In other words, instead of trying to define what this
means...
#%74=%6ept%3A%310&t=4
... we should define what the resulting list of name-value pairs means:
[("t", "npt:10"), ("t", "4")]
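As a rough illustration of the layering, the time syntax would be matched against the decoded values and never against the raw URI. In the Python sketch below, TIME_VALUE is a deliberately simplified, hypothetical stand-in for the spec's time productions, and time_values is an illustrative name. How several "t" pairs should be combined is not decided here, so the sketch simply returns all of them.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified stand-in for the spec's time value syntax:
# an optional "npt:" prefix followed by seconds with an optional fraction.
TIME_VALUE = re.compile(r"(?:npt:)?\d+(?:\.\d+)?")


def time_values(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the values of all 't' pairs that match the time syntax.

    The input is the already-decoded name-value list, so percent-encoding
    never has to be considered at this level.
    """
    return [
        value
        for name, value in pairs
        if name == "t" and TIME_VALUE.fullmatch(value)
    ]


print(time_values([("t", "npt:10"), ("t", "4")]))
```

Because the match runs on decoded strings, #%74=%6ept%3A%310 and #t=npt:10 are indistinguishable at this level.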
E.g. the timeprefix and timeparam syntaxes should be matched against these
strings, *not* any part of the URI. The timesegment syntax is simply
removed as it (and all other foosegment productions) violates the layering
of media fragments on top of name-value lists.
I am attaching overview.html edited to fix the ABNF part of the name-value
pairs as per above. I will commit this to CVS for further editing unless
there are objections during the day.
[1] It was then reverted by Yves with the commit message "first step,
changed back to ABNF, added a collected syntax appendix, question about
impoting time defs from rfc3339" on 2010-02-24. I assume it was discussed
and I was just too busy to notice.
--
Philip Jägenstedt
Core Developer
Opera Software
Attachments
- text/html attachment: overview.html
Received on Tuesday, 9 March 2010 08:18:56 UTC