Re: Media Fragments URI parsing: pseudo algorithm code

On Wed, 30 Jun 2010 22:23:51 +0200, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> * Philip Jägenstedt wrote:
>>>>> With the current grammar, it is allowed only in track and id
>>>>> productions.
>>>>> So it is perfectly compatible with the processing defined in rfc3986
>>>>> and perfectly allows #track=A%20%26%20B&t=10
>>>>
>>>> No disagreement that we need to define it, thankfully. The
>>>> disagreement is only where to decode percent-encoding.
>>>
>>> RFC3986 gives the answer, after the URI components are parsed (and we
>>> define here how to split out in components).
>>
>> The disagreement here is only about which components to
>> percent-decode; RFC3986 will not help us.
>
> RFC 3986 requires implementations, when processing fragment identifiers,
> to treat %74 and "t" the same regardless of where either occurs, as "t"
> is not a reserved character and URIs that differ only in the escaping of
> unreserved characters are defined to be equivalent. So the answer here
> is "all components". You can only have special requirements for reserved
> characters when they occur unescaped.

If I understand this correctly, this means that percent-decoding must be  
performed on all names and values, which I welcome.

However, given this situation, how is it possible to express parsing in a  
single layer of ABNF? When the ABNF says "t", it really means "t" or  
"%74", if these are indeed supposed to be equivalent. How do other specs  
layered on top of URI handle this?
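
To make the equivalence concrete, here is a rough sketch of the
normalization that RFC 3986 describes for percent-encoded unreserved
characters (sections 2.3 and 6.2.2.2). The function name
normalize_unreserved is made up for illustration and is not taken from
any spec or draft; the point is only that "%74=10" and "t=10" come out
as the same string before any ABNF is applied.

```python
import re

# Unreserved characters as listed in RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(fragment: str) -> str:
    """Hypothetical helper: decode only percent-escapes of unreserved chars.

    Escapes of reserved or other characters are kept, with their hex
    digits uppercased, so delimiters such as "&" stay distinguishable
    from their escaped form "%26".
    """
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()

    return _PCT.sub(repl, fragment)


print(normalize_unreserved("%74=10") == normalize_unreserved("t=10"))
```

A single-layer ABNF could then be matched against the normalized
string, but that still leaves open what to do with escaped reserved
characters inside names and values.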

(I think it would be cleaner to split the syntax into two levels -- one  
that identifies arbitrary name-value pairs, and one that is defined in  
terms of the Unicode strings that those names/values represent.)
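
As a minimal sketch of what I mean by two levels, assuming the
fragment is a "&"-separated list of "name=value" pairs: level 1 splits
on unescaped delimiters only, and level 2 percent-decodes every name
and value before anything is compared against "t", "track" and so on.
The function names (split_pairs, decode_pairs, parse_fragment) are
hypothetical, and treating the bytes as UTF-8 is an assumption of this
sketch, not something the thread has settled.

```python
from urllib.parse import unquote


def split_pairs(fragment):
    """Level 1 (hypothetical): split on unescaped '&' and '=' only."""
    pairs = []
    for part in fragment.split("&"):
        if not part:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = part.partition("=")
        pairs.append((name, value))
    return pairs


def decode_pairs(pairs):
    """Level 2 (hypothetical): percent-decode names and values.

    unquote() decodes the escaped bytes as UTF-8 by default; that
    choice is an assumption made for this sketch.
    """
    return [(unquote(name), unquote(value)) for name, value in pairs]


def parse_fragment(fragment):
    return decode_pairs(split_pairs(fragment))


print(parse_fragment("track=A%20%26%20B&%74=10"))
```

With this ordering the escaped "%26" inside the track name never acts
as a pair separator, while "%74" and "t" end up as the same name. The
per-dimension syntax (time, track, id) would then be defined on the
decoded strings.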

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 6 July 2010 14:35:56 UTC