Re: Media Fragments URI parsing: pseudo algorithm code from Bjoern Hoehrmann on 2010-07-06 (public-media-fragment@w3.org from July 2010)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 06 Jul 2010 17:33:05 +0200
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-media-fragment@w3.org
Message-ID: <bbi636dnkqkqb1iknepp0sqmv1q9jso532@hive.bjoern.hoehrmann.de>

* Philip Jägenstedt wrote:
>> RFC 3986 requires implementations, when processing a fragment identifier,
>> to treat %74 and "t" the same regardless of where either occurs, as "t"
>> is not a reserved character and URIs that differ only in the escaping of
>> unreserved characters are defined to be equivalent. So the answer here
>> is "all components". You can only have special requirements for reserved
>> characters when they occur unescaped.
>
>If I understand this correctly, this means that percent-decoding must be  
>performed on all names and values, which I welcome.
>
>However, given this situation, how is it possible to express parsing in a  
>single layer of ABNF? When the ABNF says "t", it really means "t" or  
>"%74", if these are indeed supposed to be equivalent. How do other specs  
>layered on top of URI handle this?

The only way would be to actually say `%x74 / "%74"` in each of these
cases, which would make the grammar rather unreadable. A workaround
would be to require a pre-processing step that removes escaping for
octets that are not reserved and then work with the result.
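
As a rough illustration of such a pre-processing step, here is a minimal
Python sketch. It assumes the "not reserved" set is exactly the unreserved
set of RFC 3986 section 2.3; the function name unescape_unreserved is made
up for this example and does not come from any spec.

```python
import re

# Unreserved characters per RFC 3986, section 2.3 (assumed to be the set
# whose escapes may be removed without changing the meaning of the URI).
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(fragment: str) -> str:
    """Decode %XX only where it encodes an unreserved character.

    Escapes of reserved characters (e.g. %26 for "&") and of non-ASCII
    octets are left untouched, so the delimiters seen by a later grammar
    stay unambiguous.
    """
    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)

    return _PCT_ESCAPE.sub(replace, fragment)
```

After this step the grammar can keep saying plain "t", since "%74=10,20"
and "t=10,20" both reach it as "t=10,20", while "a%26b" is passed through
unchanged.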

>(I think it would be cleaner to split the syntax into two levels -- one  
>that identifies arbitrary name-value pairs, and one that is defined in  
>terms of the Unicode strings that those names/values represent.)

I agree.
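
A two-level split along those lines might look like the Python sketch
below. It is only a sketch under assumptions: "&" and "=" are taken as the
pair and name/value delimiters, escapes are assumed to encode UTF-8, and
the function names are hypothetical.

```python
from urllib.parse import unquote


def split_pairs(fragment):
    """Level 1: split on the unescaped delimiters only, no decoding."""
    pairs = []
    for item in fragment.split("&"):
        if not item:
            continue  # skip empty segments such as in "a=1&&b=2"
        name, _, value = item.partition("=")
        pairs.append((name, value))
    return pairs


def decode_pairs(pairs):
    """Level 2: percent-decode each name and value to a Unicode string."""
    return [(unquote(name), unquote(value)) for name, value in pairs]


def parse_fragment(fragment):
    """Run both levels; the result is what a name-specific grammar sees."""
    return decode_pairs(split_pairs(fragment))
```

With this, parse_fragment("%74=10,20") and parse_fragment("t=10,20") both
give [("t", "10,20")], and an escaped "&" inside a value ("id=a%26b") ends
up in the value rather than splitting the pair.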
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Tuesday, 6 July 2010 15:33:49 UTC