
Re: Media Fragments URI parsing: pseudo algorithm code

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 1 Jul 2010 23:02:45 +1000
Message-ID: <AANLkTineCsVU1ABr4FS4KPxkVGliDfSvFBz6ZoPt-1PD@mail.gmail.com>
To: Yves Lafon <ylafon@w3.org>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, Philip Jägenstedt <philipj@opera.com>, public-media-fragment@w3.org
On Thu, Jul 1, 2010 at 10:47 PM, Yves Lafon <ylafon@w3.org> wrote:
> On Thu, 1 Jul 2010, Bjoern Hoehrmann wrote:
>
>> * Yves Lafon wrote:
>>>
>>> On Wed, 30 Jun 2010, Bjoern Hoehrmann wrote:
>>>
>>>>> The disagreement here is only for which components to decode
>>>>> percent-encoding, RFC3986 will not help us.
>>>>
>>>> RFC 3986 requires implementations, when processing a fragment identifier,
>>>> to treat %74 and "t" the same regardless of where either occurs, as "t"
>>>> is not a reserved character and URIs that differ only in the escaping of
>>>> unreserved characters are defined to be equivalent. So the answer here
>>>> is "all components". You can only have special requirements for reserved
>>>> characters when they occur unescaped.
>>>
>>> URI equivalence is an endless source of fun :)
>>> are http://www.example.com/ (1) and http://www.example.com:80/ (2) and
>>> h%74tp:www/example.com/ (3) equivalent?
>>> From what you say, at least (1) and (3) should be.
>>
>> Well, http://www.websitedev.de/temp/rfc3986-check.html.gz tells me (3)
>> is neither a URI nor a URI-reference so the question does not arise. For
>> (1) and (2) the answer is scheme-specific. Neither has a bearing on the
>> case of fragment identifiers as they are scheme-independent and allow
>> percent-encoding everywhere.
>
> (3) is not a URI because the ABNF doesn't allow percent encoding in the
> scheme.
> But rfc3986 2.4.  When to Encode or Decode says:
> <<
> When a URI is dereferenced, the components and subcomponents
>   significant to the scheme-specific dereferencing process (if any)
>   must be parsed and separated before the percent-encoded octets within
>   those components can be safely decoded, as otherwise the data may be
>   mistaken for component delimiters.
>>>
> So far so good.
> <<
> The only exception is for
>   percent-encoded octets corresponding to characters in the unreserved
>   set, which can be decoded at any time.
>>>
> This exception, which is what you are referring to, contradicts the fact that
> h%74tp:www/example.com/ is not a valid URI
>
>

I assume you are working on the basis that the name-value pairs that
we define fall under the general understanding of sub-components in
RFC 3986? They can't be components, since those are defined in section 3
as scheme, authority, path, query, and fragment. I further assume that
because we use "=" as a sub-delimiter, which is a reserved character, you
regard the name and value as sub-components, as described in section 2.2?
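
To make the "separate first, decode afterwards" order concrete, here is a
minimal Python sketch. It assumes the fragment is a list of name=value
pairs joined by "&", which is how I read our current draft; the function
name parse_fragment is made up for illustration and is not part of any
spec.

```python
from typing import List, Tuple
from urllib.parse import unquote


def parse_fragment(fragment: str) -> List[Tuple[str, str]]:
    """Split a fragment into name-value pairs, then percent-decode each part.

    Hypothetical helper: assumes "&" separates pairs and the first literal
    "=" separates a name from its value.
    """
    pairs = []
    for item in fragment.split("&"):
        if not item:
            continue  # skip empty items such as the gap in "a=1&&b=2"
        # Split on the first unescaped "=" only; an escaped "%3D" is still
        # plain data at this point and cannot be mistaken for a delimiter.
        name, _, value = item.partition("=")
        # Decode only after the sub-components have been separated.
        pairs.append((unquote(name), unquote(value)))
    return pairs
```

With this order, "%74=10,20&id=a%3Db%26c" yields the pairs ("t", "10,20")
and ("id", "a=b&c"). Decoding the whole fragment before splitting would
instead cut the second value at the decoded "&".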

I think under these circumstances, it may indeed already be defined
what needs to be percent-encoded and what not...
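
As a rough illustration of the unreserved-character exception quoted
above, the following Python sketch decodes only those percent-encoded
octets that correspond to unreserved characters and leaves every other
escape in place. The name normalize_unreserved is hypothetical, and
uppercasing the remaining hex digits is an extra normalization step
taken from RFC 3986 section 6.2.2.1, not something discussed in this
thread.

```python
import re

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"  (RFC 3986, section 2.3)
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
PCT_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(component: str) -> str:
    """Decode escapes of unreserved characters; keep all other escapes."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # safe to decode at any time
        # Reserved or other octet: keep it escaped, with uppercase hex digits.
        return "%" + match.group(1).upper()

    return PCT_OCTET.sub(replace, component)
```

Under these assumptions "%74=10" and "t=10" normalize to the same string,
while an escaped delimiter such as "%3d" stays escaped (as "%3D") and so
remains distinct from a literal "=".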

However, I fail to see how h%74tp:www/example.com/ could ever be a
valid URI, even given these circumstances.
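
The reason can be checked mechanically. The sketch below, in Python,
tests the part before the first ":" against the scheme production of
RFC 3986 section 3.1, which has no place for "%". The helper name
has_valid_scheme is made up for this example, and it checks the scheme
only, not the rest of the URI.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )  (RFC 3986, section 3.1)
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def has_valid_scheme(uri: str) -> bool:
    """Return True if the text before the first ":" is a valid scheme."""
    scheme, sep, _ = uri.partition(":")
    # No ":" at all means there is no scheme to check.
    return bool(sep) and SCHEME.fullmatch(scheme) is not None
```

Here has_valid_scheme("http://www.example.com/") is true, whereas
has_valid_scheme("h%74tp:www/example.com/") is false, because "%" is not
among the characters the scheme production allows.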

Cheers,
Silvia.
Received on Thursday, 1 July 2010 13:03:42 GMT
