
Re: [public-webapps] Comment on Widget URI (2)

From: Marcos Caceres <marcosc@opera.com>
Date: Tue, 15 Dec 2009 18:30:09 +0100
Message-ID: <b21a10670912150930h48a0d7dbu29b561b04accfb3d@mail.gmail.com>
To: Larry Masinter <masinter@adobe.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>

Hi Larry,

On Tue, Dec 15, 2009 at 3:52 PM, Robin Berjon <robin@berjon.com> wrote:
> Hi Larry,
> On Dec 9, 2009, at 17:55 , Larry Masinter wrote:
>> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5
>> gives several different examples of normalization and
>> comparison of strings for the purpose of identification.
> Yes. That's why we indicate that "A producer MUST generate URIs that are normalised according to chapter 5.3.2. "Syntax-Based Normalization" of [RFC3987]."
> RFC 3987 further states that "IRIs already in Unicode MUST NOT be normalized before parsing or interpreting." It goes on to add further details in the rest of
>> I can't figure out from the document of the
>> Widget: URI scheme which, if any, of the comparison
>> algorithms are recommended. In fact, the assertion
>> that using UTF-8 is "recommended" seems like it would
>> result in ambiguous interpretation of URIs if some
>> implementations use UTF-8 and others don't.
> I'm sorry, I can't find a single location in either the published draft or the editor's draft that states that for widget URIs UTF-8 is "recommended".
> The Widget P+C specification states that it is recommended to use UTF-8 for the file name field of the local file header of a file entry. One may indeed be able to use something else, and user agents may indeed be able to do something with that, but really all bets are off.

Zip 6.3 only supports UTF-8 and CP437. When UTF-8 is used, it must be
explicitly marked as such (bit 11 of the general purpose bit flag).
Hence, you always know which encoding you are getting.
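
To illustrate the point about knowing the encoding, here is a rough
sketch using Python's standard zipfile module. It builds a small archive
in memory and reads back bit 11 (0x800) of each entry's general purpose
bit flag. The function name entry_encodings and the sample file names
are my own, purely for illustration; this is not the algorithm from P&C.

```python
import io
import zipfile

# Bit 11 of the general purpose bit flag: file name is UTF-8 when set.
UTF8_FLAG = 0x800


def entry_encodings(zip_bytes):
    """Map each entry name to the encoding its flag bits declare.

    Hypothetical helper: an entry without bit 11 set is reported as
    CP437, which also covers plain ASCII names.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: "UTF-8" if info.flag_bits & UTF8_FLAG else "CP437"
            for info in zf.infolist()
        }


# Build a tiny archive in memory. Python's zipfile sets bit 11 itself
# whenever a file name cannot be encoded as ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("index.html", "<p>ascii name</p>")
    zf.writestr("Vo\u00df.html", "<p>non-ascii name</p>")

encodings = entry_encodings(buf.getvalue())
for name, enc in encodings.items():
    print(f"{name!r}: {enc}")
```

The flag is per entry, so a single archive can mix both encodings.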

>> So, if I have a file named Voß.html and a relative
>> IRI that points to voss.html, do they match or not?
>> You say "case sensitive", do you mean "byte for byte"?
>> Do half-width romaji characters match the full-width
>> romaji characters?
> Does anyone ever really mean byte for byte in string comparisons? Since these IRIs are not normalised, would you prefer "codepoint for codepoint" appended to "case sensitive"? Or am I missing something in your comment?
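
For what it's worth, the difference between the two readings can be
sketched in a few lines of Python. codepoint_equal is what I understand
"codepoint for codepoint" to mean; loose_equal is a hypothetical
contrast (NFKC plus case folding) that none of the specs prescribe, and
is included only to show what a looser match would accept.

```python
import unicodedata


def codepoint_equal(a, b):
    """Code point for code point: no normalization, no case folding."""
    # Python's str equality already compares sequences of code points.
    return a == b


def loose_equal(a, b):
    """Hypothetical looser match, NOT taken from any of the specs."""
    def fold(s):
        # NFKC maps full-width letters to ASCII; casefold maps U+00DF to "ss".
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


eszett = "Vo\u00df.html"       # "Voß.html"
plain = "voss.html"
fullwidth = "\uff56oss.html"   # full-width "v" followed by "oss.html"

results = {
    "eszett_strict": codepoint_equal(eszett, plain),
    "eszett_loose": loose_equal(eszett, plain),
    "fullwidth_strict": codepoint_equal(fullwidth, plain),
    "fullwidth_loose": loose_equal(fullwidth, plain),
}
print(results)
```

Under the strict reading, neither pair matches; only the looser,
hypothetical comparison treats them as the same name.
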
>> Perhaps it's necessary to dig further into the
>> widget spec to ensure this is not an ambiguity, but
>> the question was whether the widget specification
>> was "well-defined", and my comment was that it
>> didn't seem to be.
> P+C is a separate specification over which WURIs are but a layer, but likewise, is codepoint matching what you are requesting there? Sorry for being thick, but it is hard to be certain what outcome you want from your comment.

Agreed. It might be a help to check the following algorithm in P&C:

And the definition of a zip relative path:

Marcos Caceres
Received on Tuesday, 15 December 2009 17:31:08 UTC
