Re: [public-webapps] Comment on Widget URI (2)

From: Robin Berjon <robin@berjon.com>
Date: Tue, 15 Dec 2009 15:52:53 +0100
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-Id: <628A0603-A60E-49EB-B575-B279B623E9A7@berjon.com>
To: Larry Masinter <masinter@adobe.com>

Hi Larry,

On Dec 9, 2009, at 17:55 , Larry Masinter wrote:
> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5
> 
> gives several different examples of normalization and
> comparison of strings for the purpose of identification.

Yes. That's why we indicate that "A producer MUST generate URIs that are normalised according to chapter 5.3.2. "Syntax-Based Normalization" of [RFC3987]."

RFC 3987 further states that "IRIs already in Unicode MUST NOT be normalized before parsing or interpreting." It goes on to add further details in the rest of 5.3.2.2.
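
As a rough illustration of what syntax-based normalisation involves on the producer side, here is a minimal sketch of one of its steps, percent-encoding normalisation (uppercase the hex digits of each escape, and decode escapes that stand for unreserved ASCII characters). The function name and the example widget URI are hypothetical, and the sketch deliberately ignores the other steps of 5.3.2 (case, character, and path segment normalisation) as well as the non-ASCII iunreserved range.

```python
import re

# ASCII "unreserved" characters (RFC 3986): letters, digits, "-", ".", "_", "~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def normalize_percent_encoding(uri: str) -> str:
    """Hypothetical helper: normalise percent-escapes in a URI string.

    Escapes of unreserved ASCII characters are decoded; every other
    escape is kept but its hex digits are uppercased.
    """
    def fix(match: re.Match) -> str:
        hex_digits = match.group(1)
        ch = chr(int(hex_digits, 16))
        if ch in UNRESERVED:
            return ch
        return "%" + hex_digits.upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)

# Hypothetical widget URI, used only for illustration.
print(normalize_percent_encoding("widget://example/%7euser/caf%c3%a9.html"))
# widget://example/~user/caf%C3%A9.html
```

Two URIs that differ only in these respects compare equal after this step, without any Unicode character normalisation being applied.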

> I can't figure out from the document of the
> Widget: URI scheme which, if any, of the comparison
> algorithms are recommended. In fact, the assertion
> that using UTF-8 is "recommended" seems like it would
> result in ambiguous interpretation of URIs if some
> implementations use UTF-8 and others don't.

I'm sorry, I can't find a single location in either the published draft or the editor's draft that states that for widget URIs UTF-8 is "recommended".

The Widget P+C specification states that it is recommended to use UTF-8 for the file name field of the local file header of a file entry. One may indeed be able to use something else, and user agents may indeed be able to do something with that, but really all bets are off.

> So, if I have a file named Voß.html and a relative
> IRI that points to voss.html, do they match or not?
> You say "case sensitive", do you mean "byte for byte"?
> Do half-width romaji characters match the full-width
> romaji characters?

Does anyone ever really mean byte for byte in string comparisons? Since these IRIs are not normalised, would you prefer "codepoint for codepoint" appended to "case sensitive"? Or am I missing something in your comment?
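
To make the distinction concrete, here is a small sketch of how the comparisons in question differ, using the file names from the example above. It assumes that "codepoint for codepoint" means plain equality of the code point sequences with no case folding and no Unicode normalisation; the helper name is hypothetical and nothing here is taken from the widget specifications.

```python
import unicodedata

def codepoint_equal(a: str, b: str) -> bool:
    """Hypothetical helper: equal only if the code point sequences are identical."""
    return a == b  # Python compares str values code point by code point

file_name = "Vo\u00df.html"   # "Voß.html", with U+00DF LATIN SMALL LETTER SHARP S
reference = "voss.html"
fullwidth = "\uff56\uff4f\uff53\uff53.html"  # "voss" in full-width letters

# Code point comparison: neither name matches the reference.
print(codepoint_equal(file_name, reference))   # False
print(codepoint_equal(fullwidth, reference))   # False

# They would match only under extra processing that is not applied here.
print(file_name.casefold() == reference.casefold())            # True
print(unicodedata.normalize("NFKC", fullwidth) == reference)   # True

# Byte-for-byte comparison additionally depends on the encoding in use.
print(file_name.encode("utf-8") == file_name.encode("latin-1"))  # False
```

Under this reading, Voß.html and voss.html do not match, and neither do full-width and half-width forms, because matching them would require case folding or compatibility normalisation.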

> Perhaps it's necessary to dig further into the
> widget spec to insure this is not an ambiguity, but
> the question was whether the widget specification
> was "well-defined", and my comment was that it
> didn't seem to be.

P+C is a separate specification over which WURIs are but a layer. Likewise, though: is indicating codepoint matching what you are requesting there? Sorry for being thick, but it is hard to be certain what outcome you want from your comment.

-- 
Robin Berjon - http://berjon.com/
Received on Tuesday, 15 December 2009 14:53:35 GMT