- From: Marcos Caceres <marcosc@opera.com>
- Date: Tue, 15 Dec 2009 18:30:09 +0100
- To: Larry Masinter <masinter@adobe.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Hi Larry,

On Tue, Dec 15, 2009 at 3:52 PM, Robin Berjon <robin@berjon.com> wrote:
> Hi Larry,
>
> On Dec 9, 2009, at 17:55 , Larry Masinter wrote:
>> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5
>>
>> gives several different examples of normalization and
>> comparison of strings for the purpose of identification.
>
> Yes. That's why we indicate that "A producer MUST generate URIs that are normalised according to chapter 5.3.2. "Syntax-Based Normalization" of [RFC3987]."
>
> RFC 3987 further states that "IRIs already in Unicode MUST NOT be normalized before parsing or interpreting." It goes on to add further details in the rest of 5.3.2.2.
>
>> I can't figure out from the document of the
>> Widget: URI scheme which, if any, of the comparison
>> algorithms are recommended. In fact, the assertion
>> that using UTF-8 is "recommended" seems like it would
>> result in ambiguous interpretation of URIs if some
>> implementations use UTF-8 and others don't.
>
> I'm sorry, I can't find a single location in either the published draft nor the editor's draft that states that for widget URIs UTF-8 is "recommended".
>
> The Widget P+C specification states that it is recommended to use UTF-8 for the file name field of the local file header of a file entry. One may indeed be able to use something else, and user agents may indeed be able to do something with that, but really all bets are off.
>

Zip 6.3 only supports UTF-8 and CP437. When UTF-8 is used, it must be implicitly marked as such. Hence, you always know which encoding you are getting:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT

>> So, if I have a file named Voß.html and a relative
>> IRI that points to voss.html, do they match or not?
>> You say "case sensitive", do you mean "byte for byte"?
>> Do half-width romaji characters match the full-width
>> romaji characters?
>
> Does anyone ever really mean byte for byte in string comparisons? Since these IRIs are not normalised, would you prefer "codepoint for codepoint" appended to "case sensitive"? Or am I missing something in your comment?
>
>> Perhaps it's necessary to dig further into the
>> widget spec to insure this is not an ambiguity, but
>> the question was whether the widget specification
>> was "well-defined", and my comment was that it
>> didn't seem to be.
>
> P+C is a separate specification over which WURIs are but a layer, but likewise is indicating codepoint-matching what you are requesting there? Sorry for being thick but it is hard to be certain what the desired outcome from your comment is.
>

Agreed. It might be a help to check the following algorithm in P&C:
http://dev.w3.org/2006/waf/widgets/#rule-for-finding-a-file-within-a-widget-0

And the definition of a zip relative path:
http://dev.w3.org/2006/waf/widgets/#zip-relative-path

-- 
Marcos Caceres
http://datadriven.com.au
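
To make the comparison question in the thread concrete, here is a minimal Python sketch contrasting "codepoint for codepoint" matching with two looser alternatives (Unicode case folding and NFKC compatibility normalization). The function names are hypothetical and chosen only for illustration; nothing here is taken from the P&C or widget URI specifications, and it is not meant to say which behaviour those specifications require.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    # Python compares str values code point by code point,
    # with no case folding and no normalization applied.
    return a == b


def casefold_equal(a: str, b: str) -> bool:
    # Unicode full case folding; this maps U+00DF (sharp s) to "ss".
    return a.casefold() == b.casefold()


def nfkc_equal(a: str, b: str) -> bool:
    # Compatibility normalization; this maps full-width Latin
    # letters to their ordinary (half-width) counterparts.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


sharp_s = "Vo\u00df.html"   # "Voß.html"
plain = "voss.html"
fullwidth = "\uff41.html"   # full-width "a" followed by ".html"
halfwidth = "a.html"

for name, compare in [
    ("codepoint", codepoint_equal),
    ("casefold", casefold_equal),
    ("nfkc", nfkc_equal),
]:
    print(name, compare(sharp_s, plain), compare(fullwidth, halfwidth))
```

Under strict codepoint matching neither pair matches. Case folding alone would make Voß.html match voss.html, and NFKC alone would make the full-width and half-width names match, which is why it matters that a specification names exactly one comparison rule.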
Received on Tuesday, 15 December 2009 17:31:08 UTC