Re: [public-webapps] Comment on Widget URI (2)

Hi Larry,

On Tue, Dec 15, 2009 at 3:52 PM, Robin Berjon <robin@berjon.com> wrote:
> Hi Larry,
>
> On Dec 9, 2009, at 17:55 , Larry Masinter wrote:
>> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5
>>
>> gives several different examples of normalization and
>> comparison of strings for the purpose of identification.
>
> Yes. That's why we indicate that "A producer MUST generate URIs that are normalised according to chapter 5.3.2. "Syntax-Based Normalization" of [RFC3987]."
>
> RFC 3987 further states that "IRIs already in Unicode MUST NOT be normalized before parsing or interpreting." It goes on to add further details in the rest of 5.3.2.2.
>
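For illustration, here is a minimal sketch of two of the syntax-based normalization steps mentioned above: lowercasing the scheme and normalizing percent-encodings (uppercase the hex digits, decode escapes of unreserved ASCII characters). It assumes the RFC 3986 "unreserved" set; host case normalization and path segment normalization are left out, and non-ASCII characters are passed through untouched, in line with the rule that IRIs already in Unicode must not be character-normalized. The function names and the sample IRI are hypothetical.

```python
import re

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(s: str) -> str:
    """Decode escapes of unreserved ASCII characters; uppercase all other escapes."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + m.group(1).upper()
    return _PCT.sub(fix, s)


def normalize_iri(iri: str) -> str:
    """Hypothetical helper: lowercase the scheme, then normalize percent-encodings.

    Non-ASCII characters are left exactly as they are (no NFC/NFKC).
    """
    scheme, sep, rest = iri.partition(":")
    if not sep:
        return normalize_percent_encoding(iri)
    return scheme.lower() + sep + normalize_percent_encoding(rest)


print(normalize_iri("WIDGET://example/a%7e%2fb/Vo%c3%9f.html"))
# widget://example/a~%2Fb/Vo%C3%9F.html
```

Note that "%2f" stays encoded: "/" is reserved, so decoding it would change the path structure.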
>> I can't figure out from the document of the
>> Widget: URI scheme which, if any, of the comparison
>> algorithms are recommended. In fact, the assertion
>> that using UTF-8 is "recommended" seems like it would
>> result in ambiguous interpretation of URIs if some
>> implementations use UTF-8 and others don't.
>
> I'm sorry, I can't find a single location in either the published draft or the editor's draft that states that for widget URIs UTF-8 is "recommended".
>
> The Widget P+C specification states that it is recommended to use UTF-8 for the file name field of the local file header of a file entry. One may indeed be able to use something else, and user agents may indeed be able to do something with that, but really all bets are off.
>

The ZIP specification (APPNOTE 6.3) only supports UTF-8 and CP437 for file
names. When UTF-8 is used, it must be explicitly marked as such (general
purpose bit 11, the "language encoding flag"). Hence, you always know which
encoding you are getting:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
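To show why the encoding is never ambiguous, here is a rough sketch that reads the fixed 30-byte part of a ZIP local file header and picks the file name encoding from general purpose bit 11 (0x0800): UTF-8 if set, CP437 otherwise. It only looks at a single local header and ignores the central directory and any Unicode extra fields; make_header is a hypothetical helper for the demo that leaves sizes and CRC at zero.

```python
import struct

LOCAL_HEADER_SIG = 0x04034B50                  # "PK\x03\x04"
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")   # fixed part, 30 bytes
UTF8_FLAG = 0x0800                             # general purpose bit 11


def local_header_filename(data: bytes) -> str:
    """Decode the file name of the local file header at the start of data."""
    (sig, _version, flags, _method, _time, _date,
     _crc, _csize, _usize, name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != LOCAL_HEADER_SIG:
        raise ValueError("not a ZIP local file header")
    raw = data[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len]
    # Bit 11 set: the name is UTF-8. Bit 11 unset: the name is IBM Code Page 437.
    encoding = "utf-8" if flags & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


def make_header(name: bytes, flags: int) -> bytes:
    """Hypothetical demo helper: minimal header with sizes and CRC set to zero."""
    return LOCAL_HEADER.pack(LOCAL_HEADER_SIG, 20, flags, 0, 0, 0,
                             0, 0, 0, len(name), 0) + name


print(local_header_filename(make_header("Voß.html".encode("utf-8"), UTF8_FLAG)))
print(local_header_filename(make_header(b"Vo\xe1.html", 0)))  # 0xE1 is "ß" in CP437
# Flag missing: UTF-8 bytes are misread as CP437 (mojibake).
print(local_header_filename(make_header("Voß.html".encode("utf-8"), 0)))
```

Both of the first two headers decode to the same name; the third shows what goes wrong if a producer writes UTF-8 without setting the flag.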

>> So, if I have a file named Voß.html and a relative
>> IRI that points to voss.html, do they match or not?
>> You say "case sensitive", do you mean "byte for byte"?
>> Do half-width romaji characters match the full-width
>> romaji characters?
>
> Does anyone ever really mean byte for byte in string comparisons? Since these IRIs are not normalised, would you prefer "codepoint for codepoint" appended to "case sensitive"? Or am I missing something in your comment?
>
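Applied to Larry's two examples, a case-sensitive, codepoint-for-codepoint comparison gives "no match" in both cases. They would only match under Unicode case folding (which maps "ß" to "ss") or NFKC normalization (which folds full-width letters to ASCII); those two comparisons are shown here only for contrast. A minimal Python sketch; the function names are illustrative.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """Case-sensitive, codepoint-for-codepoint comparison, no normalization."""
    return a == b


def casefold_equal(a: str, b: str) -> bool:
    """Full Unicode case folding; for contrast only."""
    return a.casefold() == b.casefold()


def nfkc_equal(a: str, b: str) -> bool:
    """Compatibility normalization; folds full-width forms; for contrast only."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


fullwidth = "\uff56\uff4f\uff53\uff53.html"  # full-width "voss" followed by ASCII ".html"

print(codepoint_equal("Voß.html", "voss.html"))  # False
print(casefold_equal("Voß.html", "voss.html"))   # True: "ß" folds to "ss"
print(codepoint_equal(fullwidth, "voss.html"))   # False
print(nfkc_equal(fullwidth, "voss.html"))        # True
```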
>> Perhaps it's necessary to dig further into the
>> widget spec to ensure this is not an ambiguity, but
>> the question was whether the widget specification
>> was "well-defined", and my comment was that it
>> didn't seem to be.
>
> P+C is a separate specification over which WURIs are but a layer, but likewise, is codepoint matching what you are requesting there? Sorry for being thick, but it is hard to be certain what outcome your comment is asking for.
>

Agreed. It might help to check the following algorithm in P&C:
http://dev.w3.org/2006/waf/widgets/#rule-for-finding-a-file-within-a-widget-0

And the definition of a zip relative path:
http://dev.w3.org/2006/waf/widgets/#zip-relative-path

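As a rough illustration only (not the P&C rule itself, which includes further steps not reproduced here), the core matching behaviour can be sketched as an exact, case-sensitive lookup against the decoded names of the archive's entries. The sketch assumes the input is already a valid zip relative path; find_file is a hypothetical name.

```python
import io
import zipfile
from typing import Optional


def find_file(zf: zipfile.ZipFile, path: str) -> Optional[bytes]:
    """Hypothetical lookup: exact, codepoint-for-codepoint match on entry names.

    Assumes path is already a valid zip relative path; no case folding or
    Unicode normalization is applied.
    """
    entries = {info.filename: info for info in zf.infolist()}
    info = entries.get(path)
    if info is None or info.is_dir():
        return None
    return zf.read(info)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zw:
    # zipfile stores non-ASCII names as UTF-8 and sets bit 11.
    zw.writestr("Voß.html", "<p>hello</p>")

with zipfile.ZipFile(buf) as zr:
    print(find_file(zr, "Voß.html"))   # b'<p>hello</p>'
    print(find_file(zr, "voss.html"))  # None
    print(find_file(zr, "voß.html"))   # None
```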


-- 
Marcos Caceres
http://datadriven.com.au

Received on Tuesday, 15 December 2009 17:31:08 UTC