RE: [public-webapps] Comment on Widget URI (2) from Larry Masinter on 2009-12-09 (public-webapps@w3.org from October to December 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 9 Dec 2009 08:55:48 -0800
To: Robin Berjon <robin@berjon.com>
CC: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D0B0A69@nambxv01a.corp.adobe.com>

http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-5

gives several different examples of normalization and
comparison of strings for the purpose of identification.

There are significant differences in alternatives for
how to do comparison of Unicode file names.

I can't figure out from the Widget: URI scheme
document which, if any, of the comparison
algorithms is recommended. In fact, the assertion
that using UTF-8 is "recommended" seems like it would
result in ambiguous interpretation of URIs if some
implementations use UTF-8 and others don't.

So, if I have a file named Voß.html and a relative
IRI that points to voss.html, do they match or not?
You say "case sensitive"; do you mean "byte for byte"?
Do half-width romaji characters match the full-width
romaji characters?
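
To make the question concrete, here is a minimal sketch
(Python, standard library only) of how the same pairs of
names match or fail to match under different candidate
definitions of "matches". The function name compare() and
the choice of four rules are illustrative assumptions; none
of them is taken from the widget specification.

```python
import unicodedata

def compare(a: str, b: str) -> dict:
    """Report whether two names match under several candidate rules."""
    nfkc_a = unicodedata.normalize("NFKC", a)
    nfkc_b = unicodedata.normalize("NFKC", b)
    return {
        # Byte-for-byte comparison of the UTF-8 encodings.
        "bytes": a.encode("utf-8") == b.encode("utf-8"),
        # Simple lowercasing; leaves U+00DF (sharp s) unchanged.
        "lower": a.lower() == b.lower(),
        # Unicode case folding; maps U+00DF to "ss".
        "casefold": a.casefold() == b.casefold(),
        # Compatibility normalization, then case folding;
        # maps full-width Latin letters to their ASCII forms.
        "nfkc_casefold": nfkc_a.casefold() == nfkc_b.casefold(),
    }

# "Voß.html" against "voss.html"
sharp_s = compare("Vo\u00df.html", "voss.html")

# Full-width "ｖｏｓｓ" (U+FF56 U+FF4F U+FF53 U+FF53) against ASCII "voss"
full_width = compare("\uff56\uff4f\uff53\uff53.html", "voss.html")

print(sharp_s)
print(full_width)
```

The first pair matches only under the two case-folding
rules, and the second pair matches only once compatibility
normalization is applied, so the answer depends entirely on
which rule the specification intends.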

Note that different operating systems normalize
Unicode file names differently.
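
As a hedged illustration of that point (Python, standard
library only): the same visible file name can arrive as a
precomposed or a decomposed character sequence, depending
on where it was stored. The file name "café.html" and the
helper equal_after() are hypothetical, chosen only for this
sketch.

```python
import unicodedata

# The same visible name in two Unicode forms.
composed = "caf\u00e9.html"      # U+00E9, precomposed e-acute
decomposed = "cafe\u0301.html"   # "e" followed by U+0301 combining acute

def equal_after(form: str, a: str, b: str) -> bool:
    """Compare two names after applying one normalization form to both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A byte-for-byte comparison treats these as different names.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# Normalizing both sides to the same form makes them compare equal.
nfc_equal = equal_after("NFC", composed, decomposed)
nfd_equal = equal_after("NFD", composed, decomposed)

print(raw_equal, nfc_equal, nfd_equal)
```

Without a stated normalization form, a widget packaged on
one system and looked up on another could fail a
byte-for-byte match on a name that looks identical.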

Perhaps it's necessary to dig further into the
widget spec to ensure this is not an ambiguity, but
the question was whether the widget specification
was "well-defined", and my comment was that it
didn't seem to be.

Larry
--
http://larry.masinter.net

-----Original Message-----
From: Robin Berjon [mailto:robin@berjon.com] 
Sent: Thursday, November 19, 2009 6:00 AM
To: Larry Masinter
Cc: public-webapps@w3.org
Subject: Re: [public-webapps] Comment on Widget URI (2)

Dear Larry,

thank you for your comments.

On Oct 10, 2009, at 19:44 , Larry Masinter wrote:
> 2) ** WELL-DEFINED MAPPING TO FILES **
> 
> Section 4.4 Step 2 makes normative reference:
> 
> http://www.w3.org/TR/widgets/#rule-for-finding-a-file-within-a-widget- 
> 
> The algorithm there seems to be lacking a clear definition of "matches"
> which deals reasonably with the issues surrounding matching and equivalence
> for Unicode strings, or the handling of character sets in IRIs which are
> not represented in UTF8.
> 
> Suggestion (Editorial): Move the definition of the mapping algorithm
> into the URI scheme registration document so that its definition can 
> be reviewed for completeness.
> Suggestion (Technical): Define exactly and precisely what "match" means
> and make it clear what the appropriate response or error conditions are
> if there is more than one file that "match"es.

This comment concerns P+C, and I'm unsure what change you are requesting, and where. Could you please provide an example of an issue in the current setup and explain how you would like to see it addressed?

-- 
Robin Berjon - http://berjon.com/

Received on Wednesday, 9 December 2009 16:56:32 UTC