RE: Normalization, was: RE: [Widget URI] Internationalization, widget IRI? from Marcin Hanclik on 2009-09-16 (public-webapps@w3.org from July to September 2009)

From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Date: Wed, 16 Sep 2009 12:32:17 +0200
To: "marcosc@opera.com" <marcosc@opera.com>
CC: Robin Berjon <robin@berjon.com>, public-webapps WG <public-webapps@w3.org>
Message-ID: <FAA1D89C5BAF1142A74AF116630A9F2C2890C66293@OBEEX01.obe.access-company.com>

Hi Marcos,

>>So it turns out that %-encoded really just means "replace this '%xx'
>>with UTF-8 bytes".
Yes.
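
To make that concrete, here is a minimal sketch (my illustration, not from any spec) using Python's standard urllib.parse. The example.org IRI is made up; the point is only that each '%xx' stands for one byte of the character's UTF-8 encoding.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Hypothetical IRI containing one non-ASCII character (e with acute accent).
iri = "http://example.org/caf\u00e9"

# IRI -> URI: each non-ASCII character becomes its UTF-8 bytes, %-encoded.
uri = quote(iri, safe=":/")

# URI -> IRI: each '%xx' is replaced by a byte, and the bytes are read as UTF-8.
raw_bytes = unquote_to_bytes("caf%C3%A9")
round_tripped = unquote(uri)

print(uri)
print(raw_bytes)
print(round_tripped == iri)
```

Note that this only covers the %-encoding step, not any of the other normalization rungs discussed below.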

>>So we don't need to do anything.
Even so, P&C shall state the actual comparison algorithm and the resulting equivalence.

http://www.w3.org/TR/2009/WD-widgets-apis-20090423/

had this issue:
"ISSUE: do we need to do some kind of URI normalization to check for equivalency?"

According to RFC3987, 5.1:
"  Applications using IRIs as identity tokens with no relationship to a
   protocol MUST use the Simple String Comparison (see section 5.3.1).
   All other applications MUST select one of the comparison practices
   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
   conversion, select one of the comparison practices from the URI
   comparison ladder in [RFC3986], section 6.2)"
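
A rough sketch of the difference between the two options, in Python. The function names are my own, the IRIs are made up, and the "ladder" function implements only one rung (case normalization as in RFC3986, 6.2.2.1), under the simplifying assumption that the authority has no userinfo part:

```python
import re
from urllib.parse import urlsplit, urlunsplit

def simple_string_equal(a: str, b: str) -> bool:
    """Simple String Comparison: character-by-character, no normalization."""
    return a == b

def _upper_hex(s: str) -> str:
    # Uppercase the hex digits of every %xx triplet.
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), s)

def case_normalize(iri: str) -> str:
    """One rung of the ladder: lowercase scheme and host, uppercase %xx digits.

    Assumes no userinfo in the authority (it would be lowercased too).
    """
    p = urlsplit(iri)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(),
                       _upper_hex(p.path), _upper_hex(p.query),
                       _upper_hex(p.fragment)))

a = "HTTP://Example.ORG/caf%c3%a9"
b = "http://example.org/caf%C3%A9"

print(simple_string_equal(a, b))                 # identity-token view
print(case_normalize(a) == case_normalize(b))    # ladder view (one rung)
```

Under Simple String Comparison the two strings are different identifiers; after case normalization they compare equal. Which behaviour applies is exactly what the spec text needs to pin down.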

@href may fall into the Comparison Ladder case, id into the namespaces case.
The question (still the same) is whether, in the case of @name of <feature>, the IRIs are used as identity tokens (id, simple string comparison) or as something else/new.
If the answer is that the IRIs are to be treated as identity tokens (as you propose, and I agree), we still have the issue of expressing non-ASCII IRIs in ASCII documents (a border case). We would then need a guideline or example stating that in XML the author shall use character entities to encode the IRI (I called this solution awkward, but I could live with it).
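
For illustration only, this is what that border case could look like, sketched with Python's xml.etree.ElementTree. The IRI is made up and the widget namespace is omitted for brevity. Serializing to US-ASCII forces the non-ASCII character into a numeric character reference, and an XML parser resolves it back before any comparison takes place:

```python
import xml.etree.ElementTree as ET

# Hypothetical feature IRI with one non-ASCII character.
iri = "http://example.org/caf\u00e9"

# Build <feature name="..."/> and serialize it as a pure-ASCII document.
feature = ET.Element("feature", {"name": iri})
ascii_xml = ET.tostring(feature, encoding="us-ascii")

# Parsing resolves the character reference, so the UA sees the original IRI.
parsed_name = ET.fromstring(ascii_xml).get("name")

print(ascii_xml)
print(parsed_name == iri)
```

So with simple string comparison on the parsed attribute value, the author's choice between a literal character and a character reference does not affect identity.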

Thanks,
Marcin

Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: marcosscaceres@gmail.com [mailto:marcosscaceres@gmail.com] On Behalf Of Marcos Caceres
Sent: Thursday, September 10, 2009 2:28 PM
To: Marcin Hanclik
Cc: Robin Berjon; public-webapps WG
Subject: Re: Normalization, was: RE: [Widget URI] Internationalization, widget IRI?

On Tue, Sep 8, 2009 at 5:47 PM, Marcos Caceres<marcosc@opera.com> wrote:
>
>
> Marcin Hanclik wrote:
>>
>> Hi Marcos,
>>
>> Thanks for your comments.
>> It seems we are aligned.
>>
>>>> UAs will just have to deal with that internally
>>
>> I assume there could be an easy solution to the normalization:
>> The normalization / mandating some equivalence-determining algorithm (from
>> RFC3987) could go into P&C.
>> Then maybe I18N would not have to be bothered further.
>
> Ok, basically, to address this all we need to say is:
>
> If 'potential license href' is a valid IRI, then let 'widget license href'
> be the result of normalizing 'potential license href' in accordance with
> RFC3987.
>
> Same for author href.
>
> (Personally, I think this change is unnecessary because of the fact that
> implementers know these are IRIs already).

So it turns out that %-encoded really just means "replace this '%xx'
with UTF-8 bytes". So we don't need to do anything.

--
Marcos Caceres
http://datadriven.com.au


Received on Wednesday, 16 September 2009 10:33:28 UTC