RE: Normalization, was: RE: [Widget URI] Internationalization, widget IRI?

Hi Marcos,

I will try to do it next week in Dusseldorf then.
Even if an implementation (yours included) handles it, I assume the details should still be specified.


Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646

-----Original Message-----
From: [] On Behalf Of Marcos Caceres
Sent: Wednesday, September 16, 2009 4:17 PM
To: Marcin Hanclik
Cc: Robin Berjon; public-webapps WG
Subject: Re: Normalization, was: RE: [Widget URI] Internationalization, widget IRI?

Let's try this another way. Can you make me a widget that explicitly
demos the problem? I will then run it against our implementation and
see what happens.

I will also add the widget to the test suite, to make sure we expose
any potential misunderstandings in the spec wrt URIs.

Kind regards,

On Wed, Sep 16, 2009 at 4:08 PM, Marcos Caceres <> wrote:
> On Wed, Sep 16, 2009 at 12:32 PM, Marcin Hanclik
> <> wrote:
>> Hi Marcos,
>>>>So it turns out that %-encoded really just means "replace this '%xx'
>>>>with UTF-8 bytes".
>> Yes.
>>>>So we don't need to do anything.
>> P&C shall state the actual algorithm and equivalence.

>> had this issue:
>> "ISSUE: do we need to do some kind of URI normalization to check for equivalency?"
>> According to RFC3987, 5.1:
>> "  Applications using IRIs as identity tokens with no relationship to a
>>   protocol MUST use the Simple String Comparison (see section 5.3.1).
>>   All other applications MUST select one of the comparison practices
>>   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
>>   conversion, select one of the comparison practices from the URI
>>   comparison ladder in [RFC3986], section 6.2)"
>> @href may fall into the Comparison Ladder case, and id into the namespaces case.
>> The question (still the same) is whether, in the case of @name of <feature>, the IRIs are used as identity tokens (id, simple string) or as anything else/new.
> They are namespaces. I actually raised this issue a long time ago too
> because I had the same concerns as you. The WG decided that strings
> that name things (@id, @name) are treated as namespaces.
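To make the difference between the comparison practices concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the example.org IRIs and the helper names are hypothetical, and the "looser" comparisons are simplified stand-ins for the upper rungs of the Comparison Ladder, not an algorithm taken from P&C or from any implementation.

```python
import unicodedata
from urllib.parse import unquote


def simple_string_equal(a: str, b: str) -> bool:
    """Simple String Comparison (RFC 3987, 5.3.1): character by character, no normalization."""
    return a == b


def percent_decoded_equal(a: str, b: str) -> bool:
    """Hypothetical looser check: decode every %xx as UTF-8 before comparing.

    Simplified on purpose; a real normalizer must not decode reserved characters.
    """
    return unquote(a, encoding="utf-8") == unquote(b, encoding="utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical looser check: apply Unicode NFC before comparing."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical feature names, all meant to denote the same resource.
iri_precomposed = "http://example.org/caf\u00e9"   # e-acute as one code point
iri_decomposed = "http://example.org/cafe\u0301"   # 'e' followed by combining acute
uri_escaped = "http://example.org/caf%C3%A9"       # UTF-8 bytes, percent-encoded

print(simple_string_equal(iri_precomposed, uri_escaped))     # identity-token view
print(percent_decoded_equal(iri_precomposed, uri_escaped))   # after %xx decoding
print(simple_string_equal(iri_precomposed, iri_decomposed))  # identity-token view
print(nfc_equal(iri_precomposed, iri_decomposed))            # after NFC
```

Under simple string comparison the three spellings are three distinct names, which is the consequence of treating @id and @name as identity tokens: authors have to write the string exactly as the feature defines it.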
>> Once the answer is that IRIs are to be treated as identity tokens (as you propose and I agree), we still have the issue of expressing non-ASCII IRIs in ASCII documents (an edge case). We would then need a guideline / example saying that in XML the author shall use character references to encode the IRI (I called this solution awkward, but I could live with it).
> I think Addison already said this was not a problem: if you know the
> encoding of the XML document, you know the encoding of the URI. URI
> are always treated as UTF-8 internally. There is no problem here.
> --
> Marcos Caceres
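The point about document encoding can be checked with a small Python sketch. It assumes a bare <feature> element with no widget namespace and a hypothetical example.org feature name; it only shows that an XML parser hands the application the same character string whether the non-ASCII character arrives as a numeric character reference in a US-ASCII document or as literal UTF-8 bytes.

```python
import xml.etree.ElementTree as ET

# US-ASCII document: the non-ASCII character is written as a character reference.
ascii_doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<feature name="http://example.org/caf&#xE9;"/>'
)

# UTF-8 document: the same character is written literally.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<feature name="http://example.org/caf\u00e9"/>'
).encode("utf-8")

name_from_ascii = ET.fromstring(ascii_doc).get("name")
name_from_utf8 = ET.fromstring(utf8_doc).get("name")

# After parsing, both attribute values are the same sequence of characters,
# so a simple string comparison treats them as the same name.
print(name_from_ascii == name_from_utf8)
```

Note that this covers only the XML encoding layer. A name written with percent-escapes (for example caf%C3%A9) is still a different string after parsing, since the parser does not touch %xx sequences.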


Marcos Caceres


Access Systems Germany GmbH
Essener Strasse 5  |  D-46047 Oberhausen
HRB 13548 Amtsgericht Duisburg
Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda


Received on Wednesday, 16 September 2009 14:31:47 UTC