Re: Normalization, was: RE: [Widget URI] Internationalization, widget IRI?

Let's try this another way. Can you make me a widget that explicitly
demos the problem? I will then run it against our implementation and
see what happens.

I will also add the widget to the test suite, to make sure we expose
any potential misunderstandings in the spec wrt URIs.
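
As a rough illustration of the kind of mismatch such a widget could
probe, here is a minimal Python sketch. It is not taken from the spec
or from any implementation. The example.org feature names and both
helper functions are hypothetical, and the percent-decoding comparison
is only a crude stand-in for a higher rung of the comparison ladder,
not a faithful implementation of one.

```python
from urllib.parse import unquote


def simple_string_equal(a: str, b: str) -> bool:
    # Simple String Comparison in the sense of RFC 3987, section 5.3.1:
    # character-by-character equality with no normalization at all.
    return a == b


def percent_decoded_equal(a: str, b: str) -> bool:
    # Hypothetical looser check: decode every %xx escape as UTF-8 before
    # comparing. This is an assumption made for illustration only.
    return unquote(a, encoding="utf-8") == unquote(b, encoding="utf-8")


# Two hypothetical feature names that differ only in how the final
# character (U+00E9) is written.
encoded = "http://example.org/feature/caf%C3%A9"
literal = "http://example.org/feature/caf\u00e9"

same_as_tokens = simple_string_equal(encoded, literal)
same_after_decoding = percent_decoded_equal(encoded, literal)

print("simple string comparison:", same_as_tokens)
print("after percent-decoding:  ", same_after_decoding)
```

If names are treated as plain identity tokens, the two strings above
name different features, even though they decode to the same
characters.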

Kind regards,

On Wed, Sep 16, 2009 at 4:08 PM, Marcos Caceres <> wrote:
> On Wed, Sep 16, 2009 at 12:32 PM, Marcin Hanclik
> <> wrote:
>> Hi Marcos,
>>>>So it turns out that %-encoded really just means "replace this '%xx'
>>>>with UTF-8 bytes".
>> Yes.
>>>>So we don't need to do anything.
>> P&C shall state the actual algorithm and equivalence.
>> had this issue:
>> "ISSUE: do we need to do some kind of URI normalization to check for equivalency?"
>> According to RFC3987, 5.1:
>> "  Applications using IRIs as identity tokens with no relationship to a
>>   protocol MUST use the Simple String Comparison (see section 5.3.1).
>>   All other applications MUST select one of the comparison practices
>>   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
>>   conversion, select one of the comparison practices from the URI
>>   comparison ladder in [RFC3986], section 6.2)"
>> @href may fall into Comparison Ladder case, id into namespaces.
>> The question (still the same) is whether in case of @name of <feature> the IRIs are used as identity tokens (id, simple string) or anything else/new.
> They are namespaces. I actually raised this issue a long time ago too
> because I had the same concerns as you. The WG decided that strings
> that name things (@id, @name) are treated as namespaces.
>> Once the answer is that IRIs are to be treated as identity tokens (as you propose and I agree), then we still have the issue of expressing the non-ASCII IRIs in ASCII documents (border case). Then we would need a guideline / example that in XML the author shall use character entities to encode the IRI (I marked this solution awkward, but I could live with it).
> I think Addison already said this was not a problem: if you know the
> encoding of the XML document, you know the encoding of the URI. URIs
> are always treated as UTF-8 internally. There is no problem here.
> --
> Marcos Caceres

Marcos Caceres

Received on Wednesday, 16 September 2009 14:18:11 UTC