Re: Normalization, was: RE: [Widget URI] Internationalization, widget IRI?

Marcin,
Let's try this another way. Can you make me a widget that explicitly
demos the problem? I will then run it against our implementation and
see what happens.

I will also add the widget to the test suite, to make sure we expose
any potential misunderstandings in the spec wrt URIs.

Kind regards,
Marcos

On Wed, Sep 16, 2009 at 4:08 PM, Marcos Caceres <marcosc@opera.com> wrote:
> On Wed, Sep 16, 2009 at 12:32 PM, Marcin Hanclik
> <Marcin.Hanclik@access-company.com> wrote:
>> Hi Marcos,
>>
>>>>So it turns out that %-encoded really just means "replace this '%xx'
>>>>with UTF-8 bytes".
>> Yes.
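A minimal Python sketch of the percent-decoding described above. It assumes each %XX triplet is one byte of a UTF-8 sequence, which is how RFC 3987 maps IRIs to URIs. The function name `percent_decode_utf8` is hypothetical. The standard library's `urllib.parse.unquote` does the same job with UTF-8 as its default encoding, but it substitutes U+FFFD for invalid sequences instead of raising.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def percent_decode_utf8(s: str) -> str:
    """Replace each valid %XX triplet with byte 0xXX, then decode all bytes as UTF-8."""
    out = bytearray()
    i = 0
    while i < len(s):
        triplet = s[i:i + 3]
        if (len(triplet) == 3 and triplet[0] == "%"
                and triplet[1] in HEX_DIGITS and triplet[2] in HEX_DIGITS):
            out.append(int(triplet[1:], 16))  # one raw byte per triplet
            i += 3
        else:
            # Characters that are not part of a valid triplet are kept,
            # re-encoded as UTF-8 so they join the same byte stream.
            out.extend(s[i].encode("utf-8"))
            i += 1
    # Strict decoding: raises UnicodeDecodeError if the bytes are not valid UTF-8.
    return out.decode("utf-8")


print(percent_decode_utf8("http://example.org/caf%C3%A9"))
```

Here `%C3%A9` is the two-byte UTF-8 encoding of "é", so the decoded string ends in "café".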
>>
>>>>So we don't need to do anything.
>> P&C shall state the actual algorithm and equivalence.
>>
>> http://www.w3.org/TR/2009/WD-widgets-apis-20090423/
>> had this issue:
>> "ISSUE: do we need to do some kind of URI normalization to check for equivalency?"
>>
>> According to RFC3987, 5.1:
>> "  Applications using IRIs as identity tokens with no relationship to a
>>   protocol MUST use the Simple String Comparison (see section 5.3.1).
>>   All other applications MUST select one of the comparison practices
>>   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
>>   conversion, select one of the comparison practices from the URI
>>   comparison ladder in [RFC3986], section 6.2)"
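To make the difference between the two practices concrete, here is a hedged Python sketch. It contrasts Simple String Comparison (RFC 3987, 5.3.1) with one rung of the syntax-based part of the ladder: the case and percent-encoding normalizations of RFC 3986, 6.2.2.1 and 6.2.2.2. The function names are hypothetical, and the sketch deliberately omits other ladder steps such as scheme/host case folding and path-segment normalization.

```python
import re

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def simple_string_equal(a: str, b: str) -> bool:
    # RFC 3987 5.3.1: character-by-character comparison, no normalization at all.
    return a == b


def normalize_percent_encoding(uri: str) -> str:
    # RFC 3986 6.2.2.1: uppercase the hex digits of each %XX triplet.
    # RFC 3986 6.2.2.2: decode triplets that encode unreserved characters.
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)


def syntax_based_equal(a: str, b: str) -> bool:
    return normalize_percent_encoding(a) == normalize_percent_encoding(b)


a, b = "http://example.org/%7Euser", "http://example.org/~user"
print(simple_string_equal(a, b), syntax_based_equal(a, b))
```

The two strings differ under Simple String Comparison but are equivalent after this normalization. Comparing an IRI with its percent-encoded URI form would additionally require the IRI-to-URI conversion, which this sketch leaves out.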
>>
>> @href may fall under the Comparison Ladder case, and id under namespaces.
>> The question (still the same) is whether, in the case of the @name attribute of <feature>, the IRIs are used as identity tokens (like id, with simple string comparison) or as something else/new.
>>
>
> They are namespaces. I actually raised this issue a long time ago too
> because I had the same concerns as you. The WG decided that strings
> that name things (@id, @name) are treated as namespaces.
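In Namespaces in XML, namespace names are compared character for character, with no normalization. A hedged Python sketch of what treating feature names the same way could look like follows. The feature URIs and the `is_feature_supported` helper are hypothetical and are not taken from any spec.

```python
# Hypothetical feature names a user agent might recognise (illustrative only).
SUPPORTED_FEATURES = {
    "http://example.org/api/geolocation",
    "http://example.org/api/caf\u00e9",
}


def is_feature_supported(name: str) -> bool:
    # Namespace-style matching: exact membership test, with no case folding
    # and no percent-encoding normalization of the requested name.
    return name in SUPPORTED_FEATURES


print(is_feature_supported("http://example.org/api/caf\u00e9"))
print(is_feature_supported("http://example.org/api/caf%C3%A9"))
```

Under this reading, the literal "é" form matches while the percent-encoded spelling of the same IRI does not. That is exactly why authors would need to write a feature name the same way the implementation expects it.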
>
>> Once the answer is that IRIs are to be treated as identity tokens (as you propose and I agree), then we still have the issue of expressing the non-ASCII IRIs in ASCII documents (border case). Then we would need a guideline / example that in XML the author shall use character entities to encode the IRI (I marked this solution awkward, but I could live with it).
>>
>
> I think Addison already said this was not a problem: if you know the
> encoding of the XML document, you know the encoding of the URI. URIs
> are always treated as UTF-8 internally. There is no problem here.
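The point about document encoding can be checked with an XML parser. This minimal Python sketch uses the standard `xml.etree.ElementTree` module to parse three serializations of a hypothetical `<feature>` element: UTF-8, ISO-8859-1, and US-ASCII with a numeric character reference for "é". The element content and the `feature_name` helper are illustrative assumptions. The parser decodes all three to the same Unicode attribute value, which can then be UTF-8 encoded for the IRI-to-URI mapping.

```python
import xml.etree.ElementTree as ET

NAME = "http://example.org/api/caf\u00e9"

# The same hypothetical element serialized three ways.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><feature name="' + NAME + '"/>'
).encode("utf-8")
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><feature name="' + NAME + '"/>'
).encode("iso-8859-1")
# An ASCII-only document uses a character reference (&#xE9;) for the non-ASCII character.
ascii_doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b'<feature name="http://example.org/api/caf&#xE9;"/>'
)


def feature_name(doc: bytes) -> str:
    # The parser honours the encoding declaration and expands character
    # references, so the attribute value comes back as Unicode text.
    return ET.fromstring(doc).get("name")


names = {feature_name(d) for d in (utf8_doc, latin1_doc, ascii_doc)}
print(names)
print(NAME.encode("utf-8"))
```

All three documents yield a single attribute value, so the comparison question arises only after parsing, not at the level of the document's byte encoding.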
>
>
> --
> Marcos Caceres
> http://datadriven.com.au
>




Received on Wednesday, 16 September 2009 14:18:11 UTC