Re: Normalization, was: RE: [Widget URI] Internationalization, widget IRI? from Marcos Caceres on 2009-09-16 (public-webapps@w3.org from July to September 2009)

From: Marcos Caceres <marcosc@opera.com>
Date: Wed, 16 Sep 2009 16:08:27 +0200
To: Marcin Hanclik <Marcin.Hanclik@access-company.com>
Cc: Robin Berjon <robin@berjon.com>, public-webapps WG <public-webapps@w3.org>
Message-ID: <b21a10670909160708m21a9f4c9rb1f3ce27a85d1ca7@mail.gmail.com>

On Wed, Sep 16, 2009 at 12:32 PM, Marcin Hanclik
<Marcin.Hanclik@access-company.com> wrote:
> Hi Marcos,
>
>>>So it turns out that %-encoded really just means "replace this '%xx'
>>>with UTF-8 bytes".
> Yes.
>
>>>So we don't need to do anything.
> P&C shall state the actual algorithm and equivalence.
>
> http://www.w3.org/TR/2009/WD-widgets-apis-20090423/
> had this issue:
> "ISSUE: do we need to do some kind of URI normalization to check for equivalency?"
>
> According to RFC3987, 5.1:
> "  Applications using IRIs as identity tokens with no relationship to a
>   protocol MUST use the Simple String Comparison (see section 5.3.1).
>   All other applications MUST select one of the comparison practices
>   from the Comparison Ladder (see section 5.3 or, after IRI-to-URI
>   conversion, select one of the comparison practices from the URI
>   comparison ladder in [RFC3986], section 6.2)"
>
> @href may fall under the Comparison Ladder case, and id under namespaces.
> The question (still the same) is whether, for the @name attribute of <feature>, the IRIs are used as identity tokens (id, simple string comparison) or as something else/new.
>

They are namespaces. I actually raised this issue a long time ago too
because I had the same concerns as you. The WG decided that strings
that name things (@id, @name) are treated as namespaces.
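To make the distinction concrete: treating @name as a namespace means comparing it with Simple String Comparison (RFC 3987, section 5.3.1), whereas an @href-style comparison may climb the Comparison Ladder. The Python sketch below is illustrative only. `simple_string_equal` and `ladder_equal` are hypothetical helpers, `ladder_equal` applies just the percent-encoding steps of syntax-based normalization from RFC 3986, section 6.2.2 (no scheme/host case folding or dot-segment removal), and the example.org IRIs are made up.

```python
import re

# Unreserved characters (RFC 3986, section 2.3): decoding their
# percent-encoded form never changes what an identifier means.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def simple_string_equal(a: str, b: str) -> bool:
    # Simple String Comparison: code point by code point, no normalization.
    return a == b


def _normalize_escape(match):
    octet = int(match.group(1), 16)
    if chr(octet) in UNRESERVED:
        return chr(octet)  # percent-encoding normalization
    return "%" + match.group(1).upper()  # case normalization of hex digits


def ladder_equal(a: str, b: str) -> bool:
    # Hypothetical helper: one rung of the comparison ladder, limited to
    # the percent-encoding parts of syntax-based normalization.
    return PCT.sub(_normalize_escape, a) == PCT.sub(_normalize_escape, b)


a = "http://example.org/%7efeature"
b = "http://example.org/~feature"
print(simple_string_equal(a, b))  # False
print(ladder_equal(a, b))         # True
```

Under the namespace reading, a widget declaring `http://example.org/%7efeature` would not match a user agent expecting `http://example.org/~feature`, even though the two are equivalent URIs after normalization.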

> If the answer is that IRIs are to be treated as identity tokens (as you propose, and I agree), we still have the issue of expressing non-ASCII IRIs in ASCII documents (an edge case). We would then need a guideline or example stating that in XML the author shall use character references to encode the IRI (I called this solution awkward, but I could live with it).
>

I think Addison already said this was not a problem: if you know the
encoding of the XML document, you know the encoding of the URI. URIs
are always treated as UTF-8 internally. There is no problem here.


-- 
Marcos Caceres
http://datadriven.com.au

Received on Wednesday, 16 September 2009 14:09:23 UTC