Re: charmod-uri

At 18:13 02/04/11 +0200, Jeremy Carroll wrote:

>I just checked up on what the current draft of charmod and IRI actually say
>about normalization. It is still IMO a bit confused, because normalization
>of IRIs is not mentioned explicitly in charmod,

Hello Jeremy,

Have you looked at the latest charmod draft?
(http://www.w3.org/International/Group/charmod-edit/#sec-URIs)
I'm not sure normalization of IRIs has to be mentioned in
the character model; it would only complicate the specs.


>although the algorithm given
>in the IRI spec conforms with the early uniform normalization model of
>charmod.

Of course it does :-). But maybe it should say so.
(well, there are a few details we have to look at again)


>Charmod:
>
>[[[ from section 4.3
>http://www.w3.org/TR/charmod/#sec-NormalizationApplication
>
>[S] Specifications of text-based formats and protocols MUST, as part of
>their syntax definition, require that the text be in normalized form.
>
>]]]
>
>[[[ Section 8
>http://www.w3.org/TR/charmod/#sec-URIs
>
>[S] W3C specifications that define protocol or format elements (e.g. HTTP
>headers, XML attributes, etc.) which are to be interpreted as URI references
>(or specific subsets of URI references, such as absolute URI references,
>URIs, etc.) SHOULD use Internationalized Resource Identifiers (IRI) [I-D
>URI-I18N] (or an appropriate subset thereof).
>
>]]]
>
>[[[ IRI Section 2.3
>http://www.w3.org/International/2001/draft-masinter-url-i18n-08.txt
>
>[[[
>Part I is skipped if the
>input is already in an UCS-based encoding (e.g. UTF-8 or UTF-16). In
>that case, it is assumed that the IRI is already in NFC.
>
>    Part I)
>
>    1) Represent the IRI characters as a sequence of characters from the
>       UCS.
>
>    2) Normalize the character sequence according to Normalization Form
>       C, as defined in [UNI15].  (See further discussion in Section
>       3.1.)
>]]]
>
>Charmod would benefit if it were more explicit that the same normalization
>model is used in the IRI spec.

I agree that this should be made explicit in the IRI spec. I don't
see the point of doing the same in Charmod.
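
For illustration only, here is a rough Python sketch of Part I as quoted
above. The function name part_one and the UCS_BASED set are my own
assumptions, not anything defined in the draft; the sketch only covers
the two steps cited and the "skip if already UCS-based" rule.

```python
import unicodedata

# Assumption: a small, illustrative set of UCS-based encoding names.
# The draft only gives UTF-8 and UTF-16 as examples.
UCS_BASED = {"utf-8", "utf-16", "utf-32"}


def part_one(raw: bytes, encoding: str) -> str:
    """Hypothetical helper sketching Part I of the quoted mapping."""
    # Step 1: represent the IRI as a sequence of UCS characters.
    chars = raw.decode(encoding)
    if encoding.lower() in UCS_BASED:
        # Part I is skipped: the input is assumed to be in NFC already.
        return chars
    # Step 2: normalize according to Normalization Form C.
    return unicodedata.normalize("NFC", chars)
```

Note that the sketch does not check a UCS-based input for NFC; it simply
takes the assumption in the quoted text at face value.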


>An issue with test003 which IRI raises is that IRI says (particularly (b)):
>
>[[[
>2.3 Mapping of IRIs to URIs
>
>This section defines how to map an IRI to a URI. Everything in
>this section applies also to IRI references and URI references, as
>well as components thereof (e.g. fragment identifiers).
>
>This mapping has two purposes:
>
>   a) Syntactical: Many URI schemes and components define additional
>      syntactical restrictions not captured in Section 2.2. Such
>      restrictions can be applied to IRIs by noting that IRIs are only
>      valid if they map to syntactically valid URIs. This means that
>      such syntactical restrictions do not have to be defined again
>      on the IRI level.
>
>   b) Interpretational: URIs identify resources in various ways. IRIs
>      also identify resources. The resource that an IRI identifies is
>      the same as the one identified by the URI obtained after
>      converting the IRI according to the procedure defined here.
>      This means that there is no need to define the association
>      between identifier and resource again on the IRI level.
>]]]
>
>This seems to suggest that we should do the mapping before the model theory;
>which is in tension with the usual refusal to normalize URIs for scheme
>case, hostname case, port number, missing default path, or anything else,
>except as part of actually executing the protocol.

Clearing up escape issues is one step before casing issues.
Most escape issues (for a-zA-Z0-9, everything outside US-ASCII,
plus a few specials) are completely independent of the scheme;
they apply to all URIs. Case and the other issues you list are
very much scheme-dependent. This is a big difference.
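
To make the distinction concrete, here is a minimal Python sketch of
scheme-independent escape handling. The name clear_escapes is
hypothetical, and the sketch covers only two of the cases mentioned
(escaped a-zA-Z0-9 and characters outside US-ASCII); the "few specials"
are left out. It deliberately does nothing about case.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _unescape_alnum(match):
    ch = chr(int(match.group(1), 16))
    # Only escaped a-zA-Z0-9 are turned back into literal characters;
    # every other escape is left exactly as written.
    return ch if ch.isascii() and ch.isalnum() else match.group(0)


def clear_escapes(ref: str) -> str:
    """Hypothetical helper: scheme-independent escape cleanup only."""
    ref = _ESCAPE.sub(_unescape_alnum, ref)
    # Characters outside US-ASCII become %XX escapes of their UTF-8 bytes.
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in ref
    )


# The host keeps its case: that is a scheme-dependent question.
print(clear_escapes("http://Example.org/%41bc/r\u00e9sum\u00e9"))
```

Nothing in this function needs to know whether the scheme is http,
mailto, or anything else, which is the point.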


>It is potentially self-inconsistent with the phrase:
>
>[[[
>However, this mapping SHOULD only be applied when necessary, as late
>as possible.
>]]]

No, it is not. For RDF, it would just mean that when you compare,
you may want to apply it, but you wouldn't convert and stay there;
you would keep the original.
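
As a sketch of what I mean, in Python: the mapping is applied only
inside the comparison, and the stored identifiers are never rewritten.
The names to_uri and same_identifier are hypothetical, and to_uri is a
simplification that only escapes non-ASCII characters as UTF-8, assuming
the input is already in NFC.

```python
def to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (assumes the input is already NFC)."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


def same_identifier(a: str, b: str) -> bool:
    # Map as late as possible: only for the comparison itself.
    return to_uri(a) == to_uri(b)


stored = "http://example.org/r\u00e9sum\u00e9"       # kept as written
incoming = "http://example.org/r%C3%A9sum%C3%A9"

print(same_identifier(stored, incoming))
print(stored)  # still the original IRI, not its mapped form
```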

Regards,   Martin.

Received on Monday, 15 April 2002 21:29:37 UTC