
Re: charmod-uri

From: Martin Duerst <duerst@w3.org>
Date: Tue, 16 Apr 2002 10:29:10 +0900
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-i18n-ig@w3.org>, <w3c-rdfcore-wg@w3.org>

At 18:13 02/04/11 +0200, Jeremy Carroll wrote:

>I just checked up on what the current draft of charmod and IRI actually say
>about normalization. It is still IMO a bit confused, because normalization
>of IRIs is not mentioned explicitly in charmod,

Hello Jeremy,

Have you looked at the latest charmod draft?
I'm not sure normalization of IRIs has to be mentioned in
the character model; it would only complicate the specs.

>although the algorithm given
>in the IRI spec conforms with the early uniform normalization model of

Of course it does :-). But maybe it should say so.
(well, there are a few details we have to look at again)

>[[[ from section 4.3
>[S] Specifications of text-based formats and protocols MUST, as part of
>their syntax definition, require that the text be in normalized form.
>[[[ Section 8
>[S] W3C specifications that define protocol or format elements (e.g. HTTP
>headers, XML attributes, etc.) which are to be interpreted as URI references
>(or specific subsets of URI references, such as absolute URI references,
>URIs, etc.) SHOULD use Internationalized Resource Identifiers (IRI) [I-D
>URI-I18N] (or an appropriate subset thereof).
>[[[ IRI Section 2.3
>Part I is skipped if the
>input is already in a UCS-based encoding (e.g. UTF-8 or UTF-16). In
>that case, it is assumed that the IRI is already in NFC.
>    Part I)
>    1) Represent the IRI characters as a sequence of characters from the
>       UCS.
>    2) Normalize the character sequence according to Normalization Form
>       C, as defined in [UNI15].  (See further discussion in Section
>       3.1.)
>Charmod would benefit if the fact that the same normalization model is used
>in the IRI spec were made more explicit.

I agree that this should be made explicit in the IRI spec. I don't
see the point of doing so in Charmod.
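
For readers who want to see Part I of the quoted IRI Section 2.3 in
concrete terms, here is a minimal Python sketch. It is not taken from
either draft. The function name part_one and the set UCS_BASED are
assumptions made for illustration, and the sketch only covers the
"skip if already UCS-based, otherwise convert and normalize to NFC"
rule quoted above.

```python
import codecs
import unicodedata

# Assumption: these codec names stand in for "UCS-based encodings".
UCS_BASED = {
    "utf-8", "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}


def part_one(data: bytes, encoding: str) -> str:
    """Hypothetical helper sketching Part I of the quoted mapping."""
    codec = codecs.lookup(encoding)      # canonical codec name, e.g. "utf-8"
    # Step 1: represent the IRI as a sequence of UCS characters.
    text = data.decode(codec.name)
    if codec.name in UCS_BASED:
        # Part I is skipped: the input is assumed to be in NFC already,
        # so it is returned exactly as received.
        return text
    # Step 2: input from a legacy encoding is normalized to NFC.
    return unicodedata.normalize("NFC", text)
```

Note that input arriving as UTF-8 is passed through untouched even if
it happens not to be in NFC; that is what "it is assumed" amounts to
in this sketch.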

>An issue with test003 which IRI raises is that IRI says (particularly (b)):
>2.3 Mapping of IRIs to URIs
>This section defines how to map an IRI to a URI. Everything in
>this section applies also to IRI references and URI references, as
>well as components thereof (e.g. fragment identifiers).
>This mapping has two purposes:
>   a) Syntactical: Many URI schemes and components define additional
>      syntactical restrictions not captured in Section 2.2. Such
>      restrictions can be applied to IRIs by noting that IRIs are only
>      valid if they map to syntactically valid URIs. This means that
>      such syntactical restrictions do not have to be defined again
>      on the IRI level.
>   b) Interpretational: URIs identify resources in various ways. IRIs
>      also identify resources. The resource that an IRI identifies is
>      the same as the one identified by the URI obtained after
>      converting the IRI according to the procedure defined here.
>      This means that there is no need to define the association
>      between identifier and resource again on the IRI level.
>This seems to suggest that we should do the mapping before the model theory;
>which is in tension with the usual refusal to normalize URIs for scheme
>case, hostname case, port number, missing default path, or anything else,
>except as part of actually executing the protocol.

Clearing up escape issues is one step before casing issues.
Most escape issues (for a-zA-Z0-9, everything outside US-ASCII,
plus a few specials) are completely independent of the scheme;
they apply to all URIs. Case and the other issues are very much
scheme-dependent. This is a big difference.
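
To illustrate the difference, here is a rough Python sketch of the
scheme-independent part only. The name clear_escapes and the constant
KEEP are made up for this example, the handling of "a few specials" is
left out, and non-ASCII characters in host names are not treated
specially. Scheme case, host case, and default ports are deliberately
left alone, since those depend on the scheme.

```python
import re
import string
from urllib.parse import quote

ALNUM = string.ascii_letters + string.digits
# ASCII characters passed through as-is (assumption: reserved
# characters, "%" and "~" are never touched by this sketch).
KEEP = "!#$%&'()*+,/:;=?@[]~"
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def clear_escapes(iri: str) -> str:
    """Hypothetical helper: resolve scheme-independent escape issues."""

    def unescape_alnum(match):
        ch = chr(int(match.group(1), 16))
        # Only escapes of a-zA-Z0-9 are undone; others stay escaped.
        return ch if ch in ALNUM else match.group(0)

    text = _ESCAPE.sub(unescape_alnum, iri)
    # Characters outside US-ASCII become %XX escapes of their UTF-8 bytes.
    return quote(text, safe=KEEP)


print(clear_escapes("http://example.org/%41bc/r\u00e9sum\u00e9"))
```

Nothing in the function looks at the scheme, which is the point: the
same code can be applied to any URI or IRI.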

>It is potentially self-inconsistent with the phrase:
>However, this mapping SHOULD only be applied when necessary, as late
>as possible.

No, it is not. For RDF, it would just mean that when you compare,
you may want to apply it, but you wouldn't convert and stay there;
you would keep the original.
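
As a sketch of what "apply it when you compare, but keep the original"
could look like in code (the names map_to_uri, iris_match and
NodeTable are hypothetical, and map_to_uri is a much simplified
stand-in for the real mapping that only escapes non-ASCII characters):

```python
from urllib.parse import quote

# ASCII characters left untouched by the simplified mapping (assumption).
_SAFE = "!#$%&'()*+,/:;=?@[]~"


def map_to_uri(iri: str) -> str:
    """Simplified stand-in for the IRI-to-URI mapping."""
    return quote(iri, safe=_SAFE)


def iris_match(a: str, b: str) -> bool:
    # The mapping is applied only here, at comparison time.
    return map_to_uri(a) == map_to_uri(b)


class NodeTable:
    """Looks identifiers up by mapped form but stores the original."""

    def __init__(self):
        self._by_uri = {}

    def add(self, iri: str) -> str:
        # Returns the first-seen original spelling for this resource.
        return self._by_uri.setdefault(map_to_uri(iri), iri)

    def labels(self):
        return list(self._by_uri.values())
```

Two spellings that map to the same URI end up as one entry, and the
entry still carries the original IRI rather than its escaped form.
Identifiers that differ only in case remain distinct here.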

Regards,   Martin.
Received on Monday, 15 April 2002 21:29:37 UTC
