- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 11 Apr 2002 18:13:28 +0200
- To: <w3c-i18n-ig@w3.org>, <w3c-rdfcore-wg@w3.org>
I just checked up on what the current drafts of charmod and IRI actually say about normalization.

It is still IMO a bit confused, because normalization of IRIs is not mentioned explicitly in charmod, although the algorithm given in the IRI spec conforms with the early uniform normalization model of charmod.

Charmod:

[[[
from section 4.3
http://www.w3.org/TR/charmod/#sec-NormalizationApplication

[S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, require that the text be in normalized form.
]]]

[[[
Section 8
http://www.w3.org/TR/charmod/#sec-URIs

[S] W3C specifications that define protocol or format elements (e.g. HTTP headers, XML attributes, etc.) which are to be interpreted as URI references (or specific subsets of URI references, such as absolute URI references, URIs, etc.) SHOULD use Internationalized Resource Identifiers (IRI) [I-D URI-I18N] (or an appropriate subset thereof).
]]]

[[[
IRI Section 2.3
http://www.w3.org/International/2001/draft-masinter-url-i18n-08.txt

Part I is skipped if the input is already in an UCS-based encoding (e.g. UTF-8 or UTF-16). In that case, it is assumed that the IRI is already in NFC.

Part I)
1) Represent the IRI characters as a sequence of characters from the UCS.
2) Normalize the character sequence according to Normalization Form C, as defined in [UNI15]. (See further discussion in Section 3.1.)
]]]

Charmod would benefit from stating more explicitly that the IRI spec uses the same normalization model.

An issue that IRI raises for test003 is the following text (particularly (b)):

[[[
2.3 Mapping of IRIs to URIs

This section defines how to map an IRI to a URI. Everything in this section applies also to IRI references and URI references, as well as components thereoff (e.g. fragment identifiers).

This mapping has two purposes:

a) Syntactical: Many URI schemes and components define additional syntactical restrictions not captured in Section 2.2. Such restrictions can be applied to IRIs by noting that IRIs are only valid if they map to syntactically valid URIs. This means that such syntactical restrictions do not have to be defined again on the IRI level.

b) Interpretational: URIs identify resources in various ways. IRIs also indentify resources. The resource that an IRI identifies is the same as the one identified by the URI obtained after converting the IRI according to the procedure defined here. This means that there is no need to define the association between identifier and resource again on the IRI level.
]]]

This seems to suggest that we should do the mapping before the model theory, which is in tension with the usual refusal to normalize URIs for scheme case, hostname case, port number, missing default path, or anything else, except as part of actually executing the protocol.

It is also potentially inconsistent with this phrase from the IRI spec itself:

[[[
However, this mapping SHOULD only be applied when necessary, as late as possible.
]]]

Jeremy
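
For illustration only, the mapping discussed above can be sketched in Python. This is a rough sketch under stated assumptions, not the draft's normative procedure: the function name `iri_to_uri` is hypothetical; the percent-encoding step (UTF-8 bytes written as %HH) is not quoted in this message and is assumed here; and NFC is applied unconditionally, whereas the quoted Part I is skipped for input that is already UCS-based. Hostname handling is ignored entirely.

```python
import unicodedata


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI string to a URI string.

    Assumptions (not taken from the quoted text):
    - NFC is applied even though Python strings are already UCS-based.
    - Every non-ASCII character is escaped as the %HH form of its UTF-8 bytes.
    - ASCII characters, including scheme and host, are passed through untouched.
    """
    # Normalize to Normalization Form C (the step quoted as Part I, 2).
    nfc = unicodedata.normalize("NFC", iri)

    out = []
    for ch in nfc:
        if ord(ch) < 0x80:
            # ASCII is left as-is: no case folding, no default-port removal.
            out.append(ch)
        else:
            # Assumed escaping step: percent-encode each UTF-8 byte.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    precomposed = "http://example.org/caf\u00e9"   # e-acute as one code point
    decomposed = "http://example.org/cafe\u0301"   # e + combining acute accent
    print(iri_to_uri(precomposed))
    print(iri_to_uri(decomposed))
    print(iri_to_uri("HTTP://Example.ORG:80"))
```

Under these assumptions, the precomposed and decomposed spellings map to the same URI, while differences in scheme case, hostname case, and port are left alone. That asymmetry is the tension noted above: applying the mapping early identifies two distinct character strings, even though URIs are otherwise compared without normalization.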
Received on Thursday, 11 April 2002 12:06:37 UTC