- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 16 Apr 2002 10:29:10 +0900
- To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-i18n-ig@w3.org>, <w3c-rdfcore-wg@w3.org>
At 18:13 02/04/11 +0200, Jeremy Carroll wrote:
>I just checked up on what the current draft of charmod and IRI actually say
>about normalization. It is still IMO a bit confused, because normalization
>of IRIs is not mentioned explicitly in charmod,

Hello Jeremy,

Have you looked at the latest charmod draft?
(http://www.w3.org/International/Group/charmod-edit/#sec-URIs)
I'm not sure normalization of IRIs has to be mentioned in
the character model; it would only complicate the specs.

>although the algorithm given
>in the IRI spec conforms with the early uniform normalization model of
>charmod.

Of course it does :-). But maybe it should say so.
(well, there are a few details we have to look at again)

>Charmod:
>
>[[[ from section 4.3
>http://www.w3.org/TR/charmod/#sec-NormalizationApplication
>
>[S] Specifications of text-based formats and protocols MUST, as part of
>their syntax definition, require that the text be in normalized form.
>
>]]]
>
>[[[ Section 8
>http://www.w3.org/TR/charmod/#sec-URIs
>
>[S] W3C specifications that define protocol or format elements (e.g. HTTP
>headers, XML attributes, etc.) which are to be interpreted as URI references
>(or specific subsets of URI references, such as absolute URI references,
>URIs, etc.) SHOULD use Internationalized Resource Identifiers (IRI) [I-D
>URI-I18N] (or an appropriate subset thereof).
>
>]]]
>
>[[[ IRI Section 2.3
>http://www.w3.org/International/2001/draft-masinter-url-i18n-08.txt
>
>[[[
>Part I is skipped if the
>input is already in an UCS-based encoding (e.g. UTF-8 or UTF-16). In
>that case, it is assumed that the IRI is already in NFC.
>
>   Part I)
>
>   1) Represent the IRI characters as a sequence of characters from the
>      UCS.
>
>   2) Normalize the character sequence according to Normalization Form
>      C, as defined in [UNI15]. (See further discussion in Section
>      3.1.)
>]]]
>
>Charmod, would benefit if the fact that the same normalization model is used
>in the IRI spec was more explicit.

I agree to make this explicit in the IRI spec.
I don't see the point for Charmod.

>An issue with test003 which IRI raises is that IRI says (particularly (b)):
>
>[[[
>2.3 Mapping of IRIs to URIs
>
>This section defines how to map an IRI to a URI. Everything in
>this section applies also to IRI references and URI references, as
>well as components thereof (e.g. fragment identifiers).
>
>This mapping has two purposes:
>
>   a) Syntactical: Many URI schemes and components define additional
>      syntactical restrictions not captured in Section 2.2. Such
>      restrictions can be applied to IRIs by noting that IRIs are only
>      valid if they map to syntactically valid URIs. This means that
>      such syntactical restrictions do not have to be defined again
>      on the IRI level.
>
>   b) Interpretational: URIs identify resources in various ways. IRIs
>      also identify resources. The resource that an IRI identifies is
>      the same as the one identified by the URI obtained after
>      converting the IRI according to the procedure defined here.
>      This means that there is no need to define the association
>      between identifier and resource again on the IRI level.
>]]]
>
>This seems to suggest that we should do the mapping before the model theory;
>which is in tension with the usual refusal to normalize URIs for scheme
>case, hostname case, port number, missing default path, or anything else,
>except as part of actually executing the protocol.

Clearing up escape issues is one step before casing issues.
Most escape issues (for a-zA-Z0-9, everything outside US-ASCII,
plus a few specials) are completely independent of the scheme;
they apply to all URIs. Case and the other stuff is very much
scheme-dependent. This is a big difference.

>It is potentially self-inconsistent with the phrase:
>
>[[[
>However, this mapping SHOULD only be applied when necessary, as late
>as possible.
>]]]

No, it is not.
For RDF, it would just mean that when you compare,
you may want to apply it, but you wouldn't convert and stay
there; you would keep the original.

Regards,    Martin.
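
The "apply it when you compare, but keep the original" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the IRI draft or from any RDF tool: the function names `iri_to_uri` and `same_resource` and the example.org IRIs are hypothetical. The escaping step (UTF-8, then %-escaping of non-ASCII octets) is not quoted in this message and reflects my reading of the draft. The `normalize` flag is also an assumption: the draft skips Part I for input already in a UCS-based encoding, which `normalize=False` mimics.

```python
import unicodedata


def iri_to_uri(iri: str, normalize: bool = True) -> str:
    """Hypothetical sketch of the IRI-to-URI mapping discussed above."""
    # Part I, step 2: normalize to NFC. The draft skips this when the
    # input is already in a UCS-based encoding; normalize=False mimics that.
    if normalize:
        iri = unicodedata.normalize("NFC", iri)
    # Assumed Part II: leave ASCII untouched (including existing %-escapes),
    # encode each non-ASCII character as UTF-8 and %-escape every octet.
    return "".join(
        ch if ord(ch) < 128
        else "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        for ch in iri
    )


def same_resource(iri_a: str, iri_b: str) -> bool:
    """Compare two IRIs by mapping both, as late as possible.

    The mapped forms are discarded; callers keep their original strings.
    """
    return iri_to_uri(iri_a) == iri_to_uri(iri_b)


# The stored identifiers stay exactly as they were written.
composed = "http://example.org/caf\u00e9"      # precomposed e-acute
decomposed = "http://example.org/cafe\u0301"   # e + combining acute accent
print(same_resource(composed, decomposed))
print(composed == decomposed)
```

Note that nothing here touches scheme case, hostname case, or default ports; those are the scheme-dependent issues, and this sketch deliberately leaves them alone.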
Received on Monday, 15 April 2002 21:29:37 UTC