- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 16 Apr 2002 15:33:53 +0100
- To: "Martin Duerst" <duerst@w3.org>, <w3c-rdfcore-wg@w3.org>
- Cc: <w3c-i18n-ig@w3.org>
For RDF Core, a significant part of Martin's comments is the last paragraph below. In RDF terms I read it as advocating that the labels of the nodes in the RDF graph are US-ASCII URIs, not IRIs (although implementations should maintain the original character sequence).

So far I've heard:
- Jeremy, maybe Dan, maybe Larry, in favour of using "IRIs" (at least original character sequences) as the labels on the RDF graph
- Martin, and to some extent Aaron, in favour of using US-ASCII URIs as the labels on the RDF graph.

This is characterised by test 003 in
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/att-0116/01-charmod-uri.htm

Denying test003 is agreeing that labels are US-ASCII URIs. Equivalently, it is agreeing that
http://example.org/#Andr%C3%A9
and
http://example.org/#André
are completely equivalent and interchangeable. The only observable difference in behaviour that may be expected is the exact form used by a viewer or output device.

I am happy to go either way, although I find it difficult to see how to state the normal form constraint that I think is important.

Consider the US-ASCII URI
http://example.org/#Andre%CC%81
where %CC%81 is the UTF-8 encoding of character #x301, the combining acute accent. This is, like any US-ASCII URI, in Normal Form C, despite the UTF-8 original character sequence not being in NFC.

I don't think I can support or propose that we prohibit those US-ASCII URIs that, when viewed as a UTF-8 encoded original character sequence, correspond to original character sequences that are not in NFC. I think a "NOTE: " somewhere about this would be close to incomprehensible.

Going down this path, in my view, limits us to saying non-normative things about RDF platforms not using original character sequences (OCSs) which are not in NFC, but using the corresponding US-ASCII URI instead.

I could write a non-normative appendix to our syntax spec that said all this. We could modify some of the fraud examples to show legal test cases that used the US-ASCII form of the problematic IRIs.
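
The Normal Form C observation above can be checked with a minimal Python sketch. It assumes Python's standard unicodedata and urllib.parse modules; the helper names is_nfc and iri_to_uri are hypothetical and only illustrate the escaping step (UTF-8 bytes of each non-ASCII character written as %HH). They are not a complete implementation of the IRI draft.

```python
import unicodedata
from urllib.parse import unquote


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", text) == text


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: %HH-escape the UTF-8 bytes of each non-ASCII character."""
    return "".join(
        ch if ord(ch) < 128 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


uri = "http://example.org/#Andre%CC%81"

# Viewed as a plain character string, the URI is pure ASCII and so trivially NFC.
print(is_nfc(uri))

# Decoding %CC%81 as UTF-8 yields 'e' followed by U+0301, which is not NFC.
ocs = unquote(uri)
print(is_nfc(ocs))

# Normalizing the decoded sequence and escaping again gives a different URI.
print(iri_to_uri(unicodedata.normalize("NFC", ocs)))
```

Under these assumptions the first check is true, the second is false, and the last line prints the %C3%A9 form, so a rule phrased purely in terms of the US-ASCII URI cannot see the normalization problem.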
Jeremy

Jeremy:
> >[[[
> >2.3 Mapping of IRIs to URIs
> >
> >This section defines how to map an IRI to a URI. Everything in
> >this section applies also to IRI references and URI references, as
> >well as components thereof (e.g. fragment identifiers).
> >
> >This mapping has two purposes:
> >
> >   a) Syntactical: Many URI schemes and components define additional
> >      syntactical restrictions not captured in Section 2.2. Such
> >      restrictions can be applied to IRIs by noting that IRIs are only
> >      valid if they map to syntactically valid URIs. This means that
> >      such syntactical restrictions do not have to be defined again
> >      on the IRI level.
> >
> >   b) Interpretational: URIs identify resources in various ways. IRIs
> >      also identify resources. The resource that an IRI identifies is
> >      the same as the one identified by the URI obtained after
> >      converting the IRI according to the procedure defined here.
> >      This means that there is no need to define the association
> >      between identifier and resource again on the IRI level.
> >]]]
> >
> >This seems to suggest that we should do the mapping before the model theory;
> >which is in tension with the usual refusal to normalize URIs for scheme
> >case, hostname case, port number, missing default path, or anything else,
> >except as part of actually executing the protocol.

Martin:
> Clearing up escape issues is one step before casing issues.
> Most escape issues (for a-zA-Z0-9, everything outside US-ASCII,
> plus a few specials) are completely independent of the scheme,
> they apply to all URIs. Case and the other stuff is very much
> scheme-dependent. This is a big difference.

Jeremy:
> >It is potentially self-inconsistent with the phrase:
> >
> >[[[
> >However, this mapping SHOULD only be applied when necessary, as late
> >as possible.
> >]]]

Martin:
> No, it is not. For RDF, it would just mean that when you compare,
> you may want to apply it, but you wouldn't convert and stay there;
> you would keep the original.
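
Martin's reading (apply the mapping only when comparing, and keep the original) might look roughly like the following Python sketch. The names to_uri and same_label are hypothetical, and the mapping shown only escapes non-ASCII characters as UTF-8 %HH; it does not address hex-digit case, scheme-specific rules, or any of the other normalizations mentioned in the thread.

```python
def to_uri(label: str) -> str:
    """Hypothetical mapping: %HH-escape the UTF-8 bytes of each non-ASCII character."""
    return "".join(
        ch if ord(ch) < 128 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in label
    )


def same_label(a: str, b: str) -> bool:
    """Compare two node labels on their mapped form, leaving the originals untouched."""
    return to_uri(a) == to_uri(b)


iri = "http://example.org/#Andr\u00e9"   # original character sequence with U+00E9
uri = "http://example.org/#Andr%C3%A9"   # its US-ASCII escaped form

print(same_label(iri, uri))  # compared late, via the mapping
print(iri == uri)            # the stored originals still differ
```

Here the graph would keep whichever form it was given for display, and only the comparison step sees the escaped form.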
Received on Tuesday, 16 April 2002 10:35:12 UTC