RE: charmod-uri from Jeremy Carroll on 2002-04-16 (w3c-rdfcore-wg@w3.org from April 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 16 Apr 2002 15:33:53 +0100
To: "Martin Duerst" <duerst@w3.org>, <w3c-rdfcore-wg@w3.org>
Cc: <w3c-i18n-ig@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDIEKLCDAA.jjc@hplb.hpl.hp.com>
For RDF Core, a significant part of Martin's comments is the last paragraph
below.
In RDF terms I read it as advocating that the labels of the nodes in the RDF
graph are US-ASCII URIs not IRIs (although implementations should maintain
the original character sequence).

So far I've heard:
- Jeremy, maybe Dan, maybe Larry, in favour of using "IRIs" (at least
original character sequences) as the labels on the RDF graph
- Martin, and somehow Aaron, in favour of using US-ASCII URIs as the labels on
the RDF graph.

This is characterised by test 003 in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/att-0116/01-charmod-uri.htm

Denying test003 is agreeing that labels are US ASCII URIs.

Equivalently, that is to say that

http://example.org/#Andr%C3%A9

and

http://example.org/#André

are completely equivalent and interchangeable. The only observable
difference in behaviour that may be expected is the exact form used by a
viewer or output device.
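
As a minimal sketch of the escaping behind that equivalence (Python is used
purely for illustration; the helper name iri_to_uri is hypothetical and not
taken from the charmod draft), each non-ASCII character is replaced by the
%HH escapes of its UTF-8 bytes, and anything already in US-ASCII is left
alone:

```python
def iri_to_uri(iri: str) -> str:
    """Escape every non-ASCII character as the %HH form of its UTF-8 bytes.

    Hypothetical helper for illustration only. ASCII characters, including
    any existing %HH escapes, pass through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


original = "http://example.org/#Andr\u00e9"  # ends in e-acute, U+00E9
escaped = iri_to_uri(original)
print(escaped)  # http://example.org/#Andr%C3%A9
```

Under the "US-ASCII labels" reading, original and escaped would name the same
node; only the displayed form differs.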

I am happy to go either way, although I find it difficult to see how to
state the normal form constraint that I think is important.

Consider the US ASCII URI

http://example.org/#Andre%CC%81

where %CC%81 is the UTF-8 encoding of character #x301 the combining acute
accent.

This is, like any US ASCII URI, in Normal Form C, despite the UTF-8 original
character sequence not being in NFC. I don't think I can support or propose
that we prohibit those US ASCII URIs that, when viewed as a UTF-8 encoded
original character sequence, correspond to original character sequences that
are not in NFC. I think a "NOTE: " somewhere about this would be close to
incomprehensible. Going down this path, in my view, limits us to saying
non-normative things about RDF platforms not using OCS's which are not NFC,
but using the corresponding US ASCII URI instead. I could write a
non-normative appendix to our syntax spec that said all this. We could
modify some of the fraud examples to show legal test cases that used the
US-ASCII form of the problematic IRIs.
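
A rough illustration of the normal form point, assuming Python's standard
unicodedata and urllib.parse modules (the helper is_nfc is a name made up
for this sketch): the escaped URI is trivially in NFC because it is pure
ASCII, while the character sequence it unescapes to is not.

```python
import unicodedata
from urllib.parse import unquote


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


uri = "http://example.org/#Andre%CC%81"
ocs = unquote(uri)  # decodes %CC%81 as UTF-8, giving 'e' followed by U+0301

print(is_nfc(uri))  # True: the ASCII-only string is already in NFC
print(is_nfc(ocs))  # False: 'e' + combining acute composes to U+00E9
print(unicodedata.normalize("NFC", ocs) == "http://example.org/#Andr\u00e9")  # True
```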

Jeremy





Jeremy:
> >[[[
> >2.3 Mapping of IRIs to URIs
> >
> >This section defines how to map an IRI to a URI. Everything in
> >this section applies also to IRI references and URI references, as
> >well as components thereof (e.g. fragment identifiers).
> >
> >This mapping has two purposes:
> >
> >   a) Syntactical: Many URI schemes and components define additional
> >      syntactical restrictions not captured in Section 2.2. Such
> >      restrictions can be applied to IRIs by noting that IRIs are only
> >      valid if they map to syntactically valid URIs. This means that
> >      such syntactical restrictions do not have to be defined again
> >      on the IRI level.
> >
> >   b) Interpretational: URIs identify resources in various ways. IRIs
> >      also identify resources. The resource that an IRI identifies is
> >      the same as the one identified by the URI obtained after
> >      converting the IRI according to the procedure defined here.
> >      This means that there is no need to define the association
> >      between identifier and resource again on the IRI level.
> >]]]
> >
> >This seems to suggest that we should do the mapping before the
> >model theory;
> >which is in tension with the usual refusal to normalize URIs for scheme
> >case, hostname case, port number, missing default path, or anything else,
> >except as part of actually executing the protocol.

Martin:
> Clearing up escape issues is one step before casing issues.
> Most escape issues (for a-zA-Z0-9, everything outside US-ASCII,
> plus a few specials) are completely independent of the scheme,
> they apply to all URIs. Case and the other stuff is very much
> scheme-dependent. This is a big difference.
>

Jeremy:
> >It is potentially self-inconsistent with the phrase:
> >
> >[[[
> >However, this mapping SHOULD only be applied when necessary, as late
> >as possible.
> >]]]

Martin:
> No, it is not. For RDF, it would just mean that when you compare,
> you may want to apply it, but you wouldn't convert and stay there;
> you would keep the original.
>
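
One way to picture Martin's "compare with the mapping, but keep the
original" is the sketch below. It is only an illustration under stated
assumptions: the class Label and the helper iri_to_uri are hypothetical
names, and the escaping shown covers non-ASCII characters only.

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: escape non-ASCII characters as UTF-8 %HH."""
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


class Label:
    """Node label that stores the original character sequence.

    The mapping to a US-ASCII URI is applied only when comparing or
    hashing; the stored value is never converted.
    """

    def __init__(self, original: str):
        self.original = original

    def __eq__(self, other):
        if not isinstance(other, Label):
            return NotImplemented
        return iri_to_uri(self.original) == iri_to_uri(other.original)

    def __hash__(self):
        return hash(iri_to_uri(self.original))


a = Label("http://example.org/#Andr\u00e9")
b = Label("http://example.org/#Andr%C3%A9")
print(a == b)                    # True: equal after mapping
print(a.original == b.original)  # False: originals are kept as written
```

If the labels were instead the original character sequences themselves,
the comparison would be on original directly and a and b would differ.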
Received on Tuesday, 16 April 2002 10:35:12 UTC