- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Wed, 01 Apr 2009 10:52:15 +0200
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- CC: Lisa Dusseault <lisa.dusseault@gmail.com>, public-iri@w3.org, Mark Baker <mark@coactus.com>, apps-discuss@ietf.org
Martin J. Dürst wrote:
> I agree with Julian that discussion about updates of RFC 3987 (IRI spec,
> new version currently at draft-duerst-iri-bis-05) should go to
> public-iri@w3.org, whereas bug reports and discussions about RFC 3986
> (URI spec) should go to uri@w3.org.
>
> But Lisa, you are the AD who eventually has to handle anything that
> comes out of these drafts, so I'm glad to follow your directions.
>
> Julian, could you be more specific (pointer is sufficient) about what
> you mean below with "potential issue (normalization)"?
>
> Regards, Martin.
It's the one raised by Anne vK: Section 3.1 requires different IRI->URI
behavior depending on the encoding of the source document:

    Step 1.  Generate a UCS character sequence from the original IRI
             format.  This step has the following three variants,
             depending on the form of the input:

             a. If the IRI is written on paper, read aloud, or otherwise
                represented as a sequence of characters independent of
                any character encoding, represent the IRI as a sequence
                of characters from the UCS normalized according to
                Normalization Form C (NFC, [UTR15]).

             b. If the IRI is in some digital representation (e.g., an
                octet stream) in some known non-Unicode character
                encoding, convert the IRI to a sequence of characters
                from the UCS normalized according to NFC.

             c. If the IRI is in a Unicode-based character encoding (for
                example, UTF-8 or UTF-16), do not normalize (see section
                5.3.2.2 for details).  Apply step 2 directly to the
                encoded Unicode character sequence.

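
As a rough illustration of the three variants quoted above, here is a
minimal Python sketch. The function name step1_ucs_sequence, the set of
codec names treated as "Unicode-based", and the choice to model variant (a)
as a Python str with no encoding attached are all assumptions made for this
example; none of them is defined by RFC 3987.

```python
import codecs
import unicodedata
from typing import Optional, Union

# Assumption: codec names (as canonicalized by Python) that count as
# "Unicode-based" for variant (c). The spec only names UTF-8 and UTF-16
# as examples.
_UNICODE_CODECS = {
    "utf-8", "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}


def step1_ucs_sequence(iri: Union[str, bytes],
                       source_encoding: Optional[str] = None) -> str:
    """Hypothetical helper sketching Step 1 of the IRI-to-URI mapping."""
    if isinstance(iri, str):
        # Variant (a): characters independent of any encoding -> apply NFC.
        return unicodedata.normalize("NFC", iri)

    if source_encoding is None:
        # The sketch cannot choose between (b) and (c) without knowing
        # the encoding of the source.
        raise ValueError("source encoding of the IRI octets is unknown")

    codec_name = codecs.lookup(source_encoding).name
    text = iri.decode(codec_name)

    if codec_name in _UNICODE_CODECS:
        # Variant (c): Unicode-based encoding -> do not normalize.
        return text

    # Variant (b): known non-Unicode encoding -> convert, then apply NFC.
    return unicodedata.normalize("NFC", text)
```

Under these assumptions, an "e" followed by U+0301 that arrives as UTF-8
octets is passed through unchanged, while the same two characters arriving
as an encoding-independent string come out as the single precomposed
U+00E9, so step 2 would produce different URIs for what a reader sees as
the same IRI. The ValueError branch marks the case where the source
encoding is simply not available to the code doing the conversion.
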
It would be helpful to understand where exactly this requirement comes
from, and whether we have evidence it's being implemented (or even
implementable; the source document encoding may not be known at the
point where the IRI->URI conversion occurs).
BR, Julian
Received on Wednesday, 1 April 2009 08:53:11 UTC