- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 17 Sep 2009 13:39:02 +0200
- To: Ian Hickson <ian@hixie.ch>
- CC: URI <uri@w3.org>, hybi@ietf.org, uri-review@ietf.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Ian Hickson wrote:
> ...
>> Then the encoding considerations should be something like:
>>
>> Because many characters are not permitted with this syntax, the
>> "hier-part" and "query" elements may contain characters from the
>> Unicode Character Set [UCS] as suggested by URI [RFC3986] using the
>> reg-name and percent-encoding translations of IRI to URI
>> mapping [RFC3987]. Translation is performed by first encoding those
>> Unicode characters as octets to the UTF-8 character
>> encoding [RFC3629]. Replace the reg-name part of the hier-part by
>> the part converted using the ToASCII operation specified in section
>> 4.1 of [RFC3490] on each dot-separated label, and by using U+002E
>> (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules
>> set to TRUE, and with the flag AllowUnassigned set to TRUE. Then
>> only those octets that do not correspond to characters in the
>> unreserved set should be percent-encoded.
>>
>> By using UTF-8 encoding, there are no known compatibility issues
>> with mapping Internationalized Resource Identifiers to websocket
>> URIs according to [RFC3987].
>
> I've used the above as a guide for what to put in the spec. I didn't use
> it literally because it seemed to misuse RFC2119 terminology, and it
> wasn't clear to me where the descriptive ended and the normative started.
> I hope the text now in the spec makes sense. Let me know if it needs more
> work.
> ...

It now says:

> Encoding considerations.
> Characters in the host component that are excluded by the syntax
> defined above must be converted from Unicode to ASCII by applying
> the IDNA ToASCII algorithm to the Unicode host name, with both the
> AllowUnassigned and UseSTD3ASCIIRules flags set, and using the
> result of this algorithm as the host in the URI.
>
> Characters in other components that are excluded by the syntax
> defined above must be converted from Unicode to ASCII by first
> encoding the characters as UTF-8 and then replacing the
> corresponding bytes using their percent-encoded form as defined in
> the URI and IRI specification. [RFC3986] [RFC3987]

I think that's good, except that the mention of IRI in the last sentence
seems to be superfluous. RFC3986 already defines everything that is needed
here.

Or is there something specific from the IRI spec you think is relevant?
(In which case it should state that more clearly.)

BR, Julian
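
For illustration, the two conversion steps in the quoted spec text can be sketched in Python as follows. This is only a rough approximation under stated assumptions: the function names are hypothetical, the standard library's "idna" codec implements the RFC 3490 ToASCII operation but does not expose the AllowUnassigned and UseSTD3ASCIIRules flags as parameters, and the set of characters left unescaped is my reading of the RFC 3986 path and query syntax, not something the spec text states. The input is assumed not to be percent-encoded already.

```python
from urllib.parse import quote

# Assumption: characters RFC 3986 allows literally in a path or query
# (sub-delims, ":", "@", "/", "?"). quote() already leaves the unreserved
# set (letters, digits, "-", ".", "_", "~") untouched.
ALLOWED_LITERALS = "/?:@!$&'()*+,;="


def host_to_ascii(host: str) -> str:
    """Apply IDNA ToASCII to each dot-separated label of the host.

    Python's built-in "idna" codec follows RFC 3490; its handling of the
    AllowUnassigned and UseSTD3ASCIIRules flags is fixed by the codec and
    may differ from what the spec text asks for.
    """
    return host.encode("idna").decode("ascii")


def percent_encode_component(text: str) -> str:
    """Encode excluded characters as UTF-8, then percent-encode the bytes."""
    return quote(text, safe=ALLOWED_LITERALS, encoding="utf-8")


def to_websocket_uri(host: str, path_and_query: str) -> str:
    """Hypothetical helper combining both steps into a ws: URI."""
    return "ws://" + host_to_ascii(host) + percent_encode_component(path_and_query)


if __name__ == "__main__":
    print(to_websocket_uri("bücher.example", "/chat/café?q=ü"))
```

Only RFC 3986 percent-encoding is used for the non-host components here, which is consistent with the point above that the IRI reference may not be needed for that step.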
Received on Thursday, 17 September 2009 11:40:05 UTC