- From: Joseph A Holsten <joseph@josephholsten.com>
- Date: Fri, 4 Sep 2009 02:14:48 -0500
- To: Ian Hickson <ian@hixie.ch>
- Cc: URI <uri@w3.org>, hybi@ietf.org, uri-review@ietf.org
On Sep 4, 2009, at 12:33 AM, Ian Hickson wrote:

> On Fri, 14 Aug 2009, Julian Reschke wrote:
>>
>> [...] it now says:
>>
>>> URI scheme syntax.
>>>    In ABNF terms using the terminals from the IRI specifications:
>>>    [RFC5238] [RFC3987]
>>>
>>>      "ws" ":" ihier-part [ "?" iquery ]
>>
>> That is even worse than before, because it now uses productions
>> from the IRI spec defining *URI* syntax.
>
> ws: and wss: URLs are i18n-aware; why would we want to limit them to
> ASCII?

URIs are not i18n-aware; you're thinking of IRIs. But since there is a
standard mapping from IRIs to URIs, it's pretty clear what you actually
want.

The *URI* syntax should be:

    "ws" ":" hier-part [ "?" query ]

Then the encoding considerations should be something like:

    Because many characters are not permitted with this syntax, the
    "hier-part" and "query" elements may contain characters from the
    Unicode Character Set [UCS] as suggested by URI [RFC3986] using
    the reg-name and percent-encoding translations of IRI to URI
    mapping [RFC3987].

    Translation is performed by first encoding those Unicode
    characters as octets in the UTF-8 character encoding [RFC3629].
    Replace the reg-name part of the hier-part by the part converted
    using the ToASCII operation specified in section 4.1 of [RFC3490]
    on each dot-separated label, and by using U+002E (FULL STOP) as a
    label separator, with the flag UseSTD3ASCIIRules set to TRUE, and
    with the flag AllowUnassigned set to TRUE. Then only those octets
    that do not correspond to characters in the unreserved set should
    be percent-encoded.

    By using UTF-8 encoding, there are no known compatibility issues
    with mapping Internationalized Resource Identifiers to websocket
    URIs according to [RFC3987].

>> Furthermore, it still doesn't answer what the semantics of these
>> parts are. What do "ihier-part" and "iquery" represent in a ws URI?
>
> This is defined by the RFC 3987, no? Surely we wouldn't want IRI
> components to have different meanings in different schemes?
>
>> What's the effect? How are they used?
>
> This is defined earlier in the Web Socket specification.

Section 3.1, Parsing Web Socket URLs, seems to make the semantics pretty
clear to me. How about adding "See Section 3.1" to the URI scheme
semantics portions of the IANA Considerations sections? Would that be
sufficient?

Joseph Holsten
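
For illustration only, the mapping proposed in the encoding considerations above might be sketched in Python roughly as follows. The function name iri_to_ws_uri is hypothetical, and the sketch makes several assumptions: the standard library's "idna" codec stands in for the RFC 3490 ToASCII step (it does not expose the UseSTD3ASCIIRules or AllowUnassigned flags, so neither is set here), reserved delimiters and existing percent-escapes are left untouched, and any userinfo or fragment is dropped.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_ws_uri(iri: str) -> str:
    """Map a ws:/wss: IRI to a URI (hypothetical helper, not from the spec)."""
    parts = urlsplit(iri)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("not a ws: or wss: IRI")

    # reg-name: apply ToASCII to each dot-separated label. The stdlib
    # "idna" codec is an approximation; it offers no way to set the
    # UseSTD3ASCIIRules or AllowUnassigned flags.
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"

    # Path and query: encode as UTF-8, then percent-encode the octets.
    # Assumption: reserved delimiters and "%" are kept as-is so that
    # structure and existing escapes survive.
    path = quote(parts.path, safe="/:@!$&'()*+,;=%")
    query = quote(parts.query, safe="/?:@!$&'()*+,;=%")

    # Userinfo and fragment are dropped in this sketch.
    return urlunsplit((parts.scheme, netloc, path, query, ""))


if __name__ == "__main__":
    print(iri_to_ws_uri("ws://bücher.example/ä?q=é"))
```

Under those assumptions, a host label such as "bücher" becomes its Punycode form, and non-ASCII characters in the path and query become percent-encoded UTF-8 octets.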
Received on Friday, 4 September 2009 07:15:34 UTC