
Re: [Uri-review] ws: and wss: schemes

From: Joseph A Holsten <joseph@josephholsten.com>
Date: Fri, 4 Sep 2009 02:14:48 -0500
Cc: URI <uri@w3.org>, hybi@ietf.org, uri-review@ietf.org
Message-Id: <6CD25FB7-160A-4EC3-A3CB-0581D6ED3614@josephholsten.com>
To: Ian Hickson <ian@hixie.ch>

On Sep 4, 2009, at 12:33 AM, Ian Hickson wrote:

> On Fri, 14 Aug 2009, Julian Reschke wrote:
>>
>> [...] it now says:
>>
>>>   URI scheme syntax.
>>>      In ABNF terms using the terminals from the IRI specifications:
>>>      [RFC5234] [RFC3987]
>>>
>>>           "ws" ":" ihier-part [ "?" iquery ]
>>
>> That is even worse than before, because it now uses productions from
>> the IRI spec defining *URI* syntax.
>
> ws: and wss: URLs are i18n-aware; why would we want to limit them to
> ASCII?

URIs are not i18n-aware; you're thinking of IRIs. But since there is a
standard mapping from IRIs to URIs, it's pretty clear what you actually
want. The *URI* syntax should be:

   "ws" ":" hier-part [ "?" query ]
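
For illustration only, here is a minimal Python sketch of how a ws: or
wss: URI would decompose under that syntax. It uses the
component-splitting regular expression from RFC 3986, Appendix B. The
function name `split_ws_uri` is hypothetical, and the sketch only
splits components; it does not validate every character against the
ABNF. It rejects non-ASCII input (a URI, unlike an IRI, is ASCII-only)
and rejects fragments, since the proposed syntax has none.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
# It decomposes any URI reference but does not validate characters.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_ws_uri(uri: str):
    """Split a ws:/wss: URI into (scheme, hier-part, query).

    Hypothetical helper; follows "ws" ":" hier-part [ "?" query ].
    The query is None when the "?" component is absent.
    """
    if not uri.isascii():
        raise ValueError("URIs are ASCII-only; map the IRI to a URI first")
    m = URI_SPLIT.match(uri)  # always matches; groups may be None
    scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
    if scheme is None or scheme.lower() not in ("ws", "wss"):
        raise ValueError("not a ws: or wss: URI")
    if fragment is not None:
        raise ValueError("the proposed syntax has no fragment component")
    # hier-part = "//" authority path-abempty (or a path with no authority)
    hier_part = ("//" + authority if authority is not None else "") + path
    return scheme.lower(), hier_part, query
```

For example, `split_ws_uri("ws://example.com:81/chat?room=1")` yields
the scheme "ws", the hier-part "//example.com:81/chat" and the query
"room=1".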


Then the encoding considerations should be something like:

   Because many characters are not permitted by this syntax, the
   "hier-part" and "query" elements may represent characters from the
   Unicode Character Set [UCS], as suggested by URI [RFC3986], using
   the reg-name and percent-encoding translations of the IRI-to-URI
   mapping [RFC3987]. Translation is performed by first encoding those
   Unicode characters as octets using the UTF-8 character encoding
   [RFC3629]. Next, replace the reg-name part of the hier-part with the
   result of applying the ToASCII operation specified in section 4.1
   of [RFC3490] to each dot-separated label, using U+002E (FULL STOP)
   as the label separator, with the flag UseSTD3ASCIIRules set to TRUE
   and the flag AllowUnassigned set to TRUE. Then only those octets
   that do not correspond to characters in the unreserved set should
   be percent-encoded.

   By using UTF-8 encoding, there are no known compatibility issues
   with mapping Internationalized Resource Identifiers to Web Socket
   URIs according to [RFC3987].
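
A rough Python sketch of that translation, assuming the host is a
reg-name (no IP literals, no userinfo). All function names are
hypothetical. It relies on Python's built-in "idna" codec, which
implements the RFC 3490 ToASCII operation but, as far as I can tell,
does not enforce UseSTD3ASCIIRules, so an explicit letter-digit-hyphen
check stands in for that flag; AllowUnassigned is not configurable in
that codec. Only non-ASCII characters are UTF-8 encoded and
percent-encoded, since every octet they produce is >= 0x80 and so
outside the unreserved set; ASCII characters pass through unchanged.

```python
import re
from urllib.parse import urlsplit

# STD3 LDH rule (RFC 3490, section 4.1, step 3 when UseSTD3ASCIIRules is TRUE):
# ASCII letters, digits and hyphen only, no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def host_to_ascii(host: str) -> str:
    """Apply ToASCII to each U+002E-separated label of a reg-name."""
    labels = []
    for label in host.split("."):
        # Built-in IDNA 2003 codec; raises UnicodeError (a ValueError) on failure.
        ascii_label = label.encode("idna").decode("ascii")
        if not _LDH_LABEL.match(ascii_label):
            raise ValueError(f"label {label!r} violates STD3 ASCII rules")
        labels.append(ascii_label)
    return ".".join(labels)


def percent_encode_non_ascii(text: str) -> str:
    """UTF-8 encode each non-ASCII character and percent-encode its octets."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is already permitted by the URI syntax
        else:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


def ws_iri_to_uri(iri: str) -> str:
    """Hypothetical IRI-to-URI mapping for ws:/wss: identifiers."""
    parts = urlsplit(iri)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("not a ws: or wss: IRI")
    if parts.fragment:
        raise ValueError("the proposed syntax has no fragment component")
    if "@" in parts.netloc or parts.netloc.startswith("["):
        raise ValueError("only reg-name hosts are handled in this sketch")
    host = host_to_ascii(parts.hostname or "")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    uri = f"{parts.scheme}://{netloc}{percent_encode_non_ascii(parts.path)}"
    if parts.query:
        uri += "?" + percent_encode_non_ascii(parts.query)
    return uri
```

With this sketch, "ws://münchen.example:8080/chat/ä?raum=grün" maps to
"ws://xn--mnchen-3ya.example:8080/chat/%C3%A4?raum=gr%C3%BCn".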

>> Furthermore, it still doesn't answer what the semantics of these parts
>> are. What do "ihier-part" and "iquery" represent in a ws URI?
>
> This is defined by the RFC 3987, no? Surely we wouldn't want IRI
> components to have different meanings in different schemes?
>
>> What's the effect? How are they used?
>
> This is defined earlier in the Web Socket specification.

Section 3.1, "Parsing Web Socket URLs", seems to make the semantics
pretty clear to me. How about adding "See Section 3.1" to the URI
scheme semantics portions of the IANA Considerations sections? Would
that be sufficient?

Joseph Holsten
Received on Friday, 4 September 2009 07:15:34 GMT
