Re: [Uri-review] [hybi] ws: and wss: schemes from Joseph A Holsten on 2009-09-04 (uri@w3.org from September 2009)

From: Joseph A Holsten <joseph@josephholsten.com>
Date: Fri, 4 Sep 2009 18:51:42 -0500
To: Ian Hickson <ian@hixie.ch>
Cc: URI <uri@w3.org>, "hybi@ietf.org" <hybi@ietf.org>, "uri-review@ietf.org" <uri-review@ietf.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-Id: <27A077DD-42B6-4ABD-B633-12EB73AA4201@josephholsten.com>

On Sep 4, 2009, at 3:19 PM, Ian Hickson wrote:

> On Fri, 4 Sep 2009, Julian Reschke wrote:
>> Ian Hickson wrote:
>>>>
>>>> Because that's how URI and thus URLs are defined.
>>>
>>> The ws: and wss: URLs are IRIs; why would we limit them to URIs? I'm
>>> not especially interested in ASCII-only URIs at this point. These URLs
>>> are only ever going to be used in contexts that accept full IRIs.
>>
>> But that's not how registering a URI scheme works. Check the relevant
>> RFCs. Essentially you register the *URI* scheme, and get IRIs based on
>> the mapping rules defined in RFC 3987.
>
> That's what I thought, but then I got feedback saying I had to register
> an IRI scheme if I wanted to use IRIs.
>
> I've no interest in making ws: and wss: URIs. Only IRIs.
>
> If I define the syntax to be a subset of the full URI syntax, how does
> it ever get extended to be a subset of the full IRI syntax?
>
> What should I put in the spec to make you happy and to make the use of
> ws: and wss: IRIs fully well-defined?

The only scheme I can think of that was defined as an IRI was XMPP  
[RFC4622]. It actually makes more sense when you start with IRIs. If  
that's what you need, please just do that.

Traditionally, every other scheme defined since RFC3987 has defined
itself as a URI and defined the exact encoding considerations to
handle reserved characters that may occur given the semantics of a
particular part. You have very standard semantics: userinfo, host,
port, path segments, query. Those that might meaningfully contain
reserved characters are userinfo, reg-name segments, and query.
reg-name parts get ToASCII; everything else gets mapped with
percent-encoding. You actually have to say this because it's not
obvious. There's more than one way to do it.
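
To make that mapping concrete, here is a minimal Python sketch. It is
an illustration under stated assumptions, not proposed spec text: the
function name iri_to_uri and the per-component "safe" sets are my own,
Python's built-in "idna" codec (IDNA 2003 ToASCII) stands in for
ToASCII, and IP literals in the host are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component. "%" is kept so that
# input that is already percent-encoded is not encoded a second time.
# These sets are an assumption of this sketch, not quoted from any spec.
_SUB_DELIMS = "!$&'()*+,;="
_USERINFO_SAFE = _SUB_DELIMS + ":%"
_PATH_SAFE = _SUB_DELIMS + ":@/%"
_QUERY_SAFE = _SUB_DELIMS + ":@/?%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: ToASCII for the host, UTF-8 percent-encoding elsewhere."""
    parts = urlsplit(iri)

    # Everything before the last "@" in the authority is userinfo.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = quote(userinfo, safe=_USERINFO_SAFE) + at

    # reg-name: apply ToASCII label by label (yields "xn--" labels).
    host = parts.hostname or ""
    netloc += host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"

    # Remaining components: encode as UTF-8, then percent-encode.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_PATH_SAFE),
        quote(parts.query, safe=_QUERY_SAFE),
        quote(parts.fragment, safe=_QUERY_SAFE),
    ))


print(iri_to_uri("ws://bücher.example/café?q=é"))
```

The host and the rest of the IRI take different routes here, which is
exactly the choice a scheme definition has to spell out.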


>>>>>> I've deferred to RFC3987 to sidestep this issue.
>>>>> A URI is not an IRI.
>>>>>
>>>>> You can refer to the mapping, but that really needs a few more words
>>>>> than "See RFC3987.".
>>>> It may not need many more words, but certainly a few more words.
>>>
>>> Could you elaborate? Which words should I add?
>>
>> You need to state how you want to encode non-ASCII characters. "See
>> RFC3987" goes in the right direction but really isn't sufficient.
>> Please see RFC 4395, Section 2.6:
>>
>> "2.6. Internationalization and Character Encoding
>>
>>   When describing URI schemes in which (some of) the elements of the
>>   URI are actually representations of human-readable text, care should
>>   be taken not to introduce unnecessary variety in the ways in which
>>   characters are encoded into octets and then into URI characters; see
>>   RFC 3987 [6] and Section 2.5 of RFC 3986 [5] for guidelines.  If URIs
>>   of a scheme contain any text fields, the scheme definition MUST
>>   describe the ways in which characters are encoded, and any
>>   compatibility issues with IRIs of the scheme."
>
> I've read this, but as far as I can tell, "Always UTF-8" and "See IRI"
> are both complete and accurate ways of addressing this.

No. There are at least two ways to encode reg-names, tons of UCS
encoding issues, and more. Pedantic, but that's the point of spec
review, no?
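
A short Python sketch of the reg-name ambiguity, assuming the two ways
in question are ToASCII and UTF-8 percent-encoding (both of which RFC
3987's mapping allows for). The host name is a made-up example, and
Python's "idna" codec stands in for ToASCII:

```python
from urllib.parse import quote

host = "bücher.example"  # hypothetical reg-name containing non-ASCII

# Way 1: ToASCII (IDNA), producing an "xn--" label.
as_idna = host.encode("idna").decode("ascii")

# Way 2: encode as UTF-8, then percent-encode each non-ASCII octet.
as_percent = quote(host, safe="")

print(as_idna)     # xn--bcher-kva.example
print(as_percent)  # b%C3%BCcher.example
```

Both results are syntactically valid reg-names, so "Always UTF-8" alone
does not say which one a ws: or wss: implementation should produce.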

> Since apparently neither of these options satisfies you, could you state
> exactly what literal text would satisfy you?

If you're going to define it as URI and handle IRIs by mapping, I  
believe my text[1] should satisfy.

1: http://lists.w3.org/Archives/Public/uri/2009Sep/0001.html

Joseph Holsten

Received on Friday, 4 September 2009 23:52:31 UTC