- From: Joseph A Holsten <joseph@josephholsten.com>
- Date: Fri, 4 Sep 2009 18:51:42 -0500
- To: Ian Hickson <ian@hixie.ch>
- Cc: URI <uri@w3.org>, "hybi@ietf.org" <hybi@ietf.org>, "uri-review@ietf.org" <uri-review@ietf.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
On Sep 4, 2009, at 3:19 PM, Ian Hickson wrote:

> On Fri, 4 Sep 2009, Julian Reschke wrote:
>> Ian Hickson wrote:
>>>>
>>>> Because that's how URI and thus URLs are defined.
>>>
>>> The ws: and wss: URLs are IRIs; why would we limit them to URIs? I'm
>>> not especially interested in ASCII-only URIs at this point. These URLs
>>> are only ever going to be used in contexts that accept full IRIs.
>>
>> But that's not who registering an URI scheme works. Check the relevant
>> RFCs. Essentially you register the *URI* scheme, and get IRIs based on
>> the mapping rules defined in RFC 3987.
>
> That's what I thought, but then I got feedback saying I had to register
> an IRI scheme if I wanted to use IRIs.
>
> I've no interest in making ws: and wss: URIs. Only IRIs.
>
> If I define the syntax to be a subset of the full URI syntax, how does
> it ever get extended to be a subset of the full IRI syntax?
>
> What should I put in the spec to make you happy and to make the use of
> ws: and wss: IRIs fully well-defined?

The only scheme I can think of that was defined as an IRI was XMPP
[RFC4622]. It actually makes more sense when you start with IRIs. If
that's what you need, please just do that.

Traditionally, every other scheme defined since RFC3987 has defined
itself as a URI and defined the exact encoding considerations to handle
reserved characters that may occur given the semantics of a particular
part. You have very standard semantics: userinfo, host, port, path
segments, query. Those that might meaningfully contain reserved
characters are userinfo, reg-name segments, and query. reg-name parts
get ToASCII, everything else gets mapped with percent-encoding. You
actually have to say this because it's not obvious. There's more than
one way to do it.

>>>>>> I've deferred to RFC3987 to sidestep this issue.
>>>>> A URI is not a IRI.
>>>>>
>>>>> You can refer to the mapping, but that really needs a few more words
>>>>> than "See RFC3987.".
>>>> It may not need many more words, but certainly a few more words.
>>>
>>> Could you elaborate? Which words should I add?
>>
>> You need to state how you want to encode non-ASCII characters. "See
>> RFC3987" goes into the right direction but really isn't sufficient.
>> Please see RFC 4395, Section 2.6:
>>
>> "2.6. Internationalization and Character Encoding
>>
>> When describing URI schemes in which (some of) the elements of the
>> URI are actually representations of human-readable text, care should
>> be taken not to introduce unnecessary variety in the ways in which
>> characters are encoded into octets and then into URI characters; see
>> RFC 3987 [6] and Section 2.5 of RFC 3986 [5] for guidelines. If URIs
>> of a scheme contain any text fields, the scheme definition MUST
>> describe the ways in which characters are encoded, and any
>> compatibility issues with IRIs of the scheme."
>
> I've read this, but as far as I can tell, "Always UTF-8" and "See IRI"
> are both complete and accurate ways of addressing this.

No. There's at least two ways to encode reg-names, tons of UCS encoding
issues, and more. Pedantic, but that's the point of spec review, no?

> Since apparently neither of these options satisfies you, could you
> state exactly what literal text would satisfy you?

If you're going to define it as URI and handle IRIs by mapping, I
believe my text[1] should satisfy.

1: http://lists.w3.org/Archives/Public/uri/2009Sep/0001.html

Joseph Holsten
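For illustration only, the mapping described in the message above (reg-name through ToASCII, everything else percent-encoded) might be sketched in Python roughly as follows. This is not text from any ws:/wss: draft. The function name ws_iri_to_uri and the chosen "safe" character sets are assumptions made for the sketch, and it assumes UTF-8 as the character encoding and Python's built-in "idna" codec (IDNA 2003 ToASCII) for host names.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component. "%" is kept so that existing
# percent-escapes are not double-encoded. These sets are an assumption of
# this sketch, loosely following the RFC 3986 sub-delims.
SUB_DELIMS = "!$&'()*+,;="
USERINFO_SAFE = "%" + SUB_DELIMS
PATH_SAFE = "%/:@" + SUB_DELIMS
QUERY_SAFE = "%/?:@" + SUB_DELIMS


def ws_iri_to_uri(iri: str) -> str:
    """Map a ws:/wss: IRI to an ASCII-only URI (hypothetical helper)."""
    parts = urlsplit(iri)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("not a ws: or wss: IRI")
    if not parts.hostname:
        raise ValueError("missing host")

    host = parts.hostname
    if ":" in host:
        # IPv6 literal: urlsplit strips the brackets, so restore them.
        host = "[" + host + "]"
    else:
        # reg-name: apply ToASCII label by label (IDNA 2003 codec).
        host = host.encode("idna").decode("ascii")

    netloc = host
    if parts.username is not None:
        userinfo = quote(parts.username, safe=USERINFO_SAFE)
        if parts.password is not None:
            userinfo += ":" + quote(parts.password, safe=USERINFO_SAFE)
        netloc = userinfo + "@" + netloc
    if parts.port is not None:
        netloc += ":" + str(parts.port)

    # Everything else: encode as UTF-8, then percent-encode non-ASCII octets.
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    fragment = quote(parts.fragment, safe=QUERY_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(ws_iri_to_uri("ws://bücher.example/käse?q=ü"))
```

Under these assumptions the example IRI maps to ws://xn--bcher-kva.example/k%C3%A4se?q=%C3%BC, while an ASCII-only URI such as ws://example.com:8080/chat passes through unchanged. A different choice, such as percent-encoding the reg-name instead of applying ToASCII, would give a different URI for the same IRI, which is the ambiguity the message says a scheme definition has to resolve.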
Received on Friday, 4 September 2009 23:52:28 UTC