Re: ws: and wss: schemes

On Fri, 4 Sep 2009, Joseph A Holsten wrote:
> On Sep 4, 2009, at 12:33 AM, Ian Hickson wrote:
> > On Fri, 14 Aug 2009, Julian Reschke wrote:
> > > > 
> > > > [...] it now says:
> > > >
> > > >   URI scheme syntax.
> > > >      In ABNF terms using the terminals from the IRI specifications:
> > > >      [RFC5234] [RFC3987]
> > > > 
> > > >           "ws" ":" ihier-part [ "?" iquery ]
> > > 
> > > That is even worse than before, because it now uses productions from 
> > > the IRI spec defining *URI* syntax.
> > 
> > ws: and wss: URLs are i18n-aware; why would we want to limit them to 
> > ASCII?
> URIs are not i18n-aware, you're thinking of IRIs. But since there is a 
> standard mapping for IRIs, it's pretty clear what you actually want. The 
> *URI* syntax should be:
>   "ws" ":" hier-part [ "?" query ]

Ok, done.

> Then the encoding considerations should be something like:
>   Because many characters are not permitted with this syntax, the
>   "hier-part" and "query" elements may contain characters from the
>   Unicode Character Set [UCS] as suggested by URI [RFC3986] using the
>   reg-name and percent-encoding translations of IRI to URI
>   mapping [RFC3987]. Translation is performed by first encoding those
>   Unicode characters as octets to the UTF-8 character
>   encoding [RFC3629]. Replace the reg-name part of the hier-part by
>   the part converted using the ToASCII operation specified in section
>   4.1 of [RFC3490] on each dot-separated label, and by using U+002E
>   (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules
>   set to TRUE, and with the flag AllowUnassigned set to TRUE. Then
>   only those octets that do not correspond to characters in the
>   unreserved set should be percent-encoded.
>   By using UTF-8 encoding, there are no known compatibility issues
>   with mapping Internationalized Resource Identifiers to websocket
>   URIs according to [RFC3987].
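
For illustration only, here is a rough Python sketch of the two
translations described in the quoted text: ToASCII on each dot-separated
label of the reg-name, and UTF-8 percent-encoding for the path and query.
The function names and the example host are hypothetical. It is assumed
here that only non-ASCII characters need percent-encoding, which is
narrower than the quoted wording, and Python's built-in "idna" codec may
not use the same UseSTD3ASCIIRules and AllowUnassigned settings.

```python
from typing import Optional


def reg_name_to_ascii(reg_name: str) -> str:
    """Apply IDNA ToASCII (RFC 3490) to each dot-separated label."""
    # Assumption: the built-in "idna" codec is close enough for a sketch;
    # its flag settings may differ from those in the quoted text.
    return reg_name.encode("idna").decode("ascii")


def percent_encode_non_ascii(component: str) -> str:
    """Encode non-ASCII characters as UTF-8 octets, then percent-encode them."""
    out = []
    for ch in component:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is passed through untouched
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def iri_parts_to_ws_uri(host: str, path: str, query: Optional[str] = None) -> str:
    """Assemble a ws: URI from already-separated IRI components."""
    uri = "ws://" + reg_name_to_ascii(host) + percent_encode_non_ascii(path)
    if query is not None:
        uri += "?" + percent_encode_non_ascii(query)
    return uri


# Hypothetical input; the host name is made up for the example.
print(iri_parts_to_ws_uri("bücher.example", "/café", "q=é"))
```

The sketch takes the components already split apart, so it says nothing
about how an IRI is parsed in the first place.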

I've used the above as a guide for what to put in the spec. I didn't use 
it literally because it seemed to misuse RFC2119 terminology, and it 
wasn't clear to me where the descriptive ended and the normative started. 
I hope the text now in the spec makes sense. Let me know if it needs more 

> > > Furthermore, it still doesn't answer what the semantics of these parts
> > > are. What do "ihier-part" and "iquery" represent in a ws URI?
> > 
> > This is defined by the RFC 3987, no? Surely we wouldn't want IRI
> > components to have different meanings in different schemes?
> > 
> > > What's the effect? How are they used?
> > 
> > This is defined earlier in the Web Socket specification.
> Section 3.1 Parsing Web Socket URLs seems to make the semantics pretty 
> clear to me. How about adding "See Section 3.1" to URI scheme semantics 
> portions of the IANA Considerations sections? Would that be sufficient?

I don't think section 3.1 really adds anything more than what the 
registration at this point says (that the path and query form the resource 
name, and the other components are as defined in the URI spec).
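
As a rough illustration of that reading (path and query form the resource
name, the rest is generic URI syntax), here is a minimal Python sketch. It
is not the parsing algorithm from the spec: it leans on the standard
library's generic URL splitter, the function name and the example URL are
hypothetical, and it reports a missing port as None rather than assuming
any default.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_ws_url(url: str) -> Tuple[bool, str, Optional[int], str]:
    """Return (secure, host, port, resource_name) for a ws: or wss: URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("not a ws: or wss: URL")
    if not parts.hostname:
        raise ValueError("missing host")

    # Resource name: the path (or "/" if empty), plus "?" and the query if any.
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query

    # parts.port is None when the URL gives no explicit port.
    return (parts.scheme == "wss", parts.hostname, parts.port, resource)


# Hypothetical URL, for illustration only.
print(split_ws_url("ws://example.com:81/foo?bar=1"))
```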

On Fri, 4 Sep 2009, Joseph A Holsten wrote:
> Traditionally, every other scheme defined since RFC3987 has defined 
> itself as a URI and defined the exact encoding considerations to handle 
> reserved characters that may occur given the semantics of a particular 
> part. You have very standard semantics: userinfo, host, port, path 
> segments, query. Those that might meaningfully contain reserved 
> characters are userinfo, reg-name segments, and query. reg-name parts 
> get ToASCII, everything else gets mapped with percent-encoding. You 
> actually have to say this because it's not obvious. There's more than 
> one way to do it.

We really should fix these specs so that there isn't more than one way to 
do it for future schemes.

On Sat, 5 Sep 2009, Toby Inkster wrote:
> As I understand it, if you, Ian Hickson, own, then you 
> are the authority for deciding what resources are represented by URIs 
> starting with:

I think it would be pretty ridiculous for anyone to claim that the URL 
above represents anything but a host (""), port (80), and 
path (/) that can be used over HTTP (with whatever method makes sense 
given the context in which the URL is found -- e.g. GET if the URL was on 
the side of a bus, POST if it was in <form method=post action="...">). 
For example, claiming that it represents a person or a book or something 
like that. (Of course, that hasn't stopped the Semantic Web community from 
doing exactly that.)

I certainly wouldn't feel comfortable saying that that URI represented 
anything to do with the Web Sockets protocol.

> In particular, you in your role as authority are free to decree that 
> this:
> Represents a Web Sockets path of "/foo" running on port 81 of the host 

That seems like a terrible idea. Surely it represents a Web page/resource/ 
service/whatever you want to call it on "" port 80 path 
"/". Treating it as other things seems like a terrible 
layering and orthogonality violation.

On Mon, 7 Sep 2009, Julian Reschke wrote:
> What I still miss is a reference from the URI registration template to 
> the section which defines the syntax (*)

The registration template _is_ the section that defines the syntax.

> and in that section, a statement about what the resource name exactly is 
> good for. (It's definitely not obvious by just reading the parsing 
> algorithm).

What the resource name is good for is entirely up to the server. The spec 
doesn't assign it any particular semantic meaning or purpose beyond what 
the registration text says. So I don't know what more I could say.

> (*) I think that section would be much more readable when it used ABNF 
> as everybody else does.

Assuming you mean the section that says how to parse the URLs, then the 
only part of it that could conceivably use ABNF is the part defined in 
[WebAddresses], so I don't know what it would mean to use ABNF here.

> I hear that by specifying an algorithm you want to exclude certain 
> standard things like fragments, and include error handling; but I think 
> ABNF + prose would be much easier to understand.

Please send such feedback to Larry; I am no longer editing those 

> Furthermore, fragment identifiers are orthogonal to the URI scheme, see 
> <>:
> "Fragment identifier semantics are independent of the URI scheme and 
> thus cannot be redefined by scheme specifications."

I've no idea to what you are referring here. Where are fragment 
identifiers even mentioned in the Web Socket protocol spec?

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
                          U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 17 September 2009 09:19:22 UTC