Re: Comments on draft-saintandre-xmpp-uri-04.txt from Martin Duerst on 2004-09-04 (uri@w3.org from September 2004)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 05 Sep 2004 08:52:39 +0900
To: Peter Saint-Andre <stpeter@jabber.org>
Cc: xmppwg@jabber.org, uri@w3.org
Message-Id: <4.2.0.58.J.20040905081530.05844f10@localhost>
Hello Peter,

At 19:20 04/09/02 -0500, Peter Saint-Andre wrote:
>Hi Martin,
>
>My apologies for the delayed reply. I have attempted to incorporate all
>of your feedback and have prepared an interim version of the document,
>which is available here:
>
>http://www.jabber.org/~stpeter/ietf/draft-saintandre-xmpp-uri-05-pre1.txt
>
>I would appreciate it if you or others would take a look at this document
>and let me know if I have misinterpreted any of the feedback received.

[I have to admit that I'm notoriously bad at looking at a document
two times in a row. So I may have missed something.]

I have just looked at that document, and it looks quite good.
Here are a few details that caught my eye:

- You still seem to rely heavily on the general statement that an
   XMPP address is UTF-8, but in the actual conversion steps you often
   mention only percent-encoding. I'm afraid that implementers who,
   for whatever reason, use UTF-16 internally (for example) may get
   things wrong. This applies in particular to Section 3.3,
   "Character encoding considerations".

- Related to this, in many places you say
   "non-US-ASCII characters must be percent-encoded". You need to
   say something like "non-US-ASCII octets must be percent-encoded"
   (after making clear that the octets are those of UTF-8). The same
   applies in the other direction: replace "by converting each
   percent-encoded octet into the appropriate reserved or non-US-ASCII
   character" with something like "by converting each sequence of
   percent-encoded octets into the appropriate sequence of reserved or
   non-US-ASCII characters, by first decoding the percent-encoded
   octets into actual octets and then interpreting those octets as
   UTF-8".

- I'm wondering whether it is necessary to separate node, host,
   and path in the conversion process. Either mention the specific
   normalization (stringprep) requirements for each of the components,
   or simplify conversion by converting the whole thing in one
   go (or tell me why that wouldn't work).

- I'm not happy with the modal verbs you use where you describe
   conversion. You write "it is necessary to use consistent
   methods" and then "such methods are described below". Then you
   write "An XMPP "node identifier" *can* be transformed". If this
   is a *can*, are there other ways? Should other ways produce the
   same result? I'm afraid of somebody implementing the spec,
   doing something different, and claiming: "well, it said 'can',
   it didn't say that's the only way, so I chose something else".

- I like the examples. It would probably be good to have some
   internationalized domain names among them, too.

- The title "Method" for 2.3.1 and 2.4.1 is extremely generic.

- In 2.4.1, I think it might help readers to explain that the
   first few steps of "XMPP handling" are similar to, e.g., HTTP
   authentication, whereas the later steps are similar to what
   happens in the case of, e.g., a 'mailto:' scheme.
   [I hope I got this right. When I asked in my previous comments
    what's really going on, I was trying to find out how the stanza
    would be composed from data in the URI. But that's not what
    happens: as with the plain use of mailto:, the user has to
    compose the message before actually sending it.]

- You use the 'host' rule from rfc2396bis. This includes IPv4
   and IPv6 addresses. Are you sure you want these?
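To make the octet-versus-character point concrete, here is a minimal Python sketch using only the standard library's urllib.parse; it is not taken from the draft. It assumes, hypothetically, that the whole JID is converted in one pass with '@' and '/' left unencoded as delimiters. Encoding percent-encodes UTF-8 octets; decoding first recovers the octets and then interprets them as UTF-8; the last line shows what an implementation that percent-encodes UTF-16 code units would wrongly produce. The function names are illustrative only.

```python
from urllib.parse import quote, unquote_to_bytes

def jid_to_uri_part(jid: str) -> str:
    # Encode the string as UTF-8 first, then percent-encode every octet
    # outside the unreserved set. Keeping '@' and '/' as delimiters is an
    # assumption of this sketch, not a rule from the draft.
    return quote(jid.encode("utf-8"), safe="@/")

def uri_part_to_jid(part: str) -> str:
    # Decode percent-encoded octets into actual octets, then interpret the
    # octet sequence as UTF-8 (strict: malformed UTF-8 raises an error).
    return unquote_to_bytes(part).decode("utf-8")

jid = "jos\u00e9@example.com/balcony"
encoded = jid_to_uri_part(jid)
print(encoded)                                # jos%C3%A9@example.com/balcony
assert uri_part_to_jid(encoded) == jid

# Pitfall: percent-encoding UTF-16 code units instead of UTF-8 octets
# yields a different, wrong octet sequence for the same character.
print(quote("\u00e9".encode("utf-16-be"), safe=""))   # %00%E9, not %C3%A9
```

The round trip only works because both directions agree on UTF-8 as the octet encoding, which is why the draft needs to say so explicitly at each conversion step.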

>Several further questions and observations below...
>
>On Tue, Aug 24, 2004 at 09:55:12AM +0900, Martin Duerst wrote:

> > >> That's a good example of a case that should be expressed in the syntax
> > >> rules. The syntax has to express the exact form of the URI as it is
> > >> allowed to appear, not some intention about the parts that go into the
> > >> URI. So as an example, if the syntax for the resource identifier in
> > >> xmpp were something simple like
> > >>      resource = *( 'a' / 'b' / 'c' / '?' )
> > >> then the syntax for the resource component in the xmpp scheme syntax
> > >> would have to read:
> > >>      resource = *( 'a' / 'b' / 'c' / '%3F' / '%3f' )
> > >> or so.
>
>On re-reading rfc2396bis, it seems to me that this is handled by the rule
>contained in Section 2.2:
>
>    If data for a URI component would conflict with a reserved
>    character's purpose as a delimiter, then the conflicting data
>    must be percent-encoded before forming the URI.
>
>So rather than describing specifically how to handle the "?" character,
>it seems best to include this general rule (since it might also apply
>to any other character that might conflict with a reserved character).
>Does this seem right?

Yes.
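Applied to the toy grammar from the earlier example, the general rule means that '?' inside a resource is written as %3F in the URI, so the URI-level syntax really does differ from the XMPP-level one. A minimal Python sketch follows; the two regular expressions simply transcribe those toy rules and are not the actual xmpp scheme syntax, and the function names are hypothetical.

```python
import re

# Toy grammars from the example quoted above (not the real XMPP rules):
#   resource (XMPP level) = *( 'a' / 'b' / 'c' / '?' )
#   resource (URI level)  = *( 'a' / 'b' / 'c' / '%3F' / '%3f' )
XMPP_RESOURCE = re.compile(r"[abc?]*")
URI_RESOURCE = re.compile(r"(?:[abc]|%3[Ff])*")

def resource_to_uri(resource: str) -> str:
    # '?' would conflict with its role as the query delimiter,
    # so it must be percent-encoded before forming the URI.
    if not XMPP_RESOURCE.fullmatch(resource):
        raise ValueError("not a valid (toy) resource identifier")
    return resource.replace("?", "%3F")

def uri_to_resource(component: str) -> str:
    # Accept either hex case, as the toy URI-level rule does.
    if not URI_RESOURCE.fullmatch(component):
        raise ValueError("not a valid (toy) URI resource component")
    return re.sub(r"%3[Ff]", "?", component)

uri_form = resource_to_uri("ab?c")
print(uri_form)                               # ab%3Fc
assert URI_RESOURCE.fullmatch(uri_form)
assert uri_to_resource("ab%3fc") == "ab?c"
```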


> > No. What I'm saying is that section 2.3 is basically right, but that
> > the syntax of the xmpp URI has to be changed to include percent-encoding,
> > in order to correctly reflect the result of the process described in 2.3.
>
>I've attempted to do so in the preview release referred to at the
>beginning of this message, but in trying to do justice to rfc2396bis
>I may have strayed further from the text of the previous version, so
>corrections are welcome.

Overall, I think this went quite well.


> > >> >   2.  Perform [IDNA] translation against the JID (in the form of a
> > >> >       UTF-8 string).
> > >>
> > >> What IDNA translation?
> > >
> > >I think that people on the XMPP WG list meant "conversion" rather than
> > >"translation".
> >
> > Way not clear enough. IDNA describes various 'conversions' or
> > 'translations', or whatever you call it, in different directions,
> > and these also have some flags that affect what actually happens.
>
>In fact it seems to me that IDNA conversion would refer only to the
>internationalized domain labels that make up an internationalized domain
>name. Certainly it would not apply to the JID as a whole, and the flags
>defined for IDNA are not appropriate for an XMPP node identifier or an
>XMPP resource identifier. So I am not sure that any reference to the
>ToASCII conversion operation is appropriate here, or that it is truly
>necessary to define how the UTF-8-to-ASCII conversion occurs. However,
>if it *is* necessary, then we *could* define the conversion by means of
>the ToASCII operation (though we would need to describe how the various
>parts of a JID can be understood as labels in the IDNA sense), or we
>could define the conversion by other means. How have other URI scheme
>specifications dealt with this matter?

I agree that in any case, this would only apply to the 'domain' part
of a JID. But it seems to me that because we don't convert from Unicode
to punycode, maybe no reference to IDNA is needed at all. Jabber may
use IDNA/punycode internally to deal with internationalized domains,
but the 'XMPP handler' would be responsible for that; the URI syntax
wouldn't have to care about it.


Regards,     Martin.
Received on Saturday, 4 September 2004 23:54:11 UTC