Re: Comments on draft-saintandre-xmpp-uri-04.txt from Martin Duerst on 2004-08-24 (uri@w3.org from August 2004)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 24 Aug 2004 09:55:12 +0900
To: Peter Saint-Andre <stpeter@jabber.org>
Cc: xmppwg@jabber.org, uri@w3.org, uri-review@ietf.org
Message-Id: <4.2.0.58.J.20040824090331.0565c778@localhost>
Hello Pierre,

[I agree in all the cases that I have removed from this mail.]

At 17:49 04/08/23 -0500, Peter Saint-Andre wrote:
>On Thu, Aug 19, 2004 at 05:10:46PM +0900, Martin Duerst wrote:
>
> > Hello Pierre,
> >
> > At the recent IETF, you asked me for (I18N) comments on
> > http://www.ietf.org/internet-drafts/draft-saintandre-xmpp-uri-04.txt.
> > Here they are, including quite some comments of a more general nature.
> >
> > I have copied both the xmppwg list (given in the draft) and the uri
> > list. I'm not subscribed to the xmppwg list, so please cc me.
>
>I have also cc'd the uri-review@ietf.org list, since my understanding is
>that both the W3 and IETF lists need to review proposed URI schemes.

Two small points:
- One list is hosted by the W3C, the other by the IETF. They are
   both IETF lists. The one hosted by W3C is the former URL WG list,
   which is still used for work on URIs in general. The one hosted
   by IETF is a separate list for review of URI schemes.
- At the recent URI bof in San Diego, I think I heard Ted Hardie say
   that he would prefer having all the discussion on the W3C-hosted list,
   but I'm not sure I remember correctly, and I'm not sure when
   he intended that to start.


>Those new to this thread may want to reference this I-D:
>
>http://www.ietf.org/internet-drafts/draft-saintandre-xmpp-uri-04.txt

> > >2.1  Rationale
> > >
> > >
> > >   Many types of application can be built using XMPP.  The best-known
> > >   such application is instant messaging (IM) and presence (as these are
> > >   described in [IMP-MODEL] and [IMP-REQS] and defined for XMPP in
> > >   [XMPP-IM]).  Therefore it might seem appropriate
> >
> > This is worded as if it actually isn't appropriate. But my understanding
> > is that what you want to say is something like:
> >
> > - It is appropriate to use the im: and pres: schemes for instant messaging
> >   and presence.
> > - There are other uses of XMPP, and the xmpp: scheme is for these other
> > uses.
>
>Yes, this text is long-winded. I've shortened and simplified it in my
>working copy.

Great. Looking forward to seeing it.


> > >not to prepend XMPP addresses
> > >   with the xmpp: URI scheme when addressing XML stanzas
> >
> > The document sometimes talks about resources, and then sometimes about
> > stanzas. In an XMPP context, are they the same? Either streamline the
> > language (use 'resource' only), or explicitly say they are the same,
> > or be clearer about the difference.
>
>In XMPP:
>
>- a "resource identifier" is one portion of an XMPP address (JID)
>- a "stanza" is a well-formed XML fragment sent over an XML stream
>
>Neither of these is a "resource" in the sense used in the URI
>specification. We usually use the term "entity" for the thing that
>a URI identifies.
>
>Does this need to be clearer in the text?

Thanks for pointing out that I was confused myself a bit, too.
I think it would be very good to point out that entities in XMPP
are called stanzas. Also, there is some place in the text where
it suggests that a stanza is sent as part of a request, as opposed
to being returned as a response. This is different from the 'basic'
resolution model of e.g. FTP and HTTP, so it would be good to clarify
what exactly happens.


> > >2.2  Form
> > >
> > >
> > >   The syntax for an xmpp: URI is as follows (where the jid rule is
> > >   defined in [XMPP-CORE] and the query rule is defined in [URI]).
> > >
> > >
> > >         "xmpp:" jid [ "?" query ]

> > For "jid", just referring to XMPP-CORE is way not enough, I guess.
> > The ABNF rules given in
> > http://www.ietf.org/internet-drafts/draft-ietf-xmpp-core-24.txt are:
> >
> >       jid             = [ node "@" ] domain [ "/" resource ]
> >       domain          = fqdn / address-literal
> >       fqdn            = (sub-domain 1*("." sub-domain))
> >       sub-domain      = ([IDNA] conformant domain label)
> >       address-literal = IPv4address / IPv6address
> >
> > For 'node' and 'resource', a stringprep profile is mentioned, which
> > seems to imply that these pieces can contain non-ASCII characters.
> > This would be in conflict with the general URI syntax.
>
>Although in general XMPP addresses are not presented as URIs, it is true
>that when a JID is presented as a URI the non-ASCII characters would
>need to be properly escaped.

This is not a 'would'.


> > For 'sub-domain', it mentions "a domain label as described in [IDNA]",
> > which would mean that it can not only be US-ASCII, but also binary
> > data. If that's what it should be, this would again be in conflict
> > with the general URI syntax. But my guess is that this is not what
> > is intended here; it's probably the case that 'fqdn' is supposed to
> > be what [IDNA] calls an "internationalized domain name", and sub-domain
> > is supposed to be what IDNA calls an *internationalized* domain label.
>
>Yes, by "[IDNA] conformant domain label" was meant "internationalized
>domain label". If this is not clear I will need to inform the RFC
>Editor, who is currently preparing draft-ietf-xmpp-core-24 for
>publication.

Please re-check [IDNA] carefully. [IDNA] does quite a thorough job
to label each of the different concepts clearly. And there are quite
a few different concepts. The word 'conformant' does not appear
in [IDNA], nor does 'conform', so it's not clear which of the
concepts in [IDNA] you mean.


> > This again would be in conflict with the general URI syntax.
>
>Certainly.
>
> > To make these cases work with URIs, what you have to do is to rework
> > the syntax rules so that where necessary (node, sub-domain, resource),
> > they include 'pct-encoded' from
> > http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-06.txt.
> > You then also have to be specific about what US-ASCII characters are
> > actually allowed directly (e.g. node doesn't contain '@', and so on),
> > based on the various examples in rfc2396bis and your stringprep
> > profiles.
>
>We need to specify this in the ABNF found in draft-ietf-xmpp-core-24?

In as much as the ABNF is describing the characters (including non-ASCII)
as they are used directly in XMPP (which as I understand is in XML, in UTF-8),
then there is no such need, indeed it would be wrong and confusing to do so.
But this has to be done wherever that syntax is used in some URI scheme.
This applies to the XMPP scheme, but may also apply to other schemes.
Does draft-ietf-xmpp-core-24 in any way talk about URI schemes?
draft-saintandre-xmpp-uri-04.txt says there are other URI schemes
(im: and pres:) that can be used with XMPP, so if e.g. they are
described in draft-ietf-xmpp-core-24, we have to look at it.


>That draft is silent on XMPP addresses as URIs, which is why we are
>working on draft-saintandre-xmpp-uri. I would much prefer not to change
>draft-ietf-xmpp-core-24 at this point if possible, but of course we can
>make changes during Author's 48 Hours if absolutely necessary. Is it
>possible to fully represent stringprep in ABNF?

Some of it may be easy. Some of it may be tedious, but straightforward.
Some of it is tedious and difficult. But this is not necessary.
Just write a rule that includes percent-encoding, and then in the
prose specify additional restrictions.


>To this point we have
>treated the stringprep profiles as canonical. Percent-encoding of
>various characters in an XMPP address has meaning within the context of
>an XMPP URI, but as far as I can see has no meaning in the context of a
>normal XMPP address. It seems that we have two options:
>
>1. Change the ABNF in draft-ietf-xmpp-core so that XMPP URIs conform to
>    that syntax; however, that ABNF does not describe a URI.
>
>2. Specify the ABNF for an XMPP URI in draft-saintandre-xmpp-uri, which
>    ABNF will be different in some respects from the ABNF specified in
>    draft-ietf-xmpp-core.

It seems very clear to me that the second option is what you have to go for.


> > One more thing: it seems that in your case, there are no [] delimiters
> > for IPv6address, and there is no mechanism for future IP protocol versions.
> > You may want to have a look at rfc2396bis to either bring this closer
> > to rfc2396bis or to explicitly mention that there is a difference.
>
>Are you suggesting that we replace:
>
>        domain          = fqdn / address-literal
>
>with:
>
>        domain          = fqdn / host

No, it would be
         domain          = fqdn / IP-literal / IPv4address
(or alternatively
         domain          = host)

>... where the host rule is specified in rfc2396bis?

Please note that this is just a suggestion to think about,
not something that I think needs to be fixed.


>If so, does this
>apply to all XMPP addresses or only XMPP URIs? (I think the former, but
>I want to make sure.)

Yes, if this change is made, it would apply to XMPP addresses in general.
Otherwise, it wouldn't make sense.


>Is the IPv6address rule specified in rfc2396bis intended to supersede
>the IPv6address rule specified in RFC2373?

In my recollection, there was a problem with the syntax in RFC2373,
so that's why rfc2396bis is different.
There is some discussion at
http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#changes,
which also reveals that RFC 3513 obsoleted RFC 2373, so you can't
cite the latter anymore, I guess.



> > >   While the "?" character is allowed in the resource identifier portion
> > >   of an XMPP address (according to [XMPP-CORE]), that character can be
> > >   used as a delimiter between the jid and the query parts of an xmpp:
> > >   URI; therefore, any instances of the "?" character in the resource
> > >   identifier portion of an XMPP address that is generated or processed
> > >   as an xmpp: URI MUST be escaped as "%3F" (as described in Section
> > >   2.2.5 of [URL-GUIDE]).
> >
> > That's a good example of a case that should be expressed in the syntax
> > rules. The syntax has to express the exact form of the URI as it is
> > allowed to appear, not some intention about the parts that go into the
> > URI. So as an example, if the syntax for the resource identifier in
> > xmpp were something simple like
> >      resource = *( 'a' / 'b' / 'c' / '?' )
> > then the syntax for the resource component in the xmpp scheme syntax
> > would have to read:
> >      resource = *( 'a' / 'b' / 'c' / '%3F' / '%3f' )
> > or so.
>
>Again, it seems to me that this applies to XMPP URIs, not XMPP addresses
>in general.

Yes, very much so.
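
As a rough illustration of the point about syntax rules, the toy
'resource' rule above can be written as regular expressions. This is
only a sketch in Python; the names XMPP_RESOURCE and URI_RESOURCE are
made up for this example, and real rules would cover far more than
three letters.

```python
import re

# Toy rule for the resource as it is used in XMPP itself:
#   resource = *( 'a' / 'b' / 'c' / '?' )
XMPP_RESOURCE = re.compile(r"[abc?]*")

# Toy rule for the resource as it may appear in an xmpp: URI:
#   resource = *( 'a' / 'b' / 'c' / '%3F' / '%3f' )
URI_RESOURCE = re.compile(r"(?:[abc]|%3[Ff])*")

print(bool(XMPP_RESOURCE.fullmatch("ab?c")))    # True
print(bool(URI_RESOURCE.fullmatch("ab?c")))     # False
print(bool(URI_RESOURCE.fullmatch("ab%3Fc")))   # True
```

The second pattern describes the URI exactly as it is allowed to
appear, which is what the scheme's syntax rule has to do.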


> > >2.3  Generation of XMPP URIs
> > >
> > >
> > >   When generating an XMPP URI, the generating application SHOULD follow
> > >   these steps:
> >
> > I'm probably repeating myself, but having these steps is in clear
> > conflict with the
> >     "xmpp:" jid [ "?" query ]
> > syntax rule above. Either there is an actual jid in there, or it's
> > something else that is the result of some processing. No shortcuts
> > allowed :-(.
>
>That text was discussed on the XMPP WG list in order to assist
>implementers. But I don't understand your comment; the intent is that,
>yes, there is an "actual JID" in there. Are you saying that Step 1
>("Obtain XMPP address (JID).") is not right because a real JID would
>already have passed Steps 2 and 3?

No. What I'm saying is that section 2.3 is basically right, but that
the syntax of the xmpp URI has to be changed to include percent-encoding,
in order to correctly reflect the result of the process described in 2.3.


> > >Saint-Andre            Expires February 11, 2005                [Page 4]
> > >Internet-Draft                  XMPP URI                     August 2004
> > >
> > >
> > >
> > >   1.  Obtain XMPP address (JID).
> > >   2.  Perform [IDNA] translation against the JID (in the form of a
> > >       UTF-8 string).
> >
> > What IDNA translation?
>
>I think that people on the XMPP WG list meant "conversion" rather than
>"translation".

Way not clear enough. IDNA describes various 'conversions' or 'translations',
or whatever you call it, in different directions, and these also have
some flags that affect what actually happens.


> > The word 'translation' appears only once in
> > IDNA, in the copyright section. IDNA defines at least the ToASCII
> > operation and the ToUnicode operation, and it's not clear which
> > one you mean.
>
>When generating an XMPP URI, you would convert using ToASCII (when
>presenting an XMPP URI to a user, you would convert with ToUnicode).
>
>Is the following text more accurate?
>
>    2. Convert the JID from a UTF-8 string to US-ASCII using the ToASCII
>    conversion described in [IDNA].

Much better. What's still missing is what flags to use.

But given that the URI syntax now allows percent-encoding in
domain names, and that xmpp wouldn't be affected even if the URI syntax
didn't allow this, I think conversion to punycode is a really bad
idea. Maybe what you want is to do ToASCII followed by ToUnicode,
or you just want to use nameprep, and be done with it.
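
To make the difference between these operations concrete, here is a
small sketch using the label-level functions in Python's standard
encodings.idna module, which implements the IDNA operations of
RFC 3490. The label is an arbitrary example, and these functions work
on a single label at a time, not on a whole JID.

```python
from encodings import idna

label = "B\u00fccher"                    # "Bücher", one domain label

ascii_form = idna.ToASCII(label)         # punycode form, as bytes
round_trip = idna.ToUnicode(ascii_form)  # back to Unicode, now prepared
prepped = idna.nameprep(label)           # nameprep only, no punycode

print(ascii_form)    # b'xn--bcher-kva'
print(round_trip)    # bücher
print(prepped)       # bücher
```

ToASCII followed by ToUnicode and plain nameprep give the same
prepared label here; only ToASCII on its own leaves punycode in the
result.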


> > And is this operation supposed to be applied to
> > the whole JID, including node and resource?
>
>Yes.

You have to check [IDNA]. It may be that the ToASCII operation is defined
only on labels, and thus you have to say explicitly what the labels
are for 'node' and 'resource'.


> > Also, UTF-8 is only given in examples in IDNA, and it's not clear
> > what it is supposed to do here.
>
>A JID is a UTF-8 string. More precisely, a JID is a string of Unicode
>characters that meets a certain set of stringprep profiles, following a
>certain set of syntax rules, encoded as UTF-8.

The jid ABNF doesn't define it that way, or does it? A jid is definitely
in UTF-8 if it appears in the XMPP protocol, because everything there
is in UTF-8. But a jid could appear on paper, or could be stored in
an application e.g. written in Java. In the latter case, the chance
that it is in UTF-16 would be high.


> > >   3.  Verify that the UTF-8 string conforms to the format defined in
> > >       [XMPP-CORE], including all appropriate [STRINGPREP] profiles.
> >
> > Since when do stringprep profiles apply to UTF-8? Aren't they much
> > more general? What if somebody wants to implement these operations
> > in UTF-16?
>
>UTF-16 is not allowed by XMPP Core.

UTF-16 isn't allowed on the wire. The conversion we are talking about
isn't done on the wire.
[Also note that if you really have applied ToASCII in step 2, talking
about UTF-8 in step 3, and percent-encoding in step 4, doesn't make sense.
But it's of course the ToASCII that doesn't make sense, not the latter
two steps.]
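
A short sketch of why the in-memory encoding should not matter: string
preparation works on Unicode characters, and UTF-8 only comes into
play when the bytes are percent-encoded. The example is in Python;
prep_and_escape is a made-up helper, and nameprep is used only as a
stand-in, since the XMPP profiles (Nodeprep, Resourceprep) are not in
the standard library.

```python
from encodings import idna
from urllib.parse import quote

def prep_and_escape(text):
    """Hypothetical helper: prepare a string, then percent-encode it."""
    prepped = idna.nameprep(text)      # operates on characters, not bytes
    # UTF-8 is chosen here, at the percent-encoding step.
    return quote(prepped, safe="", encoding="utf-8")

# The same text, arriving from a UTF-16 source and from a UTF-8 source.
from_utf16 = "Ji\u0159\u00ed".encode("utf-16-le").decode("utf-16-le")
from_utf8 = "Ji\u0159\u00ed".encode("utf-8").decode("utf-8")

print(prep_and_escape(from_utf16))     # ji%C5%99%C3%AD
print(prep_and_escape(from_utf8))      # ji%C5%99%C3%AD
```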


> > >   4.  Convert any bytes that are not US-ASCII (see [ASCII]) to %hexhex
> > >       format as described in Section 2.2.5 of [URL-GUIDE].
> >
> > I think it would be very helpful to be a bit more specific here
> > and to mention UTF-8. In this step, it's actually relevant.
>
>OK, I will clean that up.
>
> > >   5.  Prepend the 'xmpp:' scheme.
> > >   6.  Append the query component, if any.

By the way, is the encoding of the query component specified?
It definitely should be specified.
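
The generation steps could be sketched roughly as follows, in Python.
This is an illustration under assumptions, not the draft's procedure:
jid_to_xmpp_uri is a made-up name, the JID parts are assumed to have
passed the stringprep checks already, the domain is percent-encoded as
UTF-8 instead of being converted to punycode (as argued above), and
the set of characters left unescaped in the query is a guess.

```python
from urllib.parse import quote

def jid_to_xmpp_uri(node, domain, resource=None, query=None):
    """Hypothetical helper: build an xmpp: URI from validated JID parts."""
    # safe="" makes quote() escape every reserved character, including
    # "@", "/" and "?", so only the delimiters added below stay literal.
    # quote() encodes non-ASCII characters as UTF-8 by default.
    uri = "xmpp:"
    if node:
        uri += quote(node, safe="") + "@"
    uri += quote(domain, safe="")
    if resource:
        uri += "/" + quote(resource, safe="")
    if query:
        # Assumption: "=", "&" and ";" are left alone in the query.
        uri += "?" + quote(query, safe="=&;")
    return uri

print(jid_to_xmpp_uri("ji\u0159\u00ed", "\u010dechy.example", "home?desk"))
# xmpp:ji%C5%99%C3%AD@%C4%8Dechy.example/home%3Fdesk
```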

> > >2.4  Processing of XMPP URIs
> > >
> > >
> > >   When processing an XMPP URI, the processing application SHOULD follow
> > >   these steps:
> > >
> > >
> > >   1.  Obtain URI.
> > >   2.  Convert any parts in %hexhex format to UTF-8 as described in
> > >       Section 2.2.5 of [URL-GUIDE].
> > >   3.  Verify that the UTF-8 string conforms to the format defined in
> > >       [XMPP-CORE], including all appropriate [STRINGPREP] profiles.
> > >   4.  Perform [IDNA] translation against the UTF-8 string.
> > >   5.  Extract the XMPP address by removing the 'xmpp:' scheme and the
> > >       query component (if any).
> >
> > Very similar comments to above apply here.
>
>Ditto.
>
> > >   At this point, the processing application would either (1) complete
> > >   further XMPP handling itself or (2) invoke a helper application to
> > >   complete XMPP handling; such XMPP handling would most likely consist
> > >   of the following steps:
> > >
> > >
> > >   1.  Authenticating with an appropriate XMPP server (e.g., a server
> > >       that a user has configured as his or her registered service
> > >       provider) if not already authenticated.
> > >   2.  Optionally determining the nature of the intended recipient
> > >       (e.g., via [DISCO]).
> > >   3.  Optionally presenting an appropriate interface to a user based on
> > >       the nature of the intended recipient and/or the contents of the
> > >       query component (however, if the application does not understand
> > >       the query component, it MUST ignore the query component and treat
> > >       the URI as consisting of "xmpp:jid" rather than
> > >       "xmpp:jid?query").
> >
> > What's the secret recipe for understanding the query component?
> > I assume that 'application' means 'client application', because the
> > interface is presented by the client. But clients are not supposed
> > to understand the query component.
>
>What's the secret recipe for HTTP clients to understand the query
>component of an HTTP URI?
>
>In earlier versions of this memo, we specified several allowable query
>components, such as xmpp:user@host?message (where the "message" query
>component would provide a hint about what kind of interface might be
>presented to a user, for instance). However, it seemed unduly limiting to
>say that only two or three query components are allowable since we don't
>know what kinds of things people might want to do with XMPP URIs in the
>future (just as it might have been limiting to do so for HTTP some years
>ago). So we've left these optional for now, although they may be
>specified in more detail in the future.

I see. One thing you definitely should improve on HTTP is to say that
the encoding of the query component is UTF-8. This will make everybody's
life much easier.
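
A small Python sketch of what goes wrong when the encoding is left
open: the same percent-encoded query text decodes to different
characters depending on which encoding the receiver assumes. The
sample string is arbitrary.

```python
from urllib.parse import unquote

escaped = "caf%C3%A9"

# Receiver assumes UTF-8: the two bytes form one character.
as_utf8 = unquote(escaped, encoding="utf-8")
# Receiver assumes Latin-1: each byte becomes its own character.
as_latin1 = unquote(escaped, encoding="latin-1")

print(as_utf8)      # café
print(as_latin1)    # cafÃ©
```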



> > >   4.  Generating an XMPP stanza that translates any user or application
> > >       inputs into their corresponding XMPP equivalents.
> > >   5.  Sending the XMPP stanza via the authenticated server connection
> > >       for delivery to the intended recipient.
> >
> > At this point, I realized that there isn't a single example in this
> > document. Some examples of xmpp scheme URIs would definitely help,
> > both US-ASCII only and others that included non-ASCII text via
> > %HH (I can help).
>
>OK, I can add some examples. I will write some up and post them to these
>lists.
>
> > Also, examples of the above steps would be great.
>
>Agreed.

Great.
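
As one possible example of the processing side, the Python sketch
below splits an xmpp: URI into its parts and undoes the
percent-encoding. It rests on assumptions: parse_xmpp_uri is a made-up
name, the parts are taken to be percent-encoded UTF-8, and the
stringprep checks and any XMPP handling are left out.

```python
from urllib.parse import unquote

def parse_xmpp_uri(uri):
    """Hypothetical helper: return (node, domain, resource, query)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "xmpp":
        raise ValueError("not an xmpp: URI")
    jid_part, qsep, query = rest.partition("?")
    # Split on the literal delimiters before decoding, so an escaped
    # "/" or "@" inside a part is not mistaken for a delimiter.
    bare, _, resource = jid_part.partition("/")
    node, _, domain = bare.rpartition("@")

    def decode(part):
        # errors="strict" rejects escapes that are not valid UTF-8.
        return unquote(part, encoding="utf-8", errors="strict")

    return (decode(node), decode(domain), decode(resource),
            query if qsep else None)

print(parse_xmpp_uri("xmpp:ji%C5%99%C3%AD@example.com/home%3Fdesk?message"))
# ('jiří', 'example.com', 'home?desk', 'message')
```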


Regards,    Martin.
Received on Tuesday, 24 August 2004 00:57:57 UTC