Re: Comments on draft-saintandre-xmpp-uri-04.txt

On Thu, Aug 19, 2004 at 05:10:46PM +0900, Martin Duerst wrote:

 > Hello Pierre,
 >
 > At the recent IETF, you asked me for (I18N) comments on
 > http://www.ietf.org/internet-drafts/draft-saintandre-xmpp-uri-04.txt.
 > Here they are, including quite some comments of a more general nature.
 >
 > I have copied both the xmppwg list (given in the draft) and the uri
 > list. I'm not subscribed to the xmppwg list, so please cc me.

I have also cc'd the uri-review@ietf.org list, since my understanding is
that both the W3C and IETF lists need to review proposed URI schemes.
Those new to this thread may want to refer to this I-D:

http://www.ietf.org/internet-drafts/draft-saintandre-xmpp-uri-04.txt

(If you reply and you are not on the XMPP WG list, your reply will be
rejected; however I can add you to the list of accepted posters so that
you do not need to subscribe to the list.)

<snip/>

 > >1.  Introduction
 > >
 > >
 > >   The Extensible Messaging and Presence Protocol (XMPP) is a streaming
 > >   XML technology that enables near-real-time communications between any
 > >   two entities on a network.  [XMPP-CORE] specifies that on an XMPP
 > >   network itself, the address of an XMPP entity is not to be prepended
 > >   with a Uniform Resource Identifier (URI) scheme (as defined in RFC
 > >   2396 [URI]).
 >
 > You should change RFC 2396 to
 > http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-06.txt,
 > or more exactly the RFC that will result from it (it's currently in
 > IETF Last Call).

Changed in my working copy.

 > >2.  Narrative
 >
 > I was quite confused by this title. Please change to something more
 > specific.

Changed in my working copy to "Description of xmpp: URI Scheme"

 > >2.1  Rationale
 > >
 > >
 > >   Many types of application can be built using XMPP.  The best-known
 > >   such application is instant messaging (IM) and presence (as these are
 > >   described in [IMP-MODEL] and [IMP-REQS] and defined for XMPP in
 > >   [XMPP-IM]).  Therefore it might seem appropriate
 >
 > This is worded as if it actually isn't appropriate. But my understanding
 > is that what you want to say is something like:
 >
 > - It is appropriate to use the im: and pres: schemes for instant messaging
 >   and presence.
 > - There are other uses of XMPP, and the xmpp: scheme is for these other
 > uses.

Yes, this text is long-winded. I've shortened and simplified it in my
working copy.

 > >   Note well that on an XMPP network, entities are to be addressed as
 > >   <[node@]domain[/resource]> (i.e., without a URI scheme) rather than
 > >   as <xmpp:[node@]domain[/resource]>.  The xmpp: URI format is provided
 > >   for the sake of non-native interfaces and applications only; native
 > >   applications are strongly encouraged
 >
 > 'strongly encouraged' sound as if there is no real harm (but really
 > no benefit) from using xmpp:[node@]domain[/resource]. Is this actually
 > true? Otherwise, I think it is better to write "native applications MUST
 > not use the xmpp: prefix", to make clear that this is an interoperability
 > requirement.

Yes, this is an interoperability requirement; therefore I have changed
it to MUST NOT in my working copy.

 > >not to prepend XMPP addresses
 > >   with the xmpp: URI scheme when addressing XML stanzas
 >
 > The document sometimes talks about resources, and then sometimes about
 > stanzas. In an XMPP context, are they the same? Either streamline the
 > language (use 'resource' only), or explicitly say they are the same,
 > or be clearer about the difference.

In XMPP:

- a "resource identifier" is one portion of an XMPP address (JID)
- a "stanza" is a well-formed XML fragment sent over an XML stream

Neither of these is a "resource" in the sense used in the URI
specification. We usually use the term "entity" for the thing that
a URI identifies.

Does this need to be clearer in the text?

 > >2.2  Form
 > >
 > >
 > >   The syntax for an xmpp: URI is as follows (where the jid rule is
 > >   defined in [XMPP-CORE] and the query rule is defined in [URI]).
 > >
 > >
 > >         "xmpp:" jid [ "?" query ]
 >
 > The change log says 'removed the query component', but it's still here.

The changelog was in error.

 > For "jid", just referring to XMPP-CORE is way not enough, I guess.
 > The ABNF rules given in
 > http://www.ietf.org/internet-drafts/draft-ietf-xmpp-core-24.txt are:
 >
 >       jid             = [ node "@" ] domain [ "/" resource ]
 >       domain          = fqdn / address-literal
 >       fqdn            = (sub-domain 1*("." sub-domain))
 >       sub-domain      = ([IDNA] conformant domain label)
 >       address-literal = IPv4address / IPv6address
 >
 > For 'node' and 'resource', a stringprep profile is mentioned, which
 > seems to imply that these pieces can contain non-ASCII characters.
 > This would be in conflict with the general URI syntax.

Although in general XMPP addresses are not presented as URIs, it is true
that when a JID is presented as a URI the non-ASCII characters would
need to be properly escaped.

 > For 'sub-domain', it mentions "a domain label as described in [IDNA]",
 > which would mean that it can not only be US-ASCII, but also binary
 > data. If that's what it should be, this would again be in conflict
 > with the general URI syntax. But my guess is that this is not what
 > is intended here; it's probably the case that 'fqdn' is supposed to
 > be what [IDNA] calls an "internationalized domain name", and sub-domain
 > is supposed to be what IDNA calls an *internationalized* domain label.

Yes, by "[IDNA] conformant domain label" was meant "internationalized
domain label". If this is not clear I will need to inform the RFC
Editor, who is currently preparing draft-ietf-xmpp-core-24 for
publication.

 > This again would be in conflict with the general URI syntax.

Certainly.

 > To make these cases work with URIs, what you have to do is to rework
 > the syntax rules so that where necessary (node, sub-domain, resource),
 > they include 'pct-encoded' from
 > http://www.ietf.org/internet-drafts/draft-fielding-uri-rfc2396bis-06.txt.
 > You then also have to be specific about what US-ASCII characters are
 > actually allowed directly (e.g. node doesn't contain '@', and so on),
 > based on the various examples in rfc2396bis and your stringprep
 > profiles.

Do we need to specify this in the ABNF found in draft-ietf-xmpp-core-24?
That draft is silent on XMPP addresses as URIs, which is why we are
working on draft-saintandre-xmpp-uri. I would much prefer not to change
draft-ietf-xmpp-core-24 at this point if possible, but of course we can
make changes during Author's 48 Hours if absolutely necessary. Is it
possible to fully represent stringprep in ABNF? To this point we have
treated the stringprep profiles as canonical. Percent-encoding of
various characters in an XMPP address has meaning within the context of
an XMPP URI, but as far as I can see has no meaning in the context of a
normal XMPP address. It seems that we have two options:

1. Change the ABNF in draft-ietf-xmpp-core so that XMPP URIs conform to
    that syntax; however, that ABNF does not describe a URI.

2. Specify the ABNF for an XMPP URI in draft-saintandre-xmpp-uri, which
    ABNF will be different in some respects from the ABNF specified in
    draft-ietf-xmpp-core.

 > One more thing: it seems that in your case, there are no [] delimiters
 > for IPv6address, and there is no mechanism for future ip protocol versions.
 > You may want to have a look at rfc2396bis to either bring this closer
 > to rfc2396bis or to explicitly mention that there is a difference.

Are you suggesting that we replace:

        domain          = fqdn / address-literal

with:

        domain          = fqdn / host

... where the host rule is specified in rfc2396bis? If so, does this
apply to all XMPP addresses or only XMPP URIs? (I think the former, but
I want to make sure.)

Is the IPv6address rule specified in rfc2396bis intended to supersede
the IPv6address rule specified in RFC2373?

 > >   An xmpp: URI is opaque rather than hierarchical, and thus is similar
 > >   to a mailto: URI as specified in RFC 2368 [MAILTO].  Because an xmpp:
 > >   URI is opaque, the XMPP address (or "JID") contained therein SHOULD
 > >   include only a node identifier (OPTIONAL) and domain identifier
 > >   (REQUIRED) as defined in [XMPP-CORE]; while an xmpp: URI MAY include
 > >   the resource identifier portion of a JID if the XMPP entity must be
 > >   addressed as such, as a general rule this is not encouraged since the
 > >   delimiter used before a resource identifier in XMPP addresses is the
 > >   slash character ("/"), which is discouraged by [URI] for opaque URIs.
 >
 > There is no distinction between generic and opaque syntax anymore.
 > RFC2396bis now says:
 > "All URIs are parsed by generic syntax parsers when used. A URI scheme that
 > wishes to remain opaque to hierarchical processing must disallow the use of
 > slash and question mark characters. However, since a URI reference is only
 > modified by the generic parser if it contains a dot-segment (a complete path
 > segment of "." or "..", as described in Section 3.3), URI schemes may safely
 > use "/" for other purposes if they do not allow dot-segments."

In my working copy, I've removed the text about hierarchical vs. opaque
syntax.

 > >   While the "?" character is allowed in the resource identifier portion
 > >   of an XMPP address (according to [XMPP-CORE]), that character can be
 > >   used as a delimiter between the jid and the query parts of an xmpp:
 > >   URI; therefore, any instances of the "?" character in the resource
 > >   identifier portion of an XMPP address that is generated or processed
 > >   as an xmpp: URI MUST be escaped as "%3F" (as described in Section
 > >   2.2.5 of [URL-GUIDE]).
 >
 > That's a good example of a case that should be expressed in the syntax
 > rules. The syntax has to express the exact form of the URI as it is
 > allowed to appear, not some intention about the parts that go into the
 > URI. So as an example, if the syntax for the resource identifier in
 > xmpp were something simple like
 >      resource = *( 'a' / 'b' / 'c' / '?' )
 > then the syntax for the resource component in the xmpp scheme syntax
 > would have to read:
 >      resource = *( 'a' / 'b' / 'c' / '%3F' / '%3f' )
 > or so.

Again, it seems to me that this applies to XMPP URIs, not XMPP addresses
in general.
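
To illustrate what the URI-side handling of "?" might look like, here is
a minimal Python sketch. The function names build_xmpp_uri and
split_xmpp_uri are hypothetical (they do not come from any draft), and
the sketch deals only with the "?" character; a complete implementation
would also have to escape "%" and any non-ASCII bytes.

```python
from typing import Optional, Tuple

SCHEME = "xmpp:"

def build_xmpp_uri(jid: str, query: Optional[str] = None) -> str:
    # Escape every literal "?" in the JID so that the first raw "?"
    # in the URI can only be the jid/query delimiter.
    uri = SCHEME + jid.replace("?", "%3F")
    if query is not None:
        uri += "?" + query
    return uri

def split_xmpp_uri(uri: str) -> Tuple[str, Optional[str]]:
    if not uri.startswith(SCHEME):
        raise ValueError("not an xmpp: URI")
    # Split at the first raw "?"; anything after it is the query.
    jid_part, sep, query = uri[len(SCHEME):].partition("?")
    # Undo the escaping; hex digits in %HH are case-insensitive.
    jid = jid_part.replace("%3F", "?").replace("%3f", "?")
    return jid, (query if sep else None)
```

The plain XMPP address stays unescaped in both directions; only the URI
form ever carries "%3F".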

 > >2.3  Generation of XMPP URIs
 > >
 > >
 > >   When generating an XMPP URI, the generating application SHOULD follow
 > >   these steps:
 >
 > I'm probably repeating myself, but having these steps is in clear
 > conflict with the
 >     "xmpp:" jid [ "?" query ]
 > syntax rule above. Either there is an actual jid in there, or it's
 > something else that is the result of some processing. No shortcuts
 > allowed :-(.

That text was discussed on the XMPP WG list in order to assist
implementers. But I don't understand your comment; the intent is that,
yes, there is an "actual JID" in there. Are you saying that Step 1
("Obtain XMPP address (JID).") is not right because a real JID would
already have passed Steps 2 and 3?

 > >Saint-Andre            Expires February 11, 2005                [Page 4]
 > >Internet-Draft                  XMPP URI                     August 2004
 > >
 > >
 > >
 > >   1.  Obtain XMPP address (JID).
 > >   2.  Perform [IDNA] translation against the JID (in the form of a
 > >       UTF-8 string).
 >
 > What IDNA translation?

I think that people on the XMPP WG list meant "conversion" rather than
"translation".

 > The word 'translation' appears only once in
 > IDNA, in the copyright section. IDNA defines at least the ToASCII
 > operation and the ToUnicode operation, and it's not clear which
 > one you mean.

When generating an XMPP URI, you would convert using ToASCII (when
presenting an XMPP URI to a user, you would convert with ToUnicode).

Is the following text more accurate?

    2. Convert the JID from a UTF-8 string to US-ASCII using the ToASCII
    conversion described in [IDNA].
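
As a rough illustration of the two directions, here is a sketch using
Python's built-in "idna" codec, which implements the IDNA ToASCII and
ToUnicode operations label by label. The function names and the example
domain are my own assumptions, and the sketch covers the domain
identifier only; how the node and resource portions are handled is a
separate question.

```python
def domain_to_ascii(domain: str) -> str:
    # ToASCII: each non-ASCII label becomes an "xn--" ACE label.
    return domain.encode("idna").decode("ascii")

def domain_to_unicode(domain: str) -> str:
    # ToUnicode: ACE labels are turned back into Unicode labels.
    return domain.encode("ascii").decode("idna")

ace = domain_to_ascii("bücher.example")       # used when generating a URI
shown = domain_to_unicode(ace)                # used when presenting to a user
```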

 > And is this operation supposed to be applied to
 > the whole JID, including node and resource?

Yes.

 > Also, UTF-8 is only given in examples in IDNA, and it's not clear
 > what it is supposed to do here.

A JID is a UTF-8 string. More precisely, a JID is a string of Unicode
characters that meets a certain set of stringprep profiles, following a
certain set of syntax rules, encoded as UTF-8.

 > >   3.  Verify that the UTF-8 string conforms to the format defined in
 > >       [XMPP-CORE], including all appropriate [STRINGPREP] profiles.
 >
 > Since when do stringprep profiles apply to UTF-8? Aren't they much
 > more general? What if somebody wants to implement these operations
 > in UTF-16?

UTF-16 is not allowed by XMPP Core.

 > >   4.  Convert any bytes that are not US-ASCII (see [ASCII]) to %hexhex
 > >       format as described in Section 2.2.5 of [URL-GUIDE].
 >
 > I think it would be very helpful to be a bit more specific here
 > and to mention UTF-8. In this step, it's actually relevant.

OK, I will clean that up.

 > >   5.  Prepend the 'xmpp:' scheme.
 > >   6.  Append the query component, if any.
 >
 >
 >
 > >2.4  Processing of XMPP URIs
 > >
 > >
 > >   When processing an XMPP URI, the processing application SHOULD follow
 > >   these steps:
 > >
 > >
 > >   1.  Obtain URI.
 > >   2.  Convert any parts in %hexhex format to UTF-8 as described in
 > >       Section 2.2.5 of [URL-GUIDE].
 > >   3.  Verify that the UTF-8 string conforms to the format defined in
 > >       [XMPP-CORE], including all appropriate [STRINGPREP] profiles.
 > >   4.  Perform [IDNA] translation against the UTF-8 string.
 > >   5.  Extract the XMPP address by removing the 'xmpp:' scheme and the
 > >       query component (if any).
 >
 > Very similar comments to above apply here.

Ditto.
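
For what it is worth, the un-escaping and extraction steps could be
sketched in Python roughly as follows. The function name extract_jid is
hypothetical, the stringprep and IDNA steps are left out entirely, and
the sketch removes the scheme and query before un-escaping (a different
order from the draft's list) so that an escaped "%3F" is never mistaken
for the query delimiter.

```python
from urllib.parse import unquote

def extract_jid(uri: str) -> str:
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "xmpp":
        raise ValueError("not an xmpp: URI")
    # Drop the query component (if any) at the first raw "?".
    escaped_jid = rest.partition("?")[0]
    # Convert %hexhex sequences to characters, decoding the bytes as
    # UTF-8; errors="strict" makes invalid UTF-8 raise an exception.
    return unquote(escaped_jid, encoding="utf-8", errors="strict")
```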

 > >   At this point, the processing application would either (1) complete
 > >   further XMPP handling itself or (2) invoke a helper application to
 > >   complete XMPP handling; such XMPP handling would most likely consist
 > >   of the following steps:
 > >
 > >
 > >   1.  Authenticating with an appropriate XMPP server (e.g., a server
 > >       that a user has configured as his or her registered service
 > >       provider) if not already authenticated.
 > >   2.  Optionally determining the nature of the intended recipient
 > >       (e.g., via [DISCO]).
 > >   3.  Optionally presenting an appropriate interface to a user based on
 > >       the nature of the intended recipient and/or the contents of the
 > >       query component (however, if the application does not understand
 > >       the query component, it MUST ignore the query component and treat
 > >       the URI as consisting of "xmpp:jid" rather than
 > >       "xmpp:jid?query").
 >
 > What's the secret recipe of understanding the query component?
 > I assume that 'application' means 'client application', because the
 > interface is presented by the client. But clients are not supposed
 > to understand the query component.

What's the secret recipe for HTTP clients to understand the query
component of an HTTP URI?

In earlier versions of this memo, we specified several allowable query
components, such as xmpp:user@host?message (where the "message" query
component would provide a hint about what kind of interface might be
presented to a user, for instance). However, it seemed unduly limiting to
say that only two or three query components are allowable since we don't
know what kinds of things people might want to do with XMPP URIs in the
future (just as it might have been limiting to do so for HTTP some years
ago). So we've left these optional for now, although they may be
specified in more detail in the future.
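
The "ignore what you do not understand" rule from the draft might look
like this in a client. This is only a sketch: the name interpret and the
set UNDERSTOOD_QUERIES are hypothetical, and "message" is used simply
because it is the example from the earlier versions of the memo.

```python
from typing import Optional, Tuple

# Hypothetical: the query components this particular client understands.
UNDERSTOOD_QUERIES = {"message"}

def interpret(uri: str) -> Tuple[str, Optional[str]]:
    if not uri.startswith("xmpp:"):
        raise ValueError("not an xmpp: URI")
    jid, _, query = uri[len("xmpp:"):].partition("?")
    # An unknown (or absent) query is ignored, so the URI is treated
    # as "xmpp:jid" rather than "xmpp:jid?query".
    if query not in UNDERSTOOD_QUERIES:
        return jid, None
    return jid, query
```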

 > >   4.  Generating an XMPP stanza that translates any user or application
 > >       inputs into their corresponding XMPP equivalents.
 > >   5.  Sending the XMPP stanza via the authenticated server connection
 > >       for delivery to the intended recipient.
 >
 > At this point, I realized that there isn't a single example in this
 > document. Some examples of xmpp scheme URIs would definitely help,
 > both US-ASCII only and others that included non-ASCII text via
 > %HH (I can help).

OK, I can add some examples. I will write some up and post them to these
lists.

 > Also, examples of the above steps would be great.

Agreed.

 > >2.5  Internationalization
 > >
 > >
 > >   By definition, an XMPP URI is also an Internationalized Resource
 > >   Identifier (see [IRI]).
 >
 > By definition, any URI, even something as simple as http://www.jabber.org,
 > is an IRI. Not worth mentioning.

OK. :-)

 > What you should say is that xmpp URIs, because they use UTF-8 and %HH
 > to encode non-ASCII characters, are designed to work well with IRIs,
 > in particular that except for the stringprep verification and issues
 > with syntax-relevant US-ASCII characters such as the '?', an XMPP
 > IRI can directly be constructed by prepending "xmpp:" to a jid.

Yes. I will clarify this in my working copy.

 > >As specified in [XMPP-CORE], each portion of
 > >   a JID (node identifier, domain identifier, resource identifier) is
 > >   allowed to be a fully internationalized string in accordance with
 > >   various profiles of [STRINGPREP]; any non-US-ASCII characters in such
 > >   strings (as well as any byte that is not in the set
 > >   a-zA-Z0-9!$*.?_~+=) MUST be properly transformed to %hexhex format as
 > >   described in Section 2.2.5 of [URL-GUIDE].
 >
 > This part should probably come before the text about IRIs in this section.
 > And please mention UTF-8 when talking about Section 2.2.5 of [URL-GUIDE].

OK.

<snip/>

 > >3.3  Character encoding considerations
 > >
 > >
 > >   Representation of non-US-ASCII character sets
 >
 > Why 'character sets'? Why not just 'characters'?

Yes, that is clearer. Changed in my working copy.

 > >in local-part strings
 >
 > The term 'local-part string' turns up for the first time here.
 >
 > >   is limited to the standard methods provided as extensions to RFC 2822
 > >   [IMF].
 >
 > I have no clue why RFC 2822 is mentioned here. Does it allow %HH?

I have simply removed the first sentence of this section, since it
provides confusion rather than enlightenment.

 > >Specifically, for each byte, if the byte is not in the set
 > >   a-zA-Z0-9!$*.?_~+= then transform the byte to %hexhex format as
 > >   described in Section 2.2.5 of [URL-GUIDE].

 > Again, please mention UTF-8 when talking about Section 2.2.5 of [URL-GUIDE].

Will do.
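
To make the UTF-8 dependency explicit, here is a minimal Python sketch
of the byte-wise rule quoted above, applied to one portion of a JID at a
time. The names SAFE_BYTES and to_hexhex are my own. Note that the set
as quoted includes "?", so this sketch leaves it alone; the "%3F" rule
for resource identifiers would need an additional step.

```python
import string

# The set a-zA-Z0-9!$*.?_~+= from the draft, as byte values.
SAFE_BYTES = frozenset(
    (string.ascii_letters + string.digits + "!$*.?_~+=").encode("ascii")
)

def to_hexhex(part: str) -> str:
    out = []
    # Encode as UTF-8 first, then examine each resulting byte.
    for byte in part.encode("utf-8"):
        if byte in SAFE_BYTES:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)
```

Because "@" and "/" are not in the set, the function would escape the
JID delimiters too if handed a whole JID, which is why it is meant to be
applied to the node, domain, and resource portions separately.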

<snip/>

 > >4.  IANA Considerations
 > >
 > >
 > >   This entire document addresses IANA considerations.
 >
 > Change to something like: This document registers an URI scheme.
 > The registration template is in section 3.

Done.

 > >5.  Security Considerations
 > >
 > >
 > >   Detailed security considerations for XMPP are given in [XMPP-CORE].
 > >   Providing an interface to XMPP services from non-native applications
 > >   introduces new security concerns.  For example, the ability to
 > >   interact with XMPP entities via a web browser may expose sensitive
 > >   information to attacks that are not possible or that are unlikely on
 > >   a native XMPP network.  Due care must be taken in deciding what
 > >   information is appropriate for representing in xmpp: URIs; in
 > >   particular, passwords MUST NOT be represented.
 >
 > How would I represent a password in the first place?

Well, you wouldn't in an XMPP URI, so I've removed that line from my
working copy.

 > >6.  References

 > >   [IRI]      Duerst, M. and M. Suignard, "Internationalized Resource
 > >              Identifiers (IRI)", draft-duerst-i18n-iri-06 (work in
 > >              progress), February 2004.
 >
 > This is now draft-duerst-i18n-iri-09. And you should write it as
 > RFC ZZZZ, with instructions to the RFC editor to change to the actual
 > number. Same for RFC 2396bis.

I will update these references in consultation with the RFC Editor when
the time comes.

 > Regards,    Martin.

Many thanks for the comments.

Peter

--
Peter Saint-Andre
Jabber Software Foundation
http://www.jabber.org/people/stpeter.php

Received on Monday, 23 August 2004 23:59:34 UTC