Re: What schemes take query parts? (was: Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?) from Tom Petch on 2009-09-15 (uri@w3.org from September 2009)

From: Tom Petch <nwnetworks@dial.pipex.com>
Date: Tue, 15 Sep 2009 10:06:32 +0200
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Anne van Kesteren" <annevk@opera.com>
Cc: "Michael A. Puls II" <shadow2531@gmail.com>, <public-iri@w3.org>, <uri@w3.org>
Message-ID: <01b701ca35e9$1d5bfba0$0601a8c0@allison>

XMPP [RFC4622] has a query part.

     iquerycomp = iquerytype [ *ipair ]
     iquerytype = *iunreserved
     ipair      = ";" ikey "=" ivalue
     ikey       = *iunreserved
     ivalue     = *( iunreserved / pct-encoded )
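
As an illustration of the grammar above, here is a minimal Python sketch of a parser for the XMPP query component. The function name parse_xmpp_query is hypothetical, ucschar is approximated as "any non-ASCII character", and decoding percent-escapes as UTF-8 is an assumption of this sketch rather than something the ABNF itself states.

```python
import re
from urllib.parse import unquote

# iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar (RFC 3987).
# Assumption: ucschar is approximated as any code point from U+00A0 upward.
IUNRESERVED = r"[A-Za-z0-9\-._~\u00A0-\U0010FFFF]"
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# iquerycomp = iquerytype [ *ipair ], with ipair = ";" ikey "=" ivalue
IQUERYCOMP = re.compile(
    rf"(?P<qtype>{IUNRESERVED}*)"
    rf"(?P<pairs>(?:;{IUNRESERVED}*=(?:{IUNRESERVED}|{PCT_ENCODED})*)*)"
)


def parse_xmpp_query(query):
    """Split an XMPP query component into (querytype, [(key, value), ...])."""
    match = IQUERYCOMP.fullmatch(query)
    if match is None:
        raise ValueError(f"not a valid iquerycomp: {query!r}")
    pairs = []
    # Neither ";" nor "=" can occur inside a key or value, so splitting is safe.
    for item in match.group("pairs").split(";")[1:]:
        key, _, value = item.partition("=")
        # Assumption: percent-escapes carry UTF-8 octets.
        pairs.append((key, unquote(value, encoding="utf-8")))
    return match.group("qtype"), pairs
```

For example, parse_xmpp_query("message;subject=Test%20Message") yields the query type "message" and one pair with key "subject" and value "Test Message".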

Likewise IMAP [RFC5092]:

   If the "?<enc-search>" field is present, the program interpreting the
   URL should use the contents of this field as arguments following an
   IMAP4 SEARCH command.  These arguments are likely to contain unsafe
   characters such as " " (space) (which are likely to be present in the
   <enc-search>).  If unsafe characters are present, they MUST be
   percent-encoded as described in [URI-GEN].

   Note that quoted strings and non-synchronizing literals [LITERAL+]
   are allowed in the <enc-search> content; however, synchronizing
   literals are not allowed, as their presence would effectively mean
   that the agent interpreting IMAP URLs needs to parse an <enc-search>
   content, find all synchronizing literals, and perform proper command
   continuation request handling (see Sections 4.3 and 7 of [IMAP4]).
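
To make the percent-encoding requirement concrete, here is a small Python sketch that builds an IMAP URL with a search part. The function name imap_search_url and the host example.com are hypothetical, the check for synchronizing literals is deliberately simplistic, and encoding every character outside the unreserved set is a conservative choice of this sketch (RFC 5092 permits some characters to stay unencoded).

```python
import re
from urllib.parse import quote

# A synchronizing literal looks like "{5}" followed by a line break;
# the non-synchronizing form from LITERAL+ is "{5+}" and is allowed.
SYNC_LITERAL = re.compile(r"\{\d+\}\r?\n")


def imap_search_url(base, search_args):
    """Append percent-encoded IMAP4 SEARCH arguments to a mailbox URL."""
    if SYNC_LITERAL.search(search_args):
        raise ValueError("synchronizing literals are not allowed in <enc-search>")
    # safe="" encodes everything except unreserved characters,
    # so spaces become %20 and double quotes become %22.
    return f"{base}?{quote(search_args, safe='', encoding='utf-8')}"
```

With the base imap://example.com/INBOX and the arguments SUBJECT "hello world", this produces imap://example.com/INBOX?SUBJECT%20%22hello%20world%22.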

Tom Petch

----- Original Message -----
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: "Anne van Kesteren" <annevk@opera.com>
Cc: "Michael A. Puls II" <shadow2531@gmail.com>; <public-iri@w3.org>;
<uri@w3.org>
Sent: Friday, September 11, 2009 10:16 AM
Subject: What schemes take query parts? (was: Re: HTML5 - resolving
href="mailto:" based on page's encoding or force utf-8?)

Dear URI experts,

[I have copied the URI mailing list because I hope to get some
information from there.]

In the context of HTML5-specific treatment of query parts in IRIs/URIs
(using the document encoding rather than UTF-8 when converting non-ASCII
characters to %-encoding), Michael A. Puls II recently reported that
such behavior should not apply to mailto: URIs.
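
To illustrate the difference under discussion, the following Python sketch percent-encodes the same query value once as UTF-8 and once in a document encoding. The function name encode_query_value is hypothetical, and ISO-8859-1 merely stands in for "the document encoding"; this is not a description of what any particular browser does.

```python
from urllib.parse import quote


def encode_query_value(value, encoding):
    """Percent-encode a query value using the octets of the given encoding."""
    return quote(value, safe="", encoding=encoding)


value = "caf\u00e9"  # "café"
as_utf8 = encode_query_value(value, "utf-8")            # caf%C3%A9
as_document = encode_query_value(value, "iso-8859-1")   # caf%E9
```

The two results differ only in the octets chosen for the non-ASCII character, which is why the receiving side needs to know which convention was applied.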

Now we are trying to figure out what happens, or what's appropriate, for
other kinds of URI schemes. In particular, we also want to know which
schemes do not take query parameters (e.g. data, ftp). Or it may be
easier to pose the question the other way round: Which schemes do take
query parts? (We know of http, https, and mailto.)

For the schemes that take query parts, we would like to know whether
these parts are restricted to fixed parameters and values or whether
they can contain natural-language (and therefore potentially non-ASCII)
data (even if that is encoded with %-escaping), and in the latter case,
whether there are any encoding conventions for that query part (UTF-8,
document encoding, ...).

Many thanks in advance for your help.

Regards,    Martin.

On 2009/09/10 18:45, Anne van Kesteren wrote:
> On Thu, 10 Sep 2009 11:28:14 +0200, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>> Many thanks for this example. I hope Anne can do some checks on the
>> HTML5 side.
>
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#urls has
> the HTML5 rules from when this was still in the HTML5 specification. As
> far as I can tell the encoding of <query> was done irrespective of the
> scheme per that specification. Someone should probably study
> implementations to see if this should be changed to just affect
> http/https or more.
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Tuesday, 15 September 2009 10:46:50 UTC