What schemes take query parts? (was: Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?)

Dear URI experts,

[I have copied the URI mailing list because I hope to get some 
information from there.]

In the context of HTML5-specific treatment of query parts in IRIs/URIs 
(using the document encoding rather than UTF-8 when converting non-ASCII 
characters to %-encoding), Michael A. Puls II recently reported that 
such behavior should not apply to mailto: URIs.
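
To make the difference concrete, here is a minimal sketch in Python. It is only an illustration, not taken from the HTML5 draft or any implementation: the helper name `percent_encode_query` and the sample value are my own, and ISO-8859-1 stands in for a hypothetical document encoding.

```python
from urllib.parse import quote

def percent_encode_query(text, encoding):
    # Hypothetical helper: convert the text to bytes in the given
    # encoding, then %-encode every byte outside the unreserved set.
    return quote(text, safe="", encoding=encoding, errors="strict")

value = "Dürst"  # sample non-ASCII query value (assumption for illustration)

# IRI-style conversion: always UTF-8.
utf8_form = percent_encode_query(value, "utf-8")

# HTML5-style conversion for a page assumed to be served as ISO-8859-1.
latin1_form = percent_encode_query(value, "iso-8859-1")

print(utf8_form)    # D%C3%BCrst
print(latin1_form)  # D%FCrst
```

The same character thus ends up as two different byte sequences in the query part, which is why the receiving side needs to know which convention applies for a given scheme.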

Now we are trying to figure out what happens, or what's appropriate, for 
other kinds of URI schemes. In particular, we also want to know which 
schemes do not take query parameters (e.g. data, ftp). Or it may be 
easier to pose the question the other way round: Which schemes do take 
query parts? (We know of http, https, and mailto.)

For the schemes that take query parts, we would like to know whether 
these parts are restricted to fixed parameters and values or whether 
they can contain natural-language (and therefore potentially non-ASCII) 
data (even if that is encoded with %-escaping), and in the latter case, 
whether there are any encoding conventions for that query part (UTF-8, 
document encoding, ...).

Many thanks in advance for your help.

Regards,    Martin.

On 2009/09/10 18:45, Anne van Kesteren wrote:
> On Thu, 10 Sep 2009 11:28:14 +0200, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>> Many thanks for this example. I hope Anne can do some checks on the
>> HTML5 side.
>
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#urls has
> the HTML5 rules from when this was still in the HTML5 specification. As
> far as I can tell the encoding of <query> was done irrespective of the
> scheme per that specification. Someone should probably study
> implementations to see if this should be changed to just affect
> http/https or more.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Friday, 11 September 2009 08:18:04 UTC