[whatwg] [WF2] action="mailto:" - encoding spaces

On Tue, 02 Dec 2008 02:48:15 -0500, Ian Hickson <ian at hixie.ch> wrote:

> On Wed, 29 Oct 2008, Michael A. Puls II wrote:
>> On Wed, 29 Oct 2008 03:42:17 -0400, Ian Hickson <ian at hixie.ch> wrote:
>> > On Wed, 29 Oct 2008, Michael A. Puls II wrote:
>> > >
>> > > What about the method="POST" case where the query string is kept?
>> > >
>> > > <form action="mailto:?subject=1+2" method="POST">
>> > >     <input type="text" name="body" value="1+2">
>> > >     <input type="text" name="other" value="1 2">
>> > >     <input type="submit">
>> > > </form>
>> > >
>> > > When submitting that, I expect to see:
>> > >
>> > > mailto:?subject=1%2B2&body=body%3D1%252B2%26other%3D1%25202
>> > >
>> > > submitted to the mail client.
>> > >
>> > > The current POST section seems to say that this would be submitted
>> > > instead:
>> > >
>> > > mailto:?subject=1+2&body=body%3D1%252B2%26other%3D1+2
>> > >
>> > > In other words, I think spaces in values should be emitted as %20
>> > > for POST too and in the case there's a query string present in the
>> > > action attribute for POST, any + in the hvalues of the query string
>> > > should be normalized to %2B (to be consistent with a + inside a form
>> > > control's value that gets converted to %2B)
>> >
>> > The idea is that the same thing as would be posted to an HTTP server
>> > is what is sent using the e-mail body, so I think we'd want the exact
>> > same "+" behavior as normally.
>>
>> O.K., but in the case of the + that's in the mailto URI in the action
>> attribute, the author means a '+' and not a space (they're allowed to be
>> left in raw form in a mailto URI). If it gets sent to a server, the +
>> will be treated as a space, which is not what is intended.
>
> I actually can't find where it is defined that the + in an HTTP URI
> represents a space. (I can find where it says that a space is to be
> converted into a +, but not the other way around.)
>
> My understanding, though, is that the convention that + represents a
> space is not part of the URI syntax, but part of the syntax of the format
> used to encode the data into the URI, which for HTTP URIs is generally
> application/x-www-form-urlencoded. But nothing stops this format from
> being used elsewhere, e.g. in the body of an e-mail or a POST submission.
>
>
>> The workaround is of course for the author to make sure to encode that +
>> as %2B (or never use anything but action="mailto:" even for POST). But,
>> for good measure, it seems like the UA should fix that if the + will
>> ever end up in an HTTP URI.
>
> I don't follow.
>
>
>> Of course right now, browsers only pass the data as a mailto URI to an
>> email program, so the + from the query string will be a + and come out
>> fine in the compose window. As for spaces in form control values coming
>> out as + (for POST) in a program's body field, that's not as big of a
>> deal as there's no use-case to *see* any of the data *like that* anyway.
>> But it does seem incorrect to encode mailto spaces as + though.
>
> I don't follow.
>
>
>> However, if, for POST, everything after 'mailto:' in the action
>> attribute was dropped (like get) and all you ever had was
>> mailto:?body=encoded_stuff that was POSTed, then the spec could say that
>> the value you might see in the body field represents *HTTP* url encoded
>> data.
>
> We can't drop everything, because then you'd lose the Subject: line, etc.
>
>
>> Or, the spec could say that if the protocol in the action attribute is
>> mailto:, +s in the action attribute have to be encoded as %2B and spaces
>> in the action attribute have to be encoded as %20. Then, the validator
>> can catch that and the spec can say (for POST), that the body hvalue
>> that gets generated from the form represents *HTTP* form data. Then,
>> it'll be clear why +s in the value are represented as + instead of %20.
>
> I don't follow here either.
>
>
>> Or, if it's O.K. for a UA's URI normalizer/resolver to take
>> action="mailto:?subject=1+2 3" and normalize that to
>> action="mailto:?subject=1%2B2%203" for use with the form's .action
>> getter, I guess that might solve it too.
>
> I think we may be talking at cross-purposes... which requirements in the
> spec are you referring to?
>

I'll try to explain more.

Consider this form:

<form action="mailto:?subject=1+2" method="POST">
    <input type="submit" value="Compose">
</form>

(which contains a valid mailto URI meaning that "1+2" should be the value of the subject)

Imagine that your browser supports setting the default mailto URI handler to Gmail (a web-based client that uses *http* URIs).

If you submit that form, you'd get <https://mail.google.com/mail/?compose=1&view=cm&fs=1&su=1+2>, which, if you try it, you'll see puts "1 2" instead of "1+2" in the subject field.

Basically, HTML is trying to say that "+" is equal to " ", but mailto URI hnames and hvalues are not application/x-www-form-urlencoded. They're close, but have fewer reserved characters.

So, the browser has to convert the + to %2B before submitting (because the value will end up in an http URI, in this case) to Gmail so the correct value ends up in the subject field. (This isn't a problem for non-web-based email clients because they don't treat hnames and hvalues in mailto URIs as application/x-www-form-urlencoded).
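
To make the difference concrete, here is a rough sketch rather than anything a browser or Gmail actually runs. It assumes a JavaScript environment that provides URLSearchParams (used here only as a stand-in for an application/x-www-form-urlencoded parser) and decodeURIComponent (plain percent-decoding, which is roughly how a mailto hvalue should be read). The parameter name "su" is taken from the Gmail URI above; the variable names are made up for illustration.

```javascript
// Illustration only: the same raw hvalue read two different ways.
const hvalue = "1+2";

// Read as application/x-www-form-urlencoded: '+' is turned into a space.
const asFormData = new URLSearchParams("su=" + hvalue).get("su");

// Read as a plain percent-encoded mailto hvalue: '+' stays a '+'.
const asMailto = decodeURIComponent(hvalue);

// With the '+' escaped up front, the form-urlencoded reading is right too.
const escapedAsFormData = new URLSearchParams("su=1%2B2").get("su");

console.log(JSON.stringify([asFormData, asMailto, escapedAsFormData]));
```

Only the escaped form gives "1+2" under both readings, which is why the conversion has to happen before the value reaches an http URI.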

If you specify action="mailto:?subject=1%2B2", you avoid the problem, and there are ways that Gmail could avoid the problem too. Yet the underlying problem will still be there.

So, basically, the problem is that RFC 2368 says '+' is a '+' and a space is "%20". HTML says that for the value in action="", '+' is a space, "%20" is a space and "%2B" is a '+'.

So, when you put a mailto URI in the action attribute, you have a conflict of specs. If application/x-www-form-urlencoded gets priority over RFC2368 in this case, that's fine. I just think it needs to be spelled out in the HTML spec more.

Personally, I'd suggest the UA should do this for action="mailto:":

// Escape raw '+' in a mailto action so it survives form-urlencoded decoding.
if (/^mailto:/i.test(form.action) && form.method.toLowerCase() == "post") {
    form.action = form.action.replace(/\+/g, "%2B");
}

, then things will come out fine in both web-based and non-web-based clients if the author of the markup didn't know they had to convert the regular mailto URI to an HTML action attribute mailto URI.
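
For anyone who wants to try this outside a browser, below is a self-contained sketch of the same idea, extended to build the URI I said I'd expect for the POST form quoted at the top of this message. The function names escapePlusInMailtoAction and buildMailtoPostUri are hypothetical, the use of encodeURIComponent for the form data is my assumption, and this shows the behavior I'm proposing, not what the spec currently requires.

```javascript
// Hypothetical helper: escape raw '+' in a mailto action for POST forms.
function escapePlusInMailtoAction(action, method) {
    const isMailto = /^mailto:/i.test(action);
    const isPost = String(method).toLowerCase() === "post";
    return isMailto && isPost ? action.replace(/\+/g, "%2B") : action;
}

// Hypothetical helper: append the form data as the body hvalue.
// fields is an array of [name, value] pairs.
function buildMailtoPostUri(action, fields) {
    // Encode each pair with %20 for spaces and %2B for '+' ...
    const data = fields
        .map(([name, value]) =>
            encodeURIComponent(name) + "=" + encodeURIComponent(value))
        .join("&");
    const base = escapePlusInMailtoAction(action, "post");
    // Pick the separator depending on whether a query string already exists.
    const sep = base.includes("?") ? (base.endsWith("?") ? "" : "&") : "?";
    // ... then percent-encode the whole data set again as one hvalue.
    return base + sep + "body=" + encodeURIComponent(data);
}

const uri = buildMailtoPostUri("mailto:?subject=1+2",
                               [["body", "1+2"], ["other", "1 2"]]);
console.log(uri);
```

Non-mailto actions and non-POST methods pass through escapePlusInMailtoAction unchanged, so an ordinary http action is not affected.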

-- 
Michael

Received on Tuesday, 2 December 2008 01:34:44 UTC