Re: '#' in mailto URIs from Michael A. Puls II on 2009-10-16 (public-iri@w3.org from October 2009)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Fri, 16 Oct 2009 05:41:29 -0400
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, "Julian Reschke" <julian.reschke@gmx.de>, "Larry Masinter" <masinter@adobe.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "jwz@jwz.org" <jwz@jwz.org>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <op.u1vzvfav1ejg13@sandra-svwliu01>

On Fri, 16 Oct 2009 04:28:08 -0400, Roy T. Fielding <fielding@gbiv.com>  
wrote:

> On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote:
>
>> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer  
>> <silviapfeiffer1@gmail.com> wrote:
>>> The main problem that I see is where "#" is being used multiple times
>>> in such a uri, e.g.
>>>
>>> mailto:?subject=asdf#ghij&body=before#after
>>>
>>> Per RFC3986, the first "#" creates the fragment, so the body is never
>>> regarded as another query parameter. I would think that "#" has to be
>>> escaped in mailto uris. If there weren't multiple query parameters in
>>> a mailto uri, one could simply make the user agent append the fragment
>>> part to the query parameter data to get around the contradiction, but
>>> that is not possible with multiple "#" parameters.
>>
>> Well, since frag ids are of no use in mailto URIs currently, if you  
>> encounter "mailto:?subject=asdf#ghij&body=before#after", what do you  
>> think the creator of the URI intended? For me, the creator obviously  
>> meant "mailto:?subject=asdf%23ghij&body=before%23after" and could not  
>> have meant anything else.
>
> That's only because you think there is no client-side role
> for a fragment on mailto, which is probably right today
> and most likely wrong eventually.  I have no doubt that someone
> is going to write a javascript handler that does something funky
> based on the fragid in a mailto reference, eventually.

O.K., so you're saying that # has to be reserved for all URIs no matter  
what and no matter if it currently has any use for the scheme, because it  
might have some use in the future for that scheme and we must not screw up  
that use-case?

>> So, although # is invalid in a header field value, in the case of  
>> mailto, it's obvious what the creator meant, imo.
>
> No, it isn't.

Thanks. It's good to hear everyone's interpretation on that.

>> For multiple # in the case above, if the first # starts a fragid for  
>> mailto, and fragids in mailto URIs actually did something, then, I  
>> would consider the fragid segment to just be "#ghij&body=before#after",  
>> where the creator actually meant "#ghij%26body%3Dbefore%23after". (Or,  
>> you can assume the creator meant  
>> "mailto:?subject=asdf%23ghij&body=before#after" where the creator meant  
>> the first # to be %23 and actually meant to use a fragid of #after. But  
>> it's highly unlikely the creator meant that.)
>>
>> To be clear though, the concern I have is how to handle mailto URIs  
>> where the creator meant %23, but used a raw # instead, because they did  
>> it by accident or didn't know that it had to be encoded as %23.
>
> Actually, your concern is how to parse an invalid reference
> and transform it into something that is valid but may or may
> not be what the author intended. That is simple error handling

The simple error handling that I use for mailto parsers I've written is to  
treat # as %23 (and even normalize # to %23). As mentioned, the parsers of  
a lot of mail clients treat # as %23 too, which I think makes sense for  
error handling (since there's currently nothing they do with fragids).
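
A minimal sketch of that kind of error handling, assuming Python. The
function name normalize_mailto_hash is hypothetical and is not taken from
any particular mail client; it only illustrates treating every raw # in a
mailto URI as %23:

```python
def normalize_mailto_hash(uri: str) -> str:
    """Lenient error handling: rewrite every raw '#' in a mailto URI as '%23'.

    Assumption: fragments carry no meaning for mailto, so a raw '#' is
    taken to be data the author forgot to percent-encode.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        return uri  # not a mailto URI; leave it untouched
    return scheme + sep + rest.replace("#", "%23")


if __name__ == "__main__":
    print(normalize_mailto_hash("mailto:?subject=asdf#ghij&body=before#after"))
```

Note that this deliberately gives up any future fragid use for mailto,
which is exactly the trade-off under discussion.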

> and the "right" answer depends on whether your parser is a
> browser, a link checker, or something else.

1. A browser that passes the URI to a mail client. Must it pass #value to  
the mail client or only pass everything before the first #?

2. A mail client. If it doesn't support any type of handling of a fragid,  
must it assume that # is %23, or must it chop off the URI at the first #  
before parsing?

What are the right answers for those 2 specifically?
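
To make the two options for a mail client concrete, here is a rough
sketch, assuming Python. The names parse_mailto and parse_hfields and the
lenient flag are made up for illustration. With lenient=False the URI is
cut at the first # as RFC 3986 describes; with lenient=True every # is
treated as %23. Header handling is simplified (no address validation, and
a repeated header name overwrites the earlier value).

```python
from urllib.parse import unquote


def parse_hfields(query: str) -> dict:
    """Split 'name=value&name=value' and percent-decode both sides."""
    fields = {}
    for pair in query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        fields[unquote(name).lower()] = unquote(value)
    return fields


def parse_mailto(uri: str, lenient: bool):
    """Return (to, header_fields, fragment) for a mailto URI.

    lenient=False: split off everything after the first '#' as the fragment.
    lenient=True:  treat every raw '#' as '%23', so no fragment is produced.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    fragment = None
    if lenient:
        rest = rest.replace("#", "%23")
    else:
        rest, hash_sep, frag = rest.partition("#")
        fragment = frag if hash_sep else None
    to, _, query = rest.partition("?")
    return unquote(to), parse_hfields(query), fragment


if __name__ == "__main__":
    uri = "mailto:?subject=asdf#ghij&body=before#after"
    print(parse_mailto(uri, lenient=False))
    print(parse_mailto(uri, lenient=True))
```

In the strict case the body header is lost entirely and the subject is
just "asdf"; in the lenient case both headers survive with the # kept as
data.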

>> You could even say that in all cases where you find a # in a mailto  
>> URI, the creator meant %23. The only reason for UAs not to make that  
>> assumption is so things don't get messed up in the future if fragid  
>> support for mailto is actually defined and does something.
>>
>> That's my reasoning fwiw. But, if UAs should just chop off the mailto  
>> URI at the first # no matter what, then O.K., but that should be  
>> explicitly mentioned.
>
> It should be explicitly mentioned by something, most likely
> a browser implementation spec for parsing arbitrary data as
> IRI references.
>
> It doesn't belong in the definition of the URI because the
> only interoperable string is the one with %23 where the # is
> used as data.  Anything else is going to break at least one
> of the many forms of web components.

So, what you're saying is that you don't want the mailto URI spec touching  
error handling with a 10ft pole and only want to assume that perfect URIs  
will be processed?

-- 
Michael

Received on Friday, 16 October 2009 09:42:05 UTC