Re: '#' in mailto URIs

On Oct 16, 2009, at 11:41 AM, Michael A. Puls II wrote:

> On Fri, 16 Oct 2009 04:28:08 -0400, Roy T. Fielding  
> <fielding@gbiv.com> wrote:
>
>> On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote:
>>
>>> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
>>> wrote:
>>>> The main problem that I see is where "#" is being used multiple
>>>> times in such a uri, e.g.
>>>>
>>>> mailto:?subject=asdf#ghij&body=before#after
>>>>
>>>> Per RFC3986, the first "#" creates the fragment, so the body is
>>>> never regarded as another query parameter. I would think that "#"
>>>> has to be escaped in mailto uris. If there weren't multiple query
>>>> parameters in a mailto uri, one could simply make the user agent
>>>> append the fragment part to the query parameter data to get around
>>>> the contradiction, but that is not possible with multiple "#"
>>>> parameters.
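A rough illustration of the point above, using the generic-syntax regular expression from RFC 3986, Appendix B. It applies no scheme-specific rules. The helper names split_reference and parse_hfields are hypothetical, and parse_hfields is a simplified reading of mailto header fields, not a complete mailto parser.

```python
import re
from urllib.parse import unquote

# Regular expression from RFC 3986, Appendix B: splits any URI reference
# into scheme, authority, path, query and fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_reference(ref):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = URI_RE.match(ref)  # always matches: every group is optional
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def parse_hfields(query):
    """Hypothetical helper: split a mailto query into {name: value} pairs."""
    fields = {}
    if query:
        for pair in query.split("&"):
            name, _, value = pair.partition("=")
            fields[unquote(name)] = unquote(value)
    return fields


ref = "mailto:?subject=asdf#ghij&body=before#after"
scheme, authority, path, query, fragment = split_reference(ref)
print(query)                 # subject=asdf
print(fragment)              # ghij&body=before#after
print(parse_hfields(query))  # {'subject': 'asdf'}  (no "body" field at all)
```

Everything after the first "#" lands in the fragment, so a generic parser never sees a body field.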
>>>
>>> Well, since frag ids are of no use in mailto URIs currently, if  
>>> you encounter "mailto:?subject=asdf#ghij&body=before#after", what  
>>> do you think the creator of the URI intended? For me, the creator  
>>> obviously meant "mailto:?subject=asdf%23ghij&body=before%23after"  
>>> and could not have meant anything else.
>>
>> That's only because you think there is no client-side role
>> for a fragment on mailto, which is probably right today
>> and most likely wrong eventually.  I have no doubt that someone
>> is going to write a javascript handler that does something funky
>> based on the fragid in a mailto reference, eventually.
>
> O.K., so you're saying that # has to be reserved for all URIs no
> matter what, and whether or not it currently has any use for the
> scheme, because it might have some use in the future for that
> scheme and we must not screw up that use-case?

Yes.

>>> So, although # is invalid in a header field value, in the case of  
>>> mailto, it's obvious what the creator meant, imo.
>>
>> No, it isn't.
>
> Thanks. It's good to hear everyone's interpretation on that.
>
>>> For multiple # in the case above, if the first # starts a fragid
>>> for mailto, and fragids in mailto URIs actually did something,
>>> then I would consider the fragid segment to just be
>>> "#ghij&body=before#after", where the creator actually meant
>>> "#ghij%26body%3Dbefore%23after". (Or, you can assume the creator
>>> meant "mailto:?subject=asdf%23ghij&body=before#after", where the
>>> creator meant the first # to be %23 and actually meant to use a
>>> fragid of #after. But it's highly unlikely that the creator meant
>>> that.)
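The encoded fragment "#ghij%26body%3Dbefore%23after" can be reproduced with Python's urllib.parse.quote. As a side note, RFC 3986 permits "&" and "=" inside a fragment, so strictly only the second "#" has to be escaped; both variants are shown for comparison.

```python
from urllib.parse import quote

# Everything after the first "#" in mailto:?subject=asdf#ghij&body=before#after
raw_fragment = "ghij&body=before#after"

# Escape every reserved character: the whole string becomes opaque fragment data.
fully_encoded = "#" + quote(raw_fragment, safe="")

# Escape only the second "#": "&" and "=" are already legal in an RFC 3986 fragment.
minimally_encoded = "#" + quote(raw_fragment, safe="&=")

print(fully_encoded)      # #ghij%26body%3Dbefore%23after
print(minimally_encoded)  # #ghij&body=before%23after
```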
>>>
>>> To be clear though, the concern I have is how to handle mailto
>>> URIs where the creator meant %23 but used a raw # instead,
>>> because they did it by accident or didn't know that it had to be
>>> encoded as %23.
>>
>> Actually, your concern is how to parse an invalid reference
>> and transform it into something that is valid but may or may
>> not be what the author intended. That is simple error handling
>
> The simple error handling that I use for mailto parsers I've written  
> is to treat # as %23 (and even normalize # to %23). As mentioned,  
> the parsers of a lot of mail clients treat # as %23 too, which I  
> think makes sense for error handling (since there's currently  
> nothing they do with fragids).
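A minimal sketch of the "treat # as %23" error handling described above, assuming the input is already known to be a mailto reference and that fragment identifiers are ignored entirely. The function name lenient_mailto_hfields is hypothetical; this is not taken from any particular mail client, and its output deliberately differs from what a generic RFC 3986 parser would produce.

```python
from urllib.parse import unquote


def lenient_mailto_hfields(ref):
    """Hypothetical error handling: normalize every raw "#" to "%23",
    then read the query part as name=value header fields."""
    normalized = ref.replace("#", "%23")
    _, _, query = normalized.partition("?")
    fields = {}
    if query:
        for pair in query.split("&"):
            name, _, value = pair.partition("=")
            fields[unquote(name)] = unquote(value)  # %23 decodes back to "#"
    return normalized, fields


normalized, fields = lenient_mailto_hfields("mailto:?subject=asdf#ghij&body=before#after")
print(normalized)  # mailto:?subject=asdf%23ghij&body=before%23after
print(fields)      # {'subject': 'asdf#ghij', 'body': 'before#after'}
```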

I think that is fine error handling for a current parser of
references (not URI/IRIs) that is not doing validation.  A
CMS is going to have very different error handling in that
case, as will a third-party link checker, as will a spider
that is reading mailto links for an entirely different reason.

I am not saying that error handling is bad or cannot be
standardized within limited contexts.  I am saying that it
is not standard across all contexts and thus cannot be
defined for a standard that is, by definition, context-free.

>> and the "right" answer depends on whether your parser is a
>> browser, a link checker, or something else.
>
> 1. A browser that passes the URI to a mail client. Must it pass  
> #value to the mail client or only pass everything before the first #?
>
> 2. A mail client. If it doesn't support any type of handling of a  
> fragid, must it assume that # is %23, or must it chop off the URI at  
> the first # before parsing?
>
> What are the right answers for those 2 specifically?

It is not a URI, and the right answer depends on context.
If it is a data entry dialog (like location bar) then
I would "internally redirect" the reference so that it is
rewritten with %23.  If it is in an href attribute, then I would
not use the fragment portion (i.e., ignore it).  The reason is that
this is not a common scenario in existing content and
it is far better to correct bad content than to just assume
you know what the author wanted.
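The two cases above could be sketched as a small dispatcher. The function name and the context labels "location_bar" and "href" are hypothetical stand-ins for the data-entry and href-attribute cases; a real browser would make this decision somewhere in its own input-handling pipeline.

```python
def handle_mailto_reference(ref, context):
    """Hypothetical sketch of context-dependent handling of a raw "#".

    "location_bar": rewrite every raw "#" as "%23" (an internal redirect).
    "href": keep only the part before the first "#" (ignore the fragment).
    """
    if context == "location_bar":
        return ref.replace("#", "%23")
    if context == "href":
        return ref.split("#", 1)[0]
    raise ValueError(f"unknown context: {context!r}")


ref = "mailto:?subject=asdf#ghij&body=before#after"
print(handle_mailto_reference(ref, "location_bar"))
# mailto:?subject=asdf%23ghij&body=before%23after
print(handle_mailto_reference(ref, "href"))
# mailto:?subject=asdf
```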

>>> You could even say that in all cases where you find a # in a  
>>> mailto URI, the creator meant %23. The only reason for UAs not to  
>>> make that assumption is so things don't get messed up in the  
>>> future if fragid support for mailto is actually defined and does  
>>> something.
>>>
>>> That's my reasoning fwiw. But if UAs should just chop off the
>>> mailto URI at the first # no matter what, then O.K., but that
>>> should be explicitly mentioned.
>>
>> It should be explicitly mentioned by something, most likely
>> a browser implementation spec for parsing arbitrary data as
>> IRI references.
>>
>> It doesn't belong in the definition of the URI because the
>> only interoperable string is the one with %23 where the # is
>> used as data.  Anything else is going to break at least one
>> of the many forms of web components.
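On the generating side, the interoperable form is the one where a "#" used as data is sent as %23. A hypothetical helper, build_mailto, that percent-encodes each header value with urllib.parse.quote (safe="" so "#", "&" and "=" inside values are all escaped); it is a sketch, not a full mailto generator.

```python
from urllib.parse import quote


def build_mailto(to="", **hfields):
    """Hypothetical helper: build a mailto URI whose data never contains a raw "#"."""
    query = "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in hfields.items()
    )
    return f"mailto:{quote(to, safe='@,')}" + (f"?{query}" if query else "")


print(build_mailto(subject="asdf#ghij", body="before#after"))
# mailto:?subject=asdf%23ghij&body=before%23after
```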
>
> So, what you're saying is that you don't want the mailto URI spec  
> touching error handling with a 10ft pole and only want to assume  
> that perfect URIs will be processed?

The role of the scheme spec is to define what is interoperable
for all consumers of the URI -- that's why we have a restricted
syntax.  We don't need all consumers to behave the same way
when encountering an invalid reference, even if all browsers
do behave the same way (browsers are easily the smallest subset
of Web implementations with the least variance among them).

....Roy

Received on Friday, 16 October 2009 11:37:54 UTC