Re: '#' in mailto URIs from Roy T. Fielding on 2009-10-16 (public-iri@w3.org from October 2009)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 16 Oct 2009 10:28:08 +0200
To: Michael A. Puls II <shadow2531@gmail.com>
Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, "Julian Reschke" <julian.reschke@gmx.de>, "Larry Masinter" <masinter@adobe.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "jwz@jwz.org" <jwz@jwz.org>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-Id: <6BA637D6-95CA-40F5-874E-DB4849877932@gbiv.com>

On Oct 16, 2009, at 8:54 AM, Michael A. Puls II wrote:

> On Thu, 15 Oct 2009 22:43:45 -0400, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:
>> The main problem that I see is where "#" is being used multiple times
>> in such a uri, e.g.
>>
>> mailto:?subject=asdf#ghij&body=before#after
>>
>> Per RFC3986, the first "#" creates the fragment, so the body is never
>> regarded as another query parameter. I would think that "#" has to be
>> escaped in mailto uris. If there weren't multiple query parameters in
>> a mailto uri, one could simply make the user agent append the fragment
>> part to the query parameter data to get around the contradiction, but
>> that is not possible with multiple "#" parameters.
>
> Well, since frag ids are of no use in mailto URIs currently, if you  
> encounter "mailto:?subject=asdf#ghij&body=before#after", what do you  
> think the creator of the URI intended? For me, the creator obviously  
> meant "mailto:?subject=asdf%23ghij&body=before%23after" and could  
> not have meant anything else.

That's only because you think there is no client-side role
for a fragment on mailto, which is probably right today
and most likely wrong eventually.  I have no doubt that someone
is going to write a javascript handler that does something funky
based on the fragid in a mailto reference, eventually.

> So, although # is invalid in a header field value, in the case of  
> mailto, it's obvious what the creator meant, imo.

No, it isn't.

> For multiple # in the case above, if the first # starts a fragid for
> mailto, and fragids in mailto URIs actually did something, then I
> would consider the fragid segment to just be
> "#ghij&body=before#after", where the creator actually meant
> "#ghij%26body%3Dbefore%23after". (Or, you can assume the creator meant
> "mailto:?subject=asdf%23ghij&body=before#after", where the creator
> meant the first # to be %23 and actually meant to use a fragid of
> #after. But it's highly unlikely the creator meant that.)
>
> To be clear though, the concern I have is how to handle mailto URIs
> where the creator meant %23, but used a raw # instead, because they
> did it by accident or didn't know that it had to be encoded as %23.

Actually, your concern is how to parse an invalid reference
and transform it into something that is valid but may or may
not be what the author intended.  That is simple error handling
and the "right" answer depends on whether your parser is a
browser, a link checker, or something else.
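
To make the distinction concrete, here is a minimal sketch in Python.
It first shows how a generic RFC 3986 style parser (the standard
library's urlsplit) splits the example from this thread, and then two
possible recovery policies. The function names treat_hash_as_data and
drop_fragment are hypothetical, invented for illustration; neither is
defined by any specification, and a real browser or link checker may
well choose something else.

```python
from urllib.parse import urlsplit

uri = "mailto:?subject=asdf#ghij&body=before#after"

# Generic parsing: the first "#" ends the query and starts the fragment.
parts = urlsplit(uri)
print(parts.query)     # subject=asdf
print(parts.fragment)  # ghij&body=before#after


def treat_hash_as_data(reference: str) -> str:
    """Hypothetical lenient policy: assume every raw '#' was meant as %23."""
    return reference.replace("#", "%23")


def drop_fragment(reference: str) -> str:
    """Hypothetical strict policy: keep only what precedes the first '#'."""
    return reference.split("#", 1)[0]


print(treat_hash_as_data(uri))  # mailto:?subject=asdf%23ghij&body=before%23after
print(drop_fragment(uri))       # mailto:?subject=asdf
```

The two policies produce different messages from the same input, which
is why the choice is an error-handling decision for the consumer and
not something the reference itself can settle.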

> You could even say that in all cases where you find a # in a mailto  
> URI, the creator meant %23. The only reason for UAs not to make that  
> assumption is so things don't get messed up in the future if fragid  
> support for mailto is actually defined and does something.
>
> That's my reasoning fwiw. But, if UAs should just chop off the mailto
> URI at the first # no matter what, then O.K., but that should be
> explicitly mentioned.

It should be explicitly mentioned by something, most likely
a browser implementation spec for parsing arbitrary data as
IRI references.

It doesn't belong in the definition of the URI because the
only interoperable string is the one with %23 where the # is
used as data.  Anything else is going to break at least one
of the many forms of web components.
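
For producers, the interoperable form is easy to generate. The sketch
below is only an illustration in Python: build_mailto is a hypothetical
helper, not part of any library, and it assumes header values are
supplied as plain strings. It percent-encodes each value with the
standard library's quote, so a "#" used as data always appears as %23.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_mailto(headers: dict[str, str]) -> str:
    """Hypothetical helper: build a mailto URI with fully encoded header values."""
    # safe="" makes quote() encode every reserved character, including
    # "#" (as %23), "&" (as %26) and "=" (as %3D).
    pairs = [f"{quote(name, safe='')}={quote(value, safe='')}"
             for name, value in headers.items()]
    return "mailto:?" + "&".join(pairs)


uri = build_mailto({"subject": "asdf#ghij", "body": "before#after"})
print(uri)  # mailto:?subject=asdf%23ghij&body=before%23after

# A generic parser now sees no fragment, and both headers survive intact.
parts = urlsplit(uri)
print(parts.fragment == "")   # True
print(parse_qs(parts.query))  # {'subject': ['asdf#ghij'], 'body': ['before#after']}
```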

....Roy

Received on Friday, 16 October 2009 08:29:16 UTC