Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8?

On Thu, 10 Sep 2009 05:28:14 -0400, Martin J. Dürst  
<duerst@it.aoyama.ac.jp> wrote:

> Hello Michael,
>  Many thanks for this example. I hope Anne can do some checks on the  
> HTML5 side. I just tried your example in Opera 10, and it gave the UTF-8  
> based URI when I asked for 'copy link address'. I also clicked on the  
> link and asked it to use my default MUA (Thunderbird with Eudora), and I  
> got a draft email with legible text (Moskow at the start, and ITAR-TASS,  
> that's about how much Russian I read).

Thanks.

Yes, I get that utf-8 behavior in Firefox and Safari also. I think things  
are more interoperable that way.

However, my concern is that HTML5 (well, the iri/uri spec additions for  
HTML5) contradicts that and says to use the page's encoding instead. I do  
not feel that is a good idea for some schemes.

Also, for 'mailto:' links in web pages, I want to specifically avoid the
part before '?' and the part after '?' being resolved against different
encodings. For mailto:, that would be undesirable and would force authors
to use "mailto:?to=value" instead of "mailto:value" so that the 'to' value
is resolved against the same encoding as the other values (subject, body,
etc.). But "mailto:?to=" still isn't as widely supported as
"mailto:value", so that would be bad too.
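
To make the mismatch concrete, here is a minimal Python sketch. It assumes (this is only an assumption for illustration, not a statement of what any spec or browser does) that the part before '?' is percent-encoded as UTF-8 while the part after '?' uses the page's encoding. The helper names and the example.com address are hypothetical.

```python
from urllib.parse import quote


def percent_encode(text: str, encoding: str) -> str:
    """Percent-encode text using the given character encoding."""
    # '@' is left unescaped so the address stays readable.
    return quote(text, safe="@", encoding=encoding)


def mixed_mailto(to: str, subject: str, page_encoding: str) -> str:
    # Assumed split: address part as UTF-8, query part in the page's encoding.
    return ("mailto:" + percent_encode(to, "utf-8")
            + "?subject=" + percent_encode(subject, page_encoding))


def utf8_mailto(to: str, subject: str) -> str:
    # UTF-8 on both sides of '?'.
    return mixed_mailto(to, subject, "utf-8")


word = "Москва"
# Same word, two different byte sequences within one link:
print(mixed_mailto(word + "@example.com", word, "windows-1251"))
# Same word, same byte sequence on both sides:
print(utf8_mailto(word + "@example.com", word))
```

In the first link the same word is escaped two different ways, so whatever consumes the link has to know about both encodings.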

For mailto links in HTML pages, I think the resolving should always be (by
default at least) utf-8 all the way through, so that the .href getter on
a link, 'copy link address', etc. all return something utf-8-based
regardless of the page's encoding. This is basically what browsers do
now. I just want to make sure the specs don't contradict that, as browsers
do it that way for a reason.
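
As a rough illustration of why this matters on the receiving end, the Python sketch below plays the role of a mail client that assumes the link is percent-encoded UTF-8. The function name, the example.com address, and the assumption that the client decodes as UTF-8 are all hypothetical; real MUAs may behave differently.

```python
from urllib.parse import parse_qs, quote


def subject_seen_by_client(href: str) -> str:
    """Decode the subject of a mailto link, assuming percent-encoded UTF-8."""
    query = href.partition("?")[2]
    # Bytes that are not valid UTF-8 are replaced with U+FFFD.
    fields = parse_qs(query, encoding="utf-8", errors="replace")
    return fields["subject"][0]


subject = "Москва"
utf8_link = "mailto:user@example.com?subject=" + quote(subject, encoding="utf-8")
page_link = "mailto:user@example.com?subject=" + quote(subject, encoding="windows-1251")

print(subject_seen_by_client(utf8_link))  # legible text
print(subject_seen_by_client(page_link))  # replacement characters
```

Only the UTF-8 link round-trips to the original subject; the page-encoded one decodes to replacement characters under this assumption.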

For mailto in HTML forms, I don't have much of a preference, as no one
uses it.

I also think that for javascript:, it's probably best to always resolve to  
percent-encoded utf-8 too.

Also, if I remember correctly, it was desired that http(s) in HTML5 pages
be utf-8-only, but that wasn't possible for legacy reasons. I don't think
mailto: and some other schemes have that constraint.

With that said, as Anne said, maybe using the page encoding should be a
must only for http(s), and other protocols may ignore the page's encoding
and resolve to percent-encoded UTF-8.
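
That per-scheme rule could be sketched roughly as follows in Python. This is only an illustration of the proposal, not what iri-bis or HTML5 specify; the set of schemes and the function name are assumptions.

```python
# Schemes that (for legacy reasons) would keep using the page's encoding.
PAGE_ENCODING_SCHEMES = {"http", "https"}


def resolve_encoding(scheme: str, page_encoding: str) -> str:
    """Pick the encoding used when percent-encoding non-ASCII characters."""
    if scheme.lower() in PAGE_ENCODING_SCHEMES:
        return page_encoding
    # Every other scheme (mailto:, javascript:, ...) ignores the page encoding.
    return "utf-8"


for scheme in ("http", "mailto", "javascript"):
    print(scheme, resolve_encoding(scheme, "windows-1251"))
```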

Now, if JS in browsers had an iconv() so that you could easily convert to
what you want, and browsers had per-protocol, per-site options to control
the encoding used for .href etc., then maybe it wouldn't matter.
But, for now, just always using utf-8 for some schemes makes things
consistent and allows that expectation to be relied upon.

Now, I'm not 100% sure what iri-bis/HTML5 says about this. It's really
low-level, which is why I'm asking for clarification (which Larry said
he'd provide when he gets a chance).

-- 
Michael

Received on Thursday, 10 September 2009 22:24:45 UTC