Re: HTML5 - resolving href="mailto:" based on page's encoding or force utf-8? from Martin J. Dürst on 2009-09-11 (public-iri@w3.org from September 2009)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Fri, 11 Sep 2009 16:35:20 +0900
To: "Michael A. Puls II" <shadow2531@gmail.com>
CC: public-iri@w3.org
Message-ID: <4AA9FDB8.1050103@it.aoyama.ac.jp>

Hello Michael,

On 2009/09/11 7:24, Michael A. Puls II wrote:
> On Thu, 10 Sep 2009 05:28:14 -0400, Martin J. Dürst
> <duerst@it.aoyama.ac.jp> wrote:
>
>> Hello Michael,
>> Many thanks for this example. I hope Anne can do some checks on the
>> HTML5 side. I just tried your example in Opera 10, and it gave the
>> UTF-8 based URI when I asked for 'copy link address'. I also clicked
>> on the link and asked it to use my default MUA (Thunderbird with
>> Eudora), and I got a draft email with legible text (Moscow at the
>> start, and ITAR-TASS, that's about how much Russian I read).
>
> Thanks.
>
> Yes, I get that utf-8 behavior in Firefox and Safari also. I think
> things are more interoperable that way.
>
> However, my concern is that HTML5 (well, the iri/uri spec additions for
> HTML5) contradicts that and says to use the page's encoding instead. I
> do not feel that is a good idea for some schemes.
>
> Also, for 'mailto:' links in web pages, I want to specifically avoid the
> part before '?' and the part after '?' being resolved against a
> different encoding. For mailto:, that would be undesirable and would
> force authors to use "mailto:?to=value" instead of "mailto:value" so
> that the to value is resolved against the same encoding as the other
> values (like subject and body etc.). But, using mailto:?to= still isn't
> supported as well as mailto:value, so that'd be bad too.
>
> For mailto links in html pages, I think the resolving should always be
> (by default at least) utf-8 all the way through. (So that the .href
> getter on a link and copy link address etc. all return something
> utf-8-based regardless of the page's encoding). This is basically what
> browsers do now. Just want to make sure the specs don't contradict that,
> as browsers do it that way for a reason.

I agree, and I haven't found anybody who disagrees yet. If that stays as 
it is, I'll make sure that the spec says what it should say on that point.
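
To illustrate the difference under discussion, here is a minimal Python
sketch. It is not taken from iri-bis or HTML5; the helper name
mailto_href, the address user@example.org, and the choice of windows-1251
as the "page encoding" are all assumptions made for the example. It
percent-encodes the same subject once as UTF-8 and once in the legacy
encoding, then shows what a consumer that assumes UTF-8 gets back.

```python
from urllib.parse import quote, unquote


def mailto_href(address: str, subject: str, encoding: str = "utf-8") -> str:
    """Build a mailto: URI, percent-encoding non-ASCII text in `encoding`.

    Hypothetical helper for illustration only; not defined by any spec.
    """
    # Use the same encoding before and after '?', so both parts agree.
    to_part = quote(address, safe="@", encoding=encoding)
    subject_part = quote(subject, safe="", encoding=encoding)
    return f"mailto:{to_part}?subject={subject_part}"


# UTF-8 all the way through (what the browsers mentioned above produce).
utf8_href = mailto_href("user@example.org", "Москва")

# Same link resolved against an assumed windows-1251 page encoding.
legacy_href = mailto_href("user@example.org", "Москва", encoding="windows-1251")

print(utf8_href)
print(legacy_href)

# A mail client that assumes UTF-8 recovers only the first subject intact.
print(unquote(utf8_href.split("=", 1)[1], encoding="utf-8"))
print(unquote(legacy_href.split("=", 1)[1], encoding="utf-8"))
```

The two URIs differ in their escapes (%D0%9C... versus %CC%EE...), and
only the UTF-8 one decodes back to the original text when the receiving
side assumes UTF-8, which is the interoperability argument made above.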

Regards,   Martin.

> For mailto in HTML forms, I don't have too much preference as no one
> uses it.
>
> I also think that for javascript:, it's probably best to always resolve
> to percent-encoded utf-8 too.
>
> Also, if I remember correctly, it was desired that http(s) in HTML5
> pages be utf-8-only, but that wasn't possible for legacy reasons. I
> don't think mailto: and some other schemes have that constraint.
>
> With that said, as Anne said, maybe using the page encoding should only
> be a must for http(s) and that other protocols may ignore the page's
> encoding and resolve to percent-encoded UTF-8.
>
> Now, if JS in browsers had an iconv() so that you can easily convert to
> what you want and browsers had options to control the encoding,
> per-protocol, for .href etc., per-site, then, maybe it wouldn't matter.
> But, for now, just always using utf-8 for some schemes makes things
> consistent and allows that expectation to be relied upon.
>
> Now, I'm not 100% sure what iri-bis/HTML5 says about this. It's really
> low-level, which is why I'm asking for clarification (which Larry said
> he'd respond to when he gets a chance).
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Friday, 11 September 2009 07:36:31 UTC