- From: Michael A. Puls II <shadow2531@gmail.com>
- Date: Thu, 10 Sep 2009 18:24:05 -0400
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Cc: public-iri@w3.org
On Thu, 10 Sep 2009 05:28:14 -0400, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:

> Hello Michael,
> Many thanks for this example. I hope Anne can do some checks on the
> HTML5 side. I just tried your example in Opera 10, and it gave the UTF-8
> based URI when I asked for 'copy link address'. I also clicked on the
> link and asked it to use my default MUA (Thunderbird with Eudora), and I
> got a draft email with legible text (Moskow at the start, and ITAR-TASS,
> that's about how much Russian I read).

Thanks. Yes, I get that utf-8 behavior in Firefox and Safari also. I think things are more interoperable that way. However, my concern is that HTML5 (well, the iri/uri spec additions for HTML5) contradicts that and says to use the page's encoding instead. I do not feel that is a good idea for some schemes.

Also, for 'mailto:' links in web pages, I want to specifically avoid the part before '?' and the part after '?' being resolved against a different encoding. For mailto:, that would be undesirable and would force authors to use "mailto:?to=value" instead of "mailto:value" so that the to value is resolved against the same encoding as the other values (like subject and body etc.). But, using mailto:?to= still isn't supported as well as mailto:value, so that'd be bad too.

For mailto links in html pages, I think the resolving should always be (by default at least) utf-8 all the way through. (So that the .href getter on a link and copy link address etc. all return something utf-8-based regardless of the page's encoding). This is basically what browsers do now. Just want to make sure the specs don't contradict that, as browsers do it that way for a reason.

For mailto in HTML forms, I don't have too much preference as no one uses it.

I also think that for javascript:, it's probably best to always resolve to percent-encoded utf-8 too.

Also, if I remember correctly, it was desired that http(s) in HTML5 pages be utf-8-only, but that wasn't possible for legacy reasons.
I don't think mailto: and some other schemes have that constraint.

With that said, as Anne said, maybe using the page encoding should only be a must for http(s), and other protocols may ignore the page's encoding and resolve to percent-encoded UTF-8.

Now, if JS in browsers had an iconv() so that you could easily convert to what you want, and browsers had per-protocol, per-site options to control the encoding used for .href etc., then maybe it wouldn't matter. But, for now, just always using utf-8 for some schemes makes things consistent and allows that expectation to be relied upon.

Now, I'm not 100% sure what iri-bis/HTML5 says about this. It's really low-level, which is why I'm asking for clarification (which Larry said he'd respond to when he gets a chance).

--
Michael
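
To make the encoding difference discussed above concrete, here is a minimal Python sketch. It is not taken from any spec or browser; it only shows how the same mailto: subject percent-encodes differently depending on whether UTF-8 or the page's encoding is used. The helper name mailto_uri, the address user@example.com, the subject text, and the choice of windows-1251 as the page encoding are all assumptions made for illustration.

```python
from urllib.parse import quote, unquote


def mailto_uri(address: str, subject: str, encoding: str) -> str:
    """Build a mailto: URI, percent-encoding non-ASCII text as bytes of `encoding`.

    Hypothetical helper for illustration only; real browsers do more work.
    """
    to = quote(address, safe="@", encoding=encoding)
    subj = quote(subject, safe="", encoding=encoding)
    return f"mailto:{to}?subject={subj}"


subject = "Москва"  # assumed sample text

# Always-UTF-8 resolution, which is what the browsers mentioned above produce.
utf8_uri = mailto_uri("user@example.com", subject, "utf-8")

# Resolution against an assumed windows-1251 page encoding.
cp1251_uri = mailto_uri("user@example.com", subject, "windows-1251")

print(utf8_uri)
print(cp1251_uri)

# A consumer that assumes UTF-8 recovers the subject only from the first URI.
utf8_subject = unquote(utf8_uri.split("=", 1)[1], encoding="utf-8")
cp1251_subject = unquote(cp1251_uri.split("=", 1)[1], encoding="utf-8")
print(utf8_subject == subject, cp1251_subject == subject)
```

A mail client that receives the second URI has no way to know which page encoding produced it, which is the interoperability concern raised above.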
Received on Thursday, 10 September 2009 22:24:45 UTC