- From: Robert J Burns <rob@robburns.com>
- Date: Sun, 29 Jun 2008 21:56:17 +0300
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Philip Taylor <pjt47@cam.ac.uk>, Ian Hickson <ian@hixie.ch>, HTML WG <public-html@w3.org>
Hi Julian,

On Jun 29, 2008, at 3:11 PM, Julian Reschke wrote:

> Philip Taylor wrote:
>> Ian Hickson wrote:
>>>> According to <http://lists.w3.org/Archives/Public/public-html/2008Jun/0358.html>,
>>>> Safari 3 uses question marks.
>>>
>>> According to:
>>>
>>>    http://hixie.ch/tests/adhoc/uri/encoding/017.html
>>>
>>> Safari trunk uses &-escaping.
>> That says "Query component: raw question mark" in Safari 3.1.2.
>> It says "Query component: %-escaped ASCII ☺" in nightly
>> WebKit r34603.
>> Looks like it changed in https://bugs.webkit.org/show_bug.cgi?id=15119
>
> Interesting.
>
> I'm not sure why the Webkit guys think this is any better then what
> FF does... Which, quoting <https://bugs.webkit.org/show_bug.cgi?id=15119#c1>:
> "amusingly gives the "correct" answer for Google".
>
> BR, Julian

Given all of the different approaches on this, I would say we shouldn't feel constrained to endorse any one approach, since when we do, we potentially break content targeting a different browser. That's good news, since it means we can focus more on what the UAs should be doing rather than codifying broken behavior. As I've said before, I think Firefox comes the closest to the correct behavior on this, though even more Unicode support would be preferred.

For legacy support, perhaps we could add an accept-charset or similar attribute to the root element, inherited by all descendant elements, with an implied default value of UTF-8 on the root element. This way legacy content could be repaired to work with HTML5 UAs simply by adding the attribute wherever necessary. The attribute could then also be set on any element that has an attribute containing a URI, to override the accept-charset for that URI. Non-HTML5 UAs will continue to use whatever disparate legacy approach they currently use, ignoring the accept-charset attribute on the root and other elements.
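
As a rough illustration of the inheritance idea above, here is a minimal Python sketch. It is not any UA's actual behavior: the function names `effective_charset` and `encode_query`, and the representation of the element chain as a list of attribute dictionaries, are assumptions made up for this example. It resolves the charset by walking from the root to the element carrying the URI, then percent-escapes the query component in that charset.

```python
from urllib.parse import quote


def effective_charset(element_chain, default="utf-8"):
    """Return the charset in effect for the last element in the chain.

    element_chain is a hypothetical stand-in for the DOM: a list of
    attribute dicts ordered from the root element down to the element
    whose URI attribute is being resolved. The nearest explicit
    accept-charset wins; the root implies UTF-8 when nothing is set.
    """
    charset = default
    for attrs in element_chain:
        charset = attrs.get("accept-charset", charset)
    return charset


def encode_query(query, charset):
    """Percent-escape a query string after encoding it in `charset`.

    '=' and '&' are kept literal so key/value structure survives.
    Characters the charset cannot represent are replaced by '?'
    before escaping (one possible legacy fallback, chosen here only
    for illustration).
    """
    return quote(query, safe="=&", encoding=charset, errors="replace")


# Legacy document repaired by declaring its charset on the root element.
legacy_chain = [{"accept-charset": "iso-8859-1"}, {}, {"href": "?q=é"}]
# Document with no attribute anywhere: the implied UTF-8 default applies.
default_chain = [{}, {}, {"href": "?q=é"}]

print(encode_query("q=é", effective_charset(legacy_chain)))
print(encode_query("q=é", effective_charset(default_chain)))
```

The same query therefore reaches a legacy server in the bytes it expects, while unmarked content gets UTF-8.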
Perhaps accept-charset is not fine-grained enough, since we may also have to send different encodings to:

 • Host (involving DNS for the internationalized domain name)
 • Path (for the server)
 • Query
 • Fragment Identifier

However, most of these seem to already be handled in a decent way, except for the query component. Fragment identifiers probably need to be treated in the encoding of the destination document, so passing from the source document encoding to UTF-8 and then to the destination document encoding makes the most sense. For local fragment identifiers, the source and destination documents are obviously the same, so this collapses to a special case where the actual conversion need not take place.

So by adding this attribute, HTML5 can guide implementations toward universal UTF-8 support for the query URI component while still supporting legacy content and legacy application servers.

Take care,
Rob
Received on Sunday, 29 June 2008 18:56:59 UTC