
Re: URIs in HTML5 and issues arising

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 30 Jun 2008 06:06:06 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: uri@w3.org, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0806300530350.13974@hixie.dreamhostps.com>

On Mon, 30 Jun 2008, Julian Reschke wrote:
> With question marks, there will be data loss. You may or may not notice 
> it, because the page you get may look ok (for instance, it depends on 
> how important that part of the query was). If you notice that something 
> is wrong, then, yes, spotting the question mark may help. If you 
> understand the issue itself. For how many users is that the case?
> With UTF-8/percent-escaping, the page may very well work as desired, 
> because the server happens to understand that encoding

There is no question that always using UTF-8 would be better than the 
current mess.

> (see Google case cited in Webkit bug report).

Do you mean the case that gets converted to &#...;? That's not UTF-8.
(If you mean something else, could you provide a link?)
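For reference, the three outcomes being discussed in this thread can be sketched as follows. This is only an illustration, assuming a single non-ASCII character (U+65E5) in a query value and a Latin-1 page; the helper name encode_query_value is made up for this example, and Python's urllib.parse.quote stands in for whatever a given browser actually does, which may differ in detail.

```python
from urllib.parse import quote


def encode_query_value(value: str, page_encoding: str, on_error: str = "strict") -> str:
    """Percent-encode a query value as if it came from a page in page_encoding.

    Hypothetical helper for illustration only; it does not model any
    particular browser's URL handling.
    """
    # safe="" so that '?', '&', '#' and ';' are escaped too and stay visible.
    return quote(value, safe="", encoding=page_encoding, errors=on_error)


value = "\u65e5"  # one character that Latin-1 cannot represent

# Page is UTF-8: the character survives as percent-escaped UTF-8 bytes.
as_utf8 = encode_query_value(value, "utf-8")

# Page is Latin-1, unencodable character replaced by a question mark:
# the original character is lost.
as_question_mark = encode_query_value(value, "iso-8859-1", "replace")

# Page is Latin-1, unencodable character replaced by a numeric character
# reference (the "&#...;" case): no data loss, but not UTF-8 either.
as_char_ref = encode_query_value(value, "iso-8859-1", "xmlcharrefreplace")

print(as_utf8)           # %E6%97%A5
print(as_question_mark)  # %3F
print(as_char_ref)       # %26%2326085%3B
```

If the page itself is UTF-8, only the first case ever arises, so following the link and copy/pasting the URL agree.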

> Finally, if you copy & paste the URL, you wouldn't see the replacement 
> characters anyway, right? In which case the default handling (using 
> UTF-8) would apply; which even more is a reason to consider making this 
> mandatory (because otherwise following the link inside the document and 
> the copy/paste case yield different results).

Having the encoding be essentially random is far worse than converting the 
character to a question mark, IMHO.

Anyway, the whole issue is easily avoided by authors by just using UTF-8. 
This entire problem can only be reached in invalid documents anyway.

> > > I care because I'd like to see documents using non-ASCII characters 
> > > in query parts become compliant no matter what encoding they are in.
> > 
> > Unless we change the definition of HTML5's URLs to be conforming even 
> > when those URLs would not be treated as IRIs, I don't see any way to 
> > get there from here.
>
> We could break the affected pages and/or add a mechanism through which 
> pages can opt in to the sane UTF-8 based behavior.

Breaking the pages isn't an option, and an opt-in is already available: 
use UTF-8. This issue is not even remotely important enough on the grand 
scale of things to deserve special syntax or options or whatnot.

> > The HTMLWG is only a small part of the broad range of places from 
> > which I take input, which includes hundreds of blogs, at least three 
> > separate bug systems, multiple other mailing lists, face to face 
> > discussions, IRC conversations on dozens of channels and privately, 
> > private e-mails, etc. I try to keep as much of the discussion as I can 
> > on the HTMLWG and WHATWG lists, but the sheer volume of traffic that would be 
> > generated by archiving all the sources of input on public-html would 
> > be staggering, and that's without even considering whether all those 
> > people would actually be willing to have their input forwarded in that 
> > way.
>
> In which case it seems to me we have a big process problem.

My goal is to get a good specification and bring the Web forward, not to 
follow process, so that's quite possible, yes. I'm certainly not going to 
start putting process ahead of getting quality feedback.

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 30 June 2008 06:06:47 UTC
