Re: URIs in HTML5 and issues arising from Ian Hickson on 2008-06-30 (uri@w3.org from June 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 30 Jun 2008 06:06:06 +0000 (UTC)
To: Julian Reschke <julian.reschke@gmx.de>
Cc: uri@w3.org, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0806300530350.13974@hixie.dreamhostps.com>

On Mon, 30 Jun 2008, Julian Reschke wrote:
> 
> With question marks, there will be data loss. You may or may not notice 
> it, because the page you get may look ok (for instance, it depends on 
> how important that part of the query was). If you notice that something 
> is wrong, then, yes, spotting the question mark may help, provided you 
> understand the issue itself. For how many users is that the case?
> 
> With UTF-8/percent-escaping, the page may very well work as desired, 
> because the server happens to understand that encoding

There is no question that always using UTF-8 would be better than the 
current mess.


> (see Google case cited in Webkit bug report).

Do you mean the case that gets converted to &#...;? That's not UTF-8.
(If you mean something else, could you provide a link?)
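
[Editorial illustration, not part of the original message.] The three outcomes discussed above (a literal question mark, a numeric character reference, and UTF-8 percent-escaping) can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `encode_query_value` is hypothetical, ISO-8859-1 stands in for "some non-UTF-8 document encoding", and Python's codec error handlers are used only to imitate the behaviors described in the thread, not to reproduce any particular browser.

```python
from urllib.parse import quote_from_bytes


def encode_query_value(value: str, doc_encoding: str, errors: str) -> str:
    """Hypothetical helper: encode a query value in the document's
    encoding, then percent-escape the resulting bytes."""
    raw = value.encode(doc_encoding, errors=errors)
    # "?" is kept literal so the replacement character stays visible.
    return quote_from_bytes(raw, safe="?")


# U+263A cannot be represented in ISO-8859-1 (assumed legacy encoding).
value = "\u263a"

# 1. Unrepresentable character replaced by a question mark: data loss.
as_question_mark = encode_query_value(value, "iso-8859-1", "replace")

# 2. Unrepresentable character turned into "&#9786;" before escaping.
#    The server receives an HTML-style reference, which is not UTF-8.
as_char_ref = encode_query_value(value, "iso-8859-1", "xmlcharrefreplace")

# 3. UTF-8 bytes percent-escaped: no information is lost.
as_utf8 = encode_query_value(value, "utf-8", "strict")

print(as_question_mark)  # ?
print(as_char_ref)       # %26%239786%3B
print(as_utf8)           # %E2%98%BA
```

Only the third form lets the server recover the original character without guessing, which is the sense in which always using UTF-8 would be better than the current mess.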


> Finally, if you copy & paste the URL, you wouldn't see the replacement 
> characters anyway, right? In which case the default handling (using 
> UTF-8) would apply, which is all the more reason to consider making this 
> mandatory (because otherwise following the link inside the document and 
> the copy/paste case yield different results).

Having the encoding be essentially random is far worse than converting the 
character to a question mark, IMHO.

Anyway, the whole issue is easily avoided by authors by just using UTF-8. 
This entire problem can only be reached in invalid documents anyway.
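
[Editorial illustration, not part of the original message.] The mismatch between following a link and pasting the same URL, and why authoring in UTF-8 avoids it, can be sketched like this. The assumptions are labeled in the comments: the link is taken to be escaped using the document's encoding (ISO-8859-1 here, chosen only as an example), the pasted URL is taken to be escaped as UTF-8, and the server is assumed to decode everything as UTF-8.

```python
from urllib.parse import quote, unquote

char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Assumption: a link in an ISO-8859-1 page is escaped in that encoding.
link_in_latin1_page = quote(char, encoding="iso-8859-1")

# Assumption: the same URL pasted into the address bar is escaped as UTF-8,
# which is also what a link in a UTF-8 page would produce.
pasted_as_utf8 = quote(char, encoding="utf-8")

print(link_in_latin1_page)  # %E9
print(pasted_as_utf8)       # %C3%A9

# Assumption: the server decodes query bytes as UTF-8.
print(unquote(pasted_as_utf8, encoding="utf-8") == char)            # True
print(unquote(link_in_latin1_page, encoding="utf-8") == "\ufffd")   # True
```

When the document itself is UTF-8, both paths produce `%C3%A9`, so the two requests no longer differ.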


> > > I care because I'd like to see documents using non-ASCII characters 
> > > in query parts become compliant no matter what encoding they are in.
> > 
> > Unless we change the definition of HTML5's URLs to be conforming even 
> > when those URLs would not be treated as IRIs, I don't see any way to 
> > get there from here.
> 
> We could break the affected pages and/or add a mechanism through which 
> pages can opt in to the sane UTF-8 based behavior.

Breaking the pages isn't an option, and an opt-in is already available: 
use UTF-8. This issue is not even remotely important enough on the grand 
scale of things to deserve special syntax or options or whatnot.


> > The HTMLWG is only a small part of the broad range of places from 
> > which I take input, which includes hundreds of blogs, at least three 
> > separate bug systems, multiple other mailing lists, face to face 
> > discussions, IRC conversations on dozens of channels and privately, 
> > private e-mails, etc. I try to keep as much of the discussion as 
> > possible on the HTMLWG and WHATWG lists, but the sheer volume of 
> > traffic that would be generated by archiving all the sources of input 
> > on public-html would be staggering, and that's without even 
> > considering whether all those people would actually be willing to have 
> > their input forwarded in that way.
> 
> In which case it seems to me we have a big process problem.

My goal is to get a good specification and bring the Web forward, not to 
follow process, so that's quite possible, yes. I'm certainly not going to 
start putting process ahead of getting quality feedback.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 30 June 2008 06:06:46 UTC