- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Thu, 26 Jun 2008 00:14:25 +0200
- To: uri@w3.org
Ian Hickson wrote:

> browsers have already more or less converged on a behaviour.

But that behaviour is wrong, because it cannot work reliably
outside of "if it is not UTF-8 then it must be iso-8859-1,
redefined to be windows-1252 in HTML5" scenarios.

> Safari and Mozilla encode both as UTF-8 and %-escape both.

Sounds like they got this right, didn't they ?

> It's about how to handle legacy, unmaintained, historical
> documents. If we break them, we (humanity) lose part of our
> legacy. That would be unfortunate.

It would also be a red herring for IRIs, which were specified
in RFC 3987 only 3.5 years ago and are not permitted in HTML 4
or XHTML 1 pages.

If we are talking about method="get" forms and corresponding
IRIs with an <iquery>, 'human legacy' is an obscure argument -
but I don't see what's wrong with what Safari and Mozilla do.

> Ok. HTML5 is an implementation specification.

Better split off the parts where it's a document type definition
for authors; the audience is far too different. If you tell
authors what they can get away with, they won't see the point
of, say, "<s> is deprecated" vs. "interpret <s> as <del>".

> the HTML5 spec goes out of its way to avoid sending invalid
> URIs to servers, though that may have to change depending
> on what existing content depends on.

It would also depend on what existing and future servers for
relevant URI schemes expect, including servers implementing
the various protocols - (X)HTML(5) is not the only context
for URIs, and HTTP(S) is not the only URI scheme.

> http://whatwg.org/html5
> That should be more usable.

Yes, thanks, much better.

> I believe the confusion here is that the term "URL" as used
> in the HTML5 spec is intended to be a term independent of
> the term "URL" as used in the URI spec.

+1

[IRL proposal]
> I think people would be more confused by the use of the term
> "IRL" than "URL" (with the exception of people intimately
> familiar with the URI spec). Maybe the term "address" would
> work?

If you are sure that you don't need "address" for something
else, it is fine. IE-fans would know what you are talking
about. And I finally got used to the idea that "address"
means what I know as "location".

In the direction of: "An 'address' is the URI (STD 66) derived
from a valid IRI (RFC 3987) or invalid constructs as specified
below" (etc.)

>> Broken URLs have caused real damage last year:
>> http://www.microsoft.com/technet/security/advisory/943521.mspx
>> http://www.heise-security.co.uk/news/97878

> Right, that's why defining error handling is critical, and why
> a spec that doesn't define error handling is, frankly,
> irresponsible. By defining error handling, we help guarantee
> that any input results in a known, predictable, and most
> importantly _safe_ behaviour.

IMHO you could leave this at "MUST NOT be interpreted as URI"
or similar, but that might be a matter of taste. Are you going
to specify the exact error handling for, say, surrogates and
overlong encodings in UTF-8 ? I'd have ideas about this, but
I don't see that it belongs in an HTML5 specification.

 Frank
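
For illustration only, here is a minimal Python 3 sketch of two points raised above: the "encode as UTF-8, then %-escape" behaviour attributed to Safari and Mozilla, and a strict UTF-8 check that rejects overlong encodings and encoded surrogates. The helper names `percent_encode_utf8` and `is_strict_utf8` are hypothetical, and this is not a description of what any browser or the HTML5 draft actually implements.

```python
from urllib.parse import quote


def percent_encode_utf8(text: str) -> str:
    """Encode text as UTF-8 and %-escape every octet outside the
    unreserved set (letters, digits, '-', '.', '_', '~')."""
    # safe="" means that reserved characters such as '/' are escaped too.
    return quote(text, safe="", encoding="utf-8")


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8.

    Python's strict decoder rejects overlong forms (e.g. C0 AF)
    and UTF-8-encoded surrogates (e.g. ED A0 80).
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(percent_encode_utf8("é"))
    print(is_strict_utf8(b"\xc0\xaf"))
    print(is_strict_utf8(b"\xed\xa0\x80"))
```

Rejecting malformed input outright corresponds to the "MUST NOT be interpreted as URI" option; a specification could equally define a replacement strategy, which this sketch does not attempt.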
Received on Wednesday, 25 June 2008 22:13:32 UTC