- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 26 Jun 2008 13:33:14 +0900
- To: Ian Hickson <ian@hixie.ch>
- CC: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, uri@w3.org
Ian Hickson wrote:
> On Thu, 26 Jun 2008, Frank Ellermann wrote:
>
>>> browsers have already more or less converged on a behaviour.
>>>
>> But that behaviour is wrong, because it cannot work reliably, outside of
>> "if it is not UTF-8 then it must be iso-8859-1, redefined to be
>> windows-1252 in HTML5" scenarios.
>>
>
> Whether it's right or wrong is neither here nor there, frankly.
>
> It can work reliably insofar as all user agents can do the same thing,
> which is what we're aiming for in the HTML5 effort.
>
>>> Safari and Mozilla encode both as UTF-8 and %-escape both.
>>>
>> Sounds like they got this right, didn't they?
>>
>
> This was in the context of copied-and-pasted URLs, which is user
> interface, for which interoperability isn't a big deal (at least not
> compared to handling actual legacy content).
>
>>> It's about how to handle legacy, unmaintained, historical documents.
>>> If we break them, we (humanity) lose part of our legacy. That would be
>>> unfortunate.
>>>
>> It would also be a red herring for IRIs specified in RFC 3987 only 3.5
>> years ago, not permitted in HTML 4 or XHTML 1 pages.
>>
>
> There are pages that aren't UTF-8 encoded that contain links with
> non-ASCII characters in query components. Whether those pages existed
> before or after the IRI spec did isn't really relevant. What's important
> is that those pages exist and browsers don't want to break them -- and
> that means that if I want my spec to not be ignored, I have to take them
> into account and support them.
>
>> If we are talking about method="get" forms and corresponding IRIs with
>> an <iquery> 'human legacy' is an obscure argument - but I don't see
>> what's wrong with what Safari and Mozilla do.
>>
>
> Forms are a whole different problem. It's links that are of concern here.
>
>>> Ok. HTML5 is an implementation specification.
>>>
>> Better split the parts where it's a document type definition for
>> authors, the audience is far too different. If you tell authors what
>> they can get away with they won't see the point of say "<s> is
>> deprecated" vs. "interpret <s> as <del>".
>>
>
> Yeah, that's on the cards for when the spec is more stable (we'll probably
> generate two or three documents automatically for different audiences).
>
>> [IRL proposal]
>>
>>> I think people would be more confused by the use of the term "IRL"
>>> than "URL" (with the exception of people intimately familiar with the
>>> URI spec). Maybe the term "address" would work?
>>>
>> If you are sure that you don't need "address" for something else it is
>> fine. IE-fans would know what you are talking about. And I finally got
>> used to the idea that "address" means what I know as "location".
>>
>> In the direction of: "An 'address' is the URI (STD 66) derived from a
>> valid IRI (RFC 3987) or invalid constructs as specified below" (etc.)
>>
>
> It was brought to my attention on IRC that "address" is probably as
> overloaded as "URL" so this might not be a step forwards for the spec,
> just a step sideways. I'll see what can be done though. It might be that
> the spec just uses the term "URL" and ignores the URI spec's definition of
> the term.

There is an alternative to ignoring the URI spec's definition: describe
both your usage of "URL" and the usage intended by the URI spec. For a
similar problem, and a solution, concerning the terms "URI" and "IRI", see
http://lists.w3.org/Archives/Public/www-tag/2008Jun/0110.html

Felix

> Most people seem to understand the intent, as far as I know
> you're the only person whom this has confused.
>
>>>> Broken URLs have caused real damage last year:
>>>> http://www.microsoft.com/technet/security/advisory/943521.mspx
>>>> http://www.heise-security.co.uk/news/97878
>>>>
>>> Right, that's why defining error handling is critical, and why a spec
>>> that doesn't define error handling is, frankly, irresponsible. By
>>> defining error handling, we help guarantee that any input results in a
>>> known, predictable, and most importantly _safe_ behaviour.
>>>
>> IMHO you could leave this at "MUST NOT be interpreted as URI" or
>> similar, but that might be a matter of taste.
>>
>
> Well, we could say that, but then browser vendors would ignore us. I don't
> want browser vendors to ignore us.
>
>> Are you going to specify the exact error handling for say surrogates and
>> overlong encodings in UTF-8? I'd have ideas about this, but I don't
>> see that it belongs into a HTML5 specification.
>>
>
> These issues were brought to the attention of the Unicode consortium, who
> are looking into addressing these error handling issues in their specs.
>
> I agree entirely that this kind of error handling stuff shouldn't be in
> HTML5. The only times HTML5 defines error handling for things outside the
> "HTML" language itself is when the relevant specs don't define their own
> error handling, and the relevant groups refuse to do anything about it.
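
The disagreement quoted above about non-ASCII characters in query components comes down to which character encoding is applied before %-escaping. The following is only an illustrative sketch in Python, not anything proposed in the thread: the helper name escape_query_value and the sample value "café" are assumptions made for the example, and the page encoding is passed in explicitly rather than detected.

```python
from urllib.parse import quote, unquote


def escape_query_value(text: str, page_encoding: str) -> str:
    """Percent-escape text after encoding it with page_encoding.

    Hypothetical helper for illustration; page_encoding stands in for
    the encoding of the document that contains the link.
    """
    return quote(text, safe="", encoding=page_encoding)


value = "café"  # sample non-ASCII query value (assumed for the example)

# Same characters, different bytes on the wire depending on the encoding.
as_utf8 = escape_query_value(value, "utf-8")
as_cp1252 = escape_query_value(value, "windows-1252")
print(as_utf8)    # caf%C3%A9
print(as_cp1252)  # caf%E9

# A receiver that assumes UTF-8 cannot recover the windows-1252 form:
# the lone byte 0xE9 is not valid UTF-8 and becomes U+FFFD.
garbled = unquote(as_cp1252, encoding="utf-8", errors="replace")
print(garbled == value)  # False
```

This shows why always escaping as UTF-8 and honouring the encoding of a legacy page can produce different requests for the same visible link, which is the interoperability concern raised in the quoted discussion.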
Received on Thursday, 26 June 2008 04:34:09 UTC