- From: Frank Ellermann <nobody@xyzzy.claranet.de>
- Date: Wed, 25 Jun 2008 06:04:11 +0200
- To: uri@w3.org
Ian Hickson wrote:

>> It is technically not possible to define something else without
>> running into logical problems like your Safari + IE examples.

> Logic sadly doesn't have much to do with the way the Web works. :-(

Sure, but just because everybody does odd things in practice does
not necessarily mean that this needs to be noted in a standard.
If they agree on an oddity, maybe, but if not, let them do what
they wish.

A standard is an abstraction.  Not a collection of observed
behaviour divided by statistics resulting in MUST at 80%.

I think that is one of the problems I have when looking into
an HTML 5 draft:  Some choices appear to be arbitrary, as they
are not logical.

> Right, that's why I was hoping we could update the URI spec.
> However, you suggest above that how to handle these erroneous
> addresses is an issue for the HTML spec and not the URI spec,
> so I'm not sure what you are actually suggesting.

Sorry, that was indeed unclear:  For XHTML 1 doctypes I'd know
that href= wants an RFC 2396 URI, so I'd conclude that this is
old, and if they ever update it they will say STD 66.

For HTML 5 you will say that href= wants an RFC 3987 IRI, but
you could also say that spaces are no problem, a kind of LEIRI,
for href=.

You could also decide that URI is good enough, as it works
everywhere, and IRI-producers would know how to get an equivalent
URI in the href, while URI consumers might not know what a native
IRI, let alone LEIRI, is.

E.g., FF2 gets convoluted <ihost>s right, but fails or failed
for the simple test of an <ipath> in an iso-8859-1 document.
That is an FF2 bug, not something you want in the HTML5 spec.

> "%%x" and "%xx" aren't valid escape sequences

ACK, I missed the %%, and I was too lazy to check ##.  Right,
"#" is not permitted, only "?" and "/" are okay.

> The question is what should a browser do with that document.

Garbage in, garbage out.  For security reasons ignoring broken
URIs might be best.
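To make the "%%x" / "%xx" point concrete, here is a minimal sketch in Python. It assumes only the RFC 3986 rule that pct-encoded is "%" followed by two hex digits. The helper name bad_percent_escapes is hypothetical and not from this thread, and the check looks at escapes only, not at the rest of the generic syntax.

```python
import re
from typing import List

# RFC 3986, section 2.1: pct-encoded = "%" HEXDIG HEXDIG
_PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def bad_percent_escapes(ref: str) -> List[int]:
    """Return the offset of every "%" that does not start a valid escape.

    Hypothetical helper for illustration; an empty list means all
    percent signs in `ref` are followed by two hex digits.
    """
    bad = []
    i = 0
    while i < len(ref):
        if ref[i] == "%":
            if _PCT_ENCODED.match(ref, i):
                i += 3  # skip the whole valid escape
                continue
            bad.append(i)  # stray "%" or truncated escape
        i += 1
    return bad


if __name__ == "__main__":
    for sample in ("a%20b", "%%x", "%xx"):
        print(repr(sample), bad_percent_escapes(sample))
```

A consumer that prefers to ignore broken URIs could simply drop any reference for which this list is not empty. Whether it should do so is the policy question discussed here, not something the sketch decides.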
The example was about an http: URI, let RFC 2616 and 3986 talk
about scheme specific stuff (RFC 3986 is general, but for http
also specific).  Or rather it was a wannabe IRI because HTML5
says so, but RFC 3987 has a normative reference to 3986 for
these details.  Adding its own RFC 3987 security considerations -
you can of course copy what you like to emphasize in the HTML5
spec.

How about this:  "If a URI does not match the generic syntax in
[RFC3986] it is invalid, and broken URIs can cause havoc."

> The choices are to define this primarily in the *RI specs,
> or to define it primarily in the HTML5 spec. Right now I'm
> picking the latter

URIs just have their own generic and specific specifications.
Good enough, but if you know cases where you want to recommend
a specific error handling...

> Error handling isn't an implementation detail when 90% of the
> input to the implementations are invalid, as on the Web.

...if it causes harm, and you know how to avoid it, go for it.
But make sure that you don't end up with *redefining* what is
and what is not a valid xyz (URI, IRI, UTF-8, XML, PNG, etc.)

 Frank
Received on Wednesday, 25 June 2008 04:17:37 UTC