- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 25 Jul 2001 19:03:55 +0200
- To: www-validator@w3.org
Hi,

Several Technical Reports define how non-ASCII characters in URIs should be
handled: convert the non-ASCII characters to UTF-8 and apply URI
percent-encoding to the resulting bytes. Additionally, HTML 4 suggests:

  [...] Note. Some older user agents trivially process URIs in HTML using
  the bytes of the character encoding in which the document was received.
  Some older HTML documents rely on this practice and break when
  transcoded. User agents that want to handle these older documents
  should, on receiving a URI containing characters outside the legal set,
  first use the conversion based on UTF-8. Only if the resulting URI does
  not resolve should they try constructing a URI based on the bytes of the
  character encoding in which the document was received. [...]

While the Validator already does this [1] (if the charset parameter
charset=utf-8 is added to the HTTP header), I can't see this issue
addressed in the checklink script. I suggest implementing what HTML 4
recommends (see the sketch below). I'd provide a patch, but I'm currently
not that familiar with it... Both the checklink script and the Validator
should warn the user if they encounter improperly escaped URIs.

[1] I strongly recommend installing version 1.15 of the URI package on the
production server. It conforms to RFC 2732 (see my request and the
discussion in mid-May on the libwww@perl.org mailing list), and current
Technical Reports require compliance with it.

-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
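[Editorial sketch: the following is a minimal illustration of the UTF-8-first
fallback that the HTML 4 note above describes, written in Python rather than
the Perl of the actual checklink script. The function names (escape_non_ascii,
resolves, resolve_with_fallback) and the HEAD-request check are assumptions
for illustration and do not correspond to anything in the Validator or
checklink code.]

    import urllib.error
    import urllib.parse
    import urllib.request


    def escape_non_ascii(uri, encoding):
        """Percent-escape the non-ASCII characters of `uri` using the bytes
        of `encoding`; ASCII characters are passed through unchanged for
        simplicity."""
        return "".join(
            c if ord(c) < 128 else urllib.parse.quote(c.encode(encoding))
            for c in uri
        )


    def resolves(uri):
        """Return True if a HEAD request for `uri` succeeds."""
        try:
            req = urllib.request.Request(uri, method="HEAD")
            urllib.request.urlopen(req, timeout=10)
            return True
        except (urllib.error.URLError, ValueError):
            return False


    def resolve_with_fallback(uri, document_charset):
        # First try the URI built from UTF-8 bytes, as the specs require.
        utf8_uri = escape_non_ascii(uri, "utf-8")
        if resolves(utf8_uri):
            return utf8_uri
        # Only if that does not resolve, fall back to the bytes of the
        # character encoding in which the document was received.
        return escape_non_ascii(uri, document_charset)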
Received on Wednesday, 25 July 2001 13:04:38 UTC