- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Tue, 23 Aug 2005 18:04:49 +0200
- To: Jamie Lokier <jamie@shareable.org>
- CC: Robert Collins <robertc@robertcollins.net>, "William A. Rowe, Jr." <wrowe@rowe-clan.net>, HTTP Working Group <ietf-http-wg@w3.org>
Jamie Lokier wrote:
> Those web servers _far_ predate RFC2616. Whatever guidance goes into
> an HTTP URI standard, it must remain backward compatible with what's
> widely deployed, which is precisely why the RFCs don't mandate it yet,
> even as they suggest further work is needed on it.

Right!

> The Location header only has that effect in _web browsers_.

Sorry? It's for instance used in 3xx (redirect) responses.

> There are lots of other programs which use HTTP for which the
> "characters" encoded in a URL are irrelevant.
>
> Increasingly, we may find that non-web-browser HTTP agents see
> non-ASCII characters in parts of a document that claim to be URIs, and
> must follow them. Or, they see URIs containing %-encoded characters
> and need to convert those to presentable text in documents.

Yes. This is a common issue in WebDAV. See
<http://greenbytes.de/tech/webdav/draft-reschke-webdav-url-constraints-latest.html>
(work in progress).

> Broadly, the UTF-8-ness affects programs which relate documents
> containing non-ASCII characters with URLs. For example, a spider
> which indexes pages that happen to contain non-ASCII characters in the
> URLs in "href" attributes... those are actually not valid URLs, but
> the spider has to make a heuristic decision if it's to follow them.
>
> Unfortunately, if we mandate that non-ASCII characters found in "href"
> URL attributes should be %-escaped as UTF-8 to follow them, we'll find
> that this *breaks* some existing deployed sites. Maybe this is for
> the best...

It's ugly, but probably still the best approach.

> ...

Best regards, Julian
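The two operations discussed in this thread, escaping non-ASCII characters found in an "href" as UTF-8 before following the link, and turning %-encoded URIs back into presentable text, might be sketched in Python as follows. This is a minimal illustration, not anything proposed on the list: the function names are hypothetical, and the sketch assumes UTF-8 is the intended encoding, which (as the thread points out) does not hold for every deployed server. It therefore leaves undecodable escapes alone instead of guessing another charset.

```python
from urllib.parse import quote, unquote


def escape_non_ascii(href: str) -> str:
    """Hypothetical helper: %-encode each non-ASCII character as UTF-8 octets.

    ASCII characters, including any existing %XX escapes, pass through
    unchanged, so an href that is already a valid URI is not altered.
    """
    # quote() encodes a str as UTF-8 by default; safe="" escapes every octet.
    return "".join(ch if ord(ch) < 0x80 else quote(ch, safe="") for ch in href)


def to_display_text(uri: str) -> str:
    """Hypothetical helper: decode %XX escapes for display, assuming UTF-8.

    All escapes are decoded (including reserved ones such as %2F), so the
    result is for presentation only and should not be reused as a URI.
    If the octets are not valid UTF-8 (e.g. a legacy server that used
    ISO-8859-1), the URI is returned unchanged.
    """
    try:
        return unquote(uri, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return uri


if __name__ == "__main__":
    href = "/docs/caf\u00e9.html"
    uri = escape_non_ascii(href)
    print(uri)                         # /docs/caf%C3%A9.html
    print(to_display_text(uri))        # /docs/café.html
    print(to_display_text("/caf%E9"))  # /caf%E9 (not UTF-8, left as-is)
```

Returning the URI untouched on a decoding failure is a deliberate choice in this sketch: it mirrors the backward-compatibility concern raised above, where some existing sites do not use UTF-8 and a silent lossy decode would misrepresent their URLs.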
Received on Tuesday, 23 August 2005 16:05:00 UTC