Re: location uri, ucs and the http scheme definition.

Jamie Lokier wrote:
> Those web servers _far_ predate RFC2616.  Whatever guidance goes into
> an HTTP URI standard, it must remain backward compatible with what's
> widely deployed, which is precisely why the RFCs don't mandate it yet,
> even as they suggest further work is needed on it.

Right!

> The Location header only has that effect in _web browsers_.

Sorry? The Location header is also used, for instance, in 3xx (redirect)
responses, which non-browser clients have to follow as well.
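
To illustrate the redirect case for a non-browser agent, here is a minimal
sketch in Python. The function name `redirect_target` and the set of status
codes are assumptions made for this example, not taken from any spec text.
RFC 2616 defines Location as an absolute URI, but the sketch resolves the
value against the request URL anyway, on the assumption that some servers
send relative references.

```python
from urllib.parse import urljoin

# Redirect statuses handled by this sketch (an illustrative subset).
REDIRECT_STATUSES = {301, 302, 303, 307}


def redirect_target(request_url, status, headers):
    """Return the absolute URL to follow, or None if there is nothing to follow."""
    if status not in REDIRECT_STATUSES:
        return None
    # Header field names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if location is None:
        return None
    # Resolve against the request URL in case the value is a relative reference.
    return urljoin(request_url, location.strip())
```

Note that the client never needs to know which characters the %-escapes in
the Location value stand for; it only has to send the same octets back.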

> There are lots of other programs which use HTTP for which the
> "characters" encoded in a URL are irrelevant.
> 
> Increasingly, we may find that non-web-browser HTTP agents see
> non-ASCII characters in parts of a document that claim to be URIs, and
> must follow them.  Or, they see URIs containing %-encoded characters
> and need to convert those to presentable text in documents.

Yes. This is a common issue in WebDAV. See 
<http://greenbytes.de/tech/webdav/draft-reschke-webdav-url-constraints-latest.html> 
(work in progress).
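
As a rough sketch of the "convert to presentable text" problem, a client can
try UTF-8 first and fall back to a legacy encoding only when the octets are
not valid UTF-8. The helper name `presentable` and the ISO-8859-1 fallback
are assumptions for illustration; a real deployment might need a different
fallback, or none at all.

```python
from urllib.parse import unquote_to_bytes


def presentable(escaped, fallback="iso-8859-1"):
    """Decode a %-escaped path for display; return (text, encoding_used)."""
    # Undo the %-escapes first, which yields raw octets, not characters.
    raw = unquote_to_bytes(escaped)
    try:
        # Preferred interpretation: the octets are UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Heuristic fallback for servers that predate any UTF-8 convention.
        return raw.decode(fallback), fallback
```

With this, both "/docs/caf%C3%A9.txt" and "/docs/caf%E9.txt" display as the
same name, which also shows why the display form alone cannot be turned back
into the right request URI.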

> Broadly, the UTF-8-ness affects programs which relate documents
> containing non-ASCII characters with URLs.  For example, a spider
> which indexes pages that happen to contain non-ASCII characters in the
> URLs in "href" attributes... those are actually not valid URLs, but
> the spider has to make a heuristic decision if it's to follow them.
> 
> Unfortunately, if we mandate that non-ASCII characters found in "href"
> URL attributes should be %-escaped as UTF-8 to follow them, we'll find
> that this *breaks* some existing deployed sites.  Maybe this is for
> the best...

It's ugly, but escaping as UTF-8 is probably still the best approach.
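
To make the breakage concrete, here is a small Python sketch of what a spider
would do with a non-ASCII "href" value. The function name `escape_href` and
the choice of characters left unescaped are assumptions for this example.

```python
from urllib.parse import quote

# URI delimiters and "%" are left alone so that existing structure and
# existing %-escapes survive; everything else outside ASCII letters,
# digits and "_.-~" is escaped.
SAFE = "/:?#[]@!$&'()*+,;=%~"


def escape_href(href, encoding="utf-8"):
    """%-escape the non-ASCII characters of an href using the given encoding."""
    return quote(href, safe=SAFE, encoding=encoding)
```

For the same href, `escape_href("/docs/café.txt")` gives
"/docs/caf%C3%A9.txt", while `escape_href("/docs/café.txt",
encoding="iso-8859-1")` gives "/docs/caf%E9.txt". A server that stores its
file names as ISO-8859-1 octets will only answer the second request, so a
UTF-8 mandate breaks exactly those sites.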

> ...

Best regards, Julian

Received on Tuesday, 23 August 2005 16:05:00 UTC