Re: UTF-8 Encoding Question

At 12:33 AM -0800 3/1/01, Greg Stein wrote:
>That would be a good heuristic for handling this.
>
>We could augment the DAV:href element with an optional attribute, like this:
>
>   <D:href 
>D:original-charset="iso-8859-1">http://some.host/Magn%FCs.txt</D:href>
>and
>   <D:href D:original-charset="utf-8">http://some.host/C%C3%A9sar.txt</D:href>
>
>That will help within the responses (if a server supplies it, then you don't
>have to guess; otherwise, fall back to your heuristic).

This certainly would be a helpful thing for servers to say to clients 
(although maybe in some other way), and I second your request to JimW 
about adding it as an issue.

>  We'd still need a
>way to determine the original charset of the Request-URI, though, to fully
>solve the problem.

I think there's a slightly different way to approach this, especially 
since (in general) there's no way to determine the original charset 
of the client request.

1. In situations where a returned URI in the response was present in 
the client request, I think the server really has to return it 
exactly as it was submitted.  Otherwise a lot of clients are going to 
get really confused, because for sure they're just looking for what 
they sent and not expecting to get into all sorts of "oh it's really 
the same in canonical form" discussions.

2. In situations where a returned URI in the response is not in the 
original request, the server is free to use whatever encoding it's 
going to use for path segments, and it can tell the client what is 
(either specifically as in Greg's proposal above or more generally 
via an OPTIONS setting or some such).

As both clients and servers get smarter about this issue, then 
presumably the distinction between these cases starts going away (and 
it becomes less important for servers to say what encoding they're 
using for path segments).  But on the way there this approach allows 
for backward compatibility with existing clients that are unaware of 
these issues without servers having to guess.

And it has the added advantage of not breaking any client/server 
combinations that currently work.

     dan

Received on Thursday, 1 March 2001 14:46:05 UTC