W3C home > Mailing lists > Public > w3c-dist-auth@w3.org > January to March 2002

Re: [dav-dev] Problem with OfficeXP and Accented characters

From: Stefan Eissing <stefan.eissing@greenbytes.de>
Date: Fri, 22 Feb 2002 10:12:29 +0100
Cc: dav-dev@lyra.org, w3c-dist-auth@w3.org, Julian Reschke <julian.reschke@gmx.de>
To: Erik Seaberg <erk@flyingcroc.com>
Message-Id: <4BE05E01-2774-11D6-BF5C-00039384827E@greenbytes.de>

Am Donnerstag den, 21. Februar 2002, um 22:50, schrieb Erik Seaberg:

> Julian Reschke <julian.reschke@gmx.de> writes:
> [...]
>> - Broken clients may use request/destination URIs which are not
>> encoded in UTF-8
>
> I thought (per RFCs 2396 and 2616) the "http" scheme just specifies a
> mapping for each part of the URL from US-ASCII to a sequence of
> octets, and the mapping from those octets back to characters (if any!)
> is at the whim of the server.  %-escaped UTF-8 is *recommended* for
> new schemes (and for HTML that embeds URLs containing invalid
> non-ASCII characters), but a server that uses Latin-15 should be able
> to send Latin-15 in PROPFIND responses without having a client
> complain about or mangle perfectly workable URLs just because it can't
> display them nicely.

What you say is correct from a pure protocol perspective. Nowhere
in RFC 2518 or 2616 is anything said about decoded URIs. And
RFC 2396 says that there is no such thing as a decoded URI.

All true. The problem appears when you put a user interface on top
of webdav. Be it a Windows WebFolder or an OS X mounted volume.
People just expect nowadays that they can use Umlauts and the Euro
sign in filenames. They even expect that they can do that in Windows
and Linux and MacOS and will all see the same file on a particular
server.

As I see it, this requires a standard character encoding for conversion
of user input to URIs on the client side, namely UTF-8.

For the broken clients, Julian is talking about, it would be good
if servers make the effort to detect non-UTF-8 encoding and
apply characters conversion (most likely from 8859-1).

I've seen Tomcat looking for charset parameters in URIs to detect
char encoding. I'm not sure if I really understand what they are
trying to do. It looks like clients could use
   GET /test/folder/%d7%c4;charset=ISO-8859-7 HTTP/1.1
in their requests. Does anyone know more about that?

//Stefan
Received on Friday, 22 February 2002 04:13:00 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 18:43:59 GMT