RE: [dav-dev] Problem with OfficeXP and Accented characters

Lisa,

I think the only way we'll ever achieve interoperability for non-ASCII
characters in resource names is to select one specific character encoding
that needs to be used to map the characters to octets (which in turn then
will get percent-escaped in the URL). Note that recipients of URLs have no
way to find out which encoding was used, so we really need consensus here.

The big issue is that the current RFCs do not say anything normative about
the issue. Therefore, in practice URLs do exists which -- after
percent-de-escaping -- can not be decoded as UTF-8, and these URLs are fully
legal URLs according to RFC2396.

So my proposal is to add a SHOULD level requirement for generating URL path
segments from names that do contain non-ASCII characters, such as filenames
in a file system:

RFC2396bis currently says [1]:

"When a URI scheme defines a component that represents textual data
consisting of characters from the Unicode (ISO 10646) character set, we
recommend that the data be encoded first as octets according to the UTF-8
[UTF-8] character encoding, and then escaping only those octets that are not
in the unreserved character set."

So we'd be in sync with the latest URI draft, accept we would make it a
SHOULD for WebDAV.




[1]
<http://cvs.apache.org/viewcvs.cgi/*checkout*/ietf-uri/rev-2002/rfc2396bis.h
tml#encoding>

--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

> -----Original Message-----
> From: Lisa Dusseault [mailto:lisa@xythos.com]
> Sent: Friday, June 27, 2003 11:59 PM
> To: 'Julian Reschke'; 'Greg Stein'
> Cc: 'Peter Gillis'; dav-dev@lyra.org; w3c-dist-auth@w3.org;
> I20568n@mindshare.intraspect.com
> Subject: RE: [dav-dev] Problem with OfficeXP and Accented characters
>
>
> Does somebody have a specific proposal what language should appear in
> 2518bis for this issue?  I haven't dealt with it even though it's so old
> because I'm having trouble figuring out what to say!
>
> lisa
>
> > -----Original Message-----
> > From: w3c-dist-auth-request@w3.org
> > [mailto:w3c-dist-auth-request@w3.org] On Behalf Of Julian Reschke
> > Sent: Wednesday, February 20, 2002 2:38 AM
> > To: Greg Stein
> > Cc: Peter Gillis; dav-dev@lyra.org; w3c-dist-auth@w3.org;
> > I20568n@mindshare.intraspect.com
> > Subject: RE: [dav-dev] Problem with OfficeXP and Accented characters
> >
> >
> > Greg,
> >
> > what you describe is a scenario where
> >
> > - the server preserves whatever octet sequence was used in a URL
> > - the clients are consistent in what they send/expect
> >
> > However, thinks are more complicated.
> >
> > - In moddav (correct me if I'm wrong), people may be creating
> > resources by directly accessing the filesystem. If, for
> > instance, I create a filename containing "A umlaut", and the
> > filesystem's filename encoding is ISO-8859-1, I'll have the
> > octet %c4 in the name. Returning %c4 in the PROPFIND response
> > in the general case will not work (because the client doesn't
> > know about URL character encodings, so it must use what the
> > RFCs say, and that is UTF-8).
> >
> > - Broken clients may use request/destination URIs which are
> > not encoded in UTF-8 (this may apply to older versions of
> > Microsoft clients). So you basically have the choice of
> > failing the request if the octet stream doesn't UTF-8-decode,
> > or you can try to workaround the problem by making
> > assumptions about what the encoding in the request may have been.
> >
> > As this issue is coming up every few weeks and as almost
> > every server / client I've seen in the last few months has
> > some bug in this regard, it should certainly clarified in the
> > RFC2518 revision.
> >
> > Julian
> >
> > > -----Original Message-----
> > > From: w3c-dist-auth-request@w3.org
> > > [mailto:w3c-dist-auth-request@w3.org]On Behalf Of Greg Stein
> > > Sent: Wednesday, February 20, 2002 3:53 AM
> > > To: Julian Reschke
> > > Cc: Peter Gillis; dav-dev@lyra.org; w3c-dist-auth@w3.org;
> > > I20568n@mindshare.intraspect.com
> > > Subject: Re: [dav-dev] Problem with OfficeXP and Accented characters
> > >
> > >
> > > On Wed, Feb 13, 2002 at 09:31:37PM +0100, Julian Reschke wrote:
> > > >...
> > > > > From: dav-dev-admin@lyra.org
> > > [mailto:dav-dev-admin@lyra.org]On Behalf Of
> > > > > Peter Gillis
> > > > > Sent: Wednesday, February 13, 2002 8:46 PM
> > > >...
> > > > > I am having a problem with Microsoft Web Folders after the
> > > client installs
> > > > > Office XP on their machine and file names with accented
> > characters
> > > > > are involved.  Our server has been working in the past with
> > > > > earlier implementations of Web Folders without a
> > problem, however,
> > > when the client
> > > > > machine is upgraded to Office XP, a document that was
> > accessible
> > > > > previously is now not found.  I have tracked the
> > problem down to
> > > > > the
> > > fact that Web
> > > > > Folders is encoding the URI value a second time before
> > sending the
> > > > > request, which then causes it not to be found on our
> > server.  For
> > > > > example:
> > > > >
> > > > > In the folder listing we send back the following property:
> > > > >
> > > > > <href>/dav/webdav/%E8%E2temp%C9.xls</href>
> > > >
> > > > Isn't this wrong in the first place? My understanding is that you
> > > > should send URL-encoded UTF-8.
> > >
> > > Nope.
> > >
> > > The filename is in an "original character set". That is
> > then encoded
> > > into octets for the URL. That transformation is not specified
> > > anywhere. Ideally, it is "original -> UTF-8", but nobody
> > says it must
> > > be. In fact, I would say
> > > it should match whatever encoding was used for the Request-URI
> > > (but that is
> > > not specified/defined in the request, so you're out of luck again).
> > >
> > > Once you have octets, then you perform the URL-escaping
> > (using '%xx').
> > >
> > > After that, you need to transform the URL into the character set of
> > > the response body. That is usually UTF-8, but it is
> > possible to have
> > > the XML in a different character set (provided it is
> > specified on the
> > > Content-Type response header).
> > >
> > > Finally, you must XML-escape the UTF-8 characters of the URL you're
> > > inserting (e.g. translate '&' to '&amp;') so that you can embed the
> > > UTF-8 content into the XML response.
> > >
> > >
> > > A long time ago, I captured this as a start of a technical FAQ. See:
> > >     http://www.webdav.org/other/techfaq.html
> > >
> > > Specifically, the second section.
> > >
> > > Cheers,
> > > -g
> > >
> > > --
> > > Greg Stein, http://www.lyra.org/
> > >
> >
>
>

Received on Saturday, 28 June 2003 05:51:43 UTC