W3C home > Mailing lists > Public > ietf-http-wg-old@w3.org > May to August 1997

Re: cache-busting and charsets again

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 11 Jun 1997 15:58:22 +0200 (MET DST)
To: W.Sylwestrzak@icm.edu.pl
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <Pine.SUN.3.96.970611154920.6897T-100000@enoshima>
On Wed, 11 Jun 1997, W.Sylwestrzak@icm.edu.pl wrote:

> Andrew Daviel:
> 
> > > Unfortunately most of the servers practicing this today
> > > try  to perform a 'naive' content negotiation, which effectively
> > > uses redirects to other urls. This is of course wrong,
> > > because it unnecessarily expands the url addressing space,
> > > thus making caching less effective.
> > 
> > I don't think so ... If I have A.var, which redirects to 
> > A.en.html, A.jp-jis.html, A.jp-eu.html, A.fr.html I have one

That naming syntax is unfortunate. Language tags, such as "ja"
(the registered tag for Japanese; "jp" is the country code) or
"fr", can have several components, and these components are
separated by "-", whereas shortcuts for character encoding
("charset") variants, such as "jis", are not part of language
tags, so a suffix like "jp-jis" is ambiguous.
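To illustrate the ambiguity (a small sketch of my own, not anything
from the servers under discussion): a parser following the RFC 1766
tag syntax has no way to tell a charset shortcut from a genuine
language subtag, because both sit after a "-".

```python
# Hypothetical illustration: RFC 1766 language tags use "-" to separate
# subtags, so a suffix like "jp-jis" parses as primary tag "jp" with
# subtag "jis" -- the charset shortcut is indistinguishable from a
# legitimate subtag such as the region in "zh-tw".
def parse_language_tag(tag):
    """Split an RFC 1766-style language tag into primary tag and subtags."""
    primary, *subtags = tag.lower().split("-")
    return primary, subtags

print(parse_language_tag("jp-jis"))  # ('jp', ['jis']) -- "jis" looks like a subtag
print(parse_language_tag("zh-tw"))   # ('zh', ['tw'])  -- a genuine region subtag
```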

> > small uncacheable redirect, and 4 cacheable documents. The 4 documents
> > are all different, and have distinct URLs, so are cached independently.
> 
> I totally agree with your example.
> 
> However I strongly feel that 'charset negotiation' should be approached
> differently than language and other stuff. Because various versions
> of the same document differing only in character encoding are
> effectively the same object and should not be cached, indexed etc.
> separately.

Yes indeed. Transforming the "charset" of a document does not
interact nicely with operations that rely on binary identity,
such as checksums, but is otherwise independent of the other
dimensions of negotiation.
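As a quick sketch of the checksum point (my own example, not from
the message): re-encoding the same text changes its byte
representation, so any digest computed over the bytes differs even
though the document is logically identical.

```python
# Sketch: the same logical string in two encodings yields different
# bytes, hence different MD5 digests. "Zürich" is an arbitrary example.
import hashlib

text = "Z\u00fcrich"                  # same logical content
latin1 = text.encode("iso-8859-1")    # "ü" is one byte here
utf8 = text.encode("utf-8")           # "ü" is two bytes here

same = hashlib.md5(latin1).hexdigest() == hashlib.md5(utf8).hexdigest()
print(same)  # False -- binary identity is lost by transcoding
```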


> > > From the caching point of view it would be a very good practice
> > > for the clients to request/expect a single, standard charset
> > > for a given language (considered being a 'transport' charset). 
> > 
> > Nice idea; pity everyone's platform uses different coding :-(
> > (shift-jis, jis, euc-jp; koi-8, 8859-5, Windows-xxx etc etc.)
> > I think in some cases DOS, Windows, X11 and Mac are all different.
> 
> I'm not knowledgeable about non-European sets, but
> for most central-eastern European languages ISO-8859-2 would be
> sufficient (and browsers for most platforms accept it) - so why
> complicate this? But perhaps this is the wrong example.

The WWW started the good practice of relying on a single encoding
at least per region, with the choice of iso-8859-1 for western
Europe. Unfortunately, this practice hasn't been followed in other
areas: each of them has several encodings that serve it well, and
it's difficult to decide on one. In some cases that's not much of
a problem; all browsers that handle Japanese can accept any of the
three major encodings of Japanese. In other cases it's much worse.
For example, Internet Explorer on the Mac lists iso-8859-3 for
Turkish, while Netscape on a Sun lists iso-8859-9. Other such
examples abound.
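For what a server-side "charset negotiation" might look like, here
is a minimal sketch of my own (the encoding list and function names
are invented for illustration): pick the first encoding from the
client's Accept-Charset list that the server can produce. A real
implementation would also honour q-values rather than ignore them.

```python
# Hypothetical negotiation sketch: AVAILABLE lists the encodings this
# (imaginary) server can emit for a Japanese document.
AVAILABLE = ["iso-2022-jp", "euc-jp", "shift_jis"]

def negotiate_charset(accept_charset_header):
    """Return the first acceptable charset the server supports, else None."""
    for item in accept_charset_header.split(","):
        charset = item.split(";")[0].strip().lower()  # drop any ;q=... part
        if charset in AVAILABLE:
            return charset
    return None

print(negotiate_charset("shift_jis, euc-jp;q=0.8"))  # shift_jis
print(negotiate_charset("iso-8859-9"))               # None
```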


Regards,	Martin.
Received on Wednesday, 11 June 1997 07:02:28 EDT
