Re: Accept-Charset support from Martin J. Duerst on 1996-12-10 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Tue, 10 Dec 1996 16:28:23 +0100 (MET)
To: Larry Masinter <masinter@parc.xerox.com>
cc: kweide@tezcat.com, koen@win.tue.nl, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-ID: <Pine.SUN.3.95.961210161624.245H-100000@enoshima>

On Tue, 10 Dec 1996, Larry Masinter wrote:

> # Yes, it will; but the whole point of entity header fields seems to 
> # be to have essential metainformation available without/before peeking
> # into the body.
> 
> # Attempt to define "essential":  Essential metainformation is
> # metainformation that enables a client to make decisions about what to do
> # with the content which have to be made (or should be made) before looking
> # at the content.  Examples for a Web browser: whether to render, or start
> # a file save dialog, or invoke an external viewer (and which one).
> 

> With the now freely available Bitstream unicode font and other pieces
> of technology being developed for dynamic delivery of fonts, perhaps
> we might want to view the inability to view sub-repertoires of 10646
> as limitations of particular implementations rather than an intrinsic
> property. 

> # Ok but that character (sub-)repertoire would also be useful ("essential"
> # in many cases) for non-nogotiating clients.  [Of course you may think
> # there shouldn't be any non-negotiating clients left, but that probably
> # will take a while.]
> 
> If there's no negotiation, then there's really no decision to be made:
> attempt to render the document. Expect the rendering program to fail
> gracefully.

Very well put. The HTML I18N spec does not require to render any
character with an appropriate glyph. It gives even some clues
about what "fail gracefully" might mean or not. Strictly speaking,
if a browser says Accept-Charset: ISO-8859-2, this only means
that it understands how to may these characters to Unicode. It
does not mean that it will be able to render them. It might
have a look at the document, find a lot of e.g. Chinese
numerical character references, and choose a Chinese codepage
instead of the ISO-8859-2 codepage (if it is as old-fashioned
as to need a single codepage for the whole document).
Of course, you are usually safe to assume that ISO-8859-2
can and will indeed be displayed, but the is no absolute
guarantee. The only thing the recipient (browser) should
guarantee is that the reader understands to what extent
the display is a faithful representation of the document.

> # I think charset (sub-)repertoire information should be available without
> # looking at the content.  That may be less of a concern for monolithic
> # Web browsers prevalent today.  But the protocol shouldn't be restricted
> # to that paradigm.

One big problem advocates of subrepertoire information usually don't
consider in enough detail is how to express all the different
subrepertoires. For a set of N things, there are 2**N subrepertoires.
For Unicode 2.0, N is 38,885. Even if not all of these subrepertoires
are relevant, it is very difficult to design a system that will
allow to designate all useful subrepertoires without either
sending around too much information or having to keep all this
information available (and updated every time a new subrepertoire
is defined) on each client.

Regards,	Martin.

Received on Tuesday, 10 December 1996 10:28:20 UTC