- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Tue, 10 Dec 1996 16:28:23 +0100 (MET)
- To: Larry Masinter <masinter@parc.xerox.com>
- cc: kweide@tezcat.com, koen@win.tue.nl, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
On Tue, 10 Dec 1996, Larry Masinter wrote:

> # Yes, it will; but the whole point of entity header fields seems to
> # be to have essential metainformation available without/before peeking
> # into the body.
>
> # Attempt to define "essential": Essential metainformation is
> # metainformation that enables a client to make decisions about what to do
> # with the content which have to be made (or should be made) before looking
> # at the content. Examples for a Web browser: whether to render, or start
> # a file save dialog, or invoke an external viewer (and which one).
>
> With the now freely available Bitstream unicode font and other pieces
> of technology being developed for dynamic delivery of fonts, perhaps
> we might want to view the inability to view sub-repertoires of 10646
> as limitations of particular implementations rather than an intrinsic
> property.
> # Ok but that character (sub-)repertoire would also be useful ("essential"
> # in many cases) for non-nogotiating clients. [Of course you may think
> # there shouldn't be any non-negotiating clients left, but that probably
> # will take a while.]
>
> If there's no negotiation, then there's really no decision to be made:
> attempt to render the document. Expect the rendering program to fail
> gracefully.

Very well put. The HTML I18N spec does not require a browser to render
any particular character with an appropriate glyph. It even gives some
clues about what "fail gracefully" might or might not mean. Strictly
speaking, if a browser says Accept-Charset: ISO-8859-2, this only means
that it understands how to map these characters to Unicode. It does not
mean that it will be able to render them. It might have a look at the
document, find a lot of, e.g., Chinese numeric character references, and
choose a Chinese codepage instead of the ISO-8859-2 codepage (if it is
so old-fashioned as to need a single codepage for the whole document).
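
To make the distinction between "can map to Unicode" and "can render" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the RENDERABLE set stands in for whatever glyphs a client's fonts happen to cover, and the function name decode_and_check is hypothetical, not taken from any spec or browser.

```python
# Hypothetical glyph coverage of the local font: printable ASCII plus
# two Latin-2 letters. A real client would query its font system instead.
RENDERABLE = set(range(0x20, 0x7F)) | {0x0141, 0x00F3}  # adds U+0141 and U+00F3

def decode_and_check(body: bytes, charset: str = "iso-8859-2"):
    """Map bytes to Unicode, then report which characters cannot be drawn."""
    # Step 1: this is all that Accept-Charset promises, a mapping to Unicode.
    text = body.decode(charset)
    # Step 2: rendering is a separate question; fail gracefully by showing
    # U+FFFD for characters without a glyph and telling the caller about them.
    shown = "".join(ch if ord(ch) in RENDERABLE else "\ufffd" for ch in text)
    missing = sorted({ch for ch in text if ord(ch) not in RENDERABLE})
    return shown, missing

# The bytes A3 F3 64 BC decode to U+0141 U+00F3 "d" U+017A in ISO-8859-2.
shown, missing = decode_and_check(b"\xa3\xf3d\xbc")
print(shown, missing)
```

Returning the list of missing characters alongside the displayed text is one way a client could let the reader know how faithful the display is.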
Of course, you are usually safe to assume that ISO-8859-2 can and will
indeed be displayed, but there is no absolute guarantee. The only thing
the recipient (browser) should guarantee is that the reader understands
to what extent the display is a faithful representation of the document.

> # I think charset (sub-)repertoire information should be available without
> # looking at the content. That may be less of a concern for monolithic
> # Web browsers prevalent today. But the protocol shouldn't be restricted
> # to that paradigm.

One big problem that advocates of subrepertoire information usually
don't consider in enough detail is how to express all the different
subrepertoires. For a set of N things, there are 2**N possible
subrepertoires. For Unicode 2.0, N is 38,885. Even if not all of these
subrepertoires are relevant, it is very difficult to design a system
that allows all useful subrepertoires to be designated without either
sending around too much information or having to keep all this
information available (and updated every time a new subrepertoire is
defined) on each client.

Regards, Martin.
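
As a rough check on the counting argument above, the following Python sketch works out how large 2**N is for N = 38,885 and what a naive one-bit-per-character bitmap would cost. The function names are hypothetical, and the bitmap is just one conceivable encoding chosen for illustration, not something proposed in this thread.

```python
import math

UNICODE_2_0_CHARACTERS = 38_885  # N as given in the message

def subset_count_digits(n: int) -> int:
    """Number of decimal digits in 2**n, the number of subsets of n items."""
    return math.floor(n * math.log10(2)) + 1

def bitmap_bytes(n: int) -> int:
    """Bytes needed to name an arbitrary subset with one bit per character."""
    return (n + 7) // 8

print(subset_count_digits(UNICODE_2_0_CHARACTERS))  # 11706
print(bitmap_bytes(UNICODE_2_0_CHARACTERS))         # 4861
```

So the number of possible subrepertoires has over eleven thousand decimal digits, and naming an arbitrary one directly would take almost 5 KB per message. Any shorter naming scheme can only cover a registered selection, which then has to be known to every client.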
Received on Tuesday, 10 December 1996 10:28:20 UTC