Re: Accept-Charset support from Chris Lilley on 1996-12-04 (www-international@w3.org from October to December 1996)

From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
Date: Wed, 4 Dec 1996 22:44:17 +0100 (MET)
To: erik@netscape.com, Chris Lilley <Chris.Lilley@sophia.inria.fr>
Cc: Alan Barrett/DUB/Lotus <Alan_Barrett/DUB/Lotus.LOTUSINT@crd.lotus.com>, www-international <www-international@w3.org>, bobj <bobj@netscape.com>, wjs <wjs@netscape.com>, Ed Batutis/CAM /Lotus <Ed_Batutis/CAM/Lotus@crd.lotus.com>
Message-Id: <9612042244.ZM16084@grommit.inria.fr>

On Dec 4,  1:19pm, Erik van der Poel wrote:
> Chris wrote:
> > Erik wrote:
> > > How about using a more compact representation of Accept-Charset. E.g.
> > > bit masks corresponding to the number in the charset registry.
> >
> > Do they have canonical numbers?
>
> Yes. Here is an excerpt from the registry:
>
> #The value space for MIBenum values has been divided into three
> #regions. The first region (3-999) consists of coded character sets
> #that have been standardized by some standard setting organization.
> #This region is intended for standards that do not have subset
> #implementations. The second region (1000-1999) is for the Unicode and
> #ISO/IEC 10646 coded character sets together with a specification of a
#(set of) sub-repertoires that may occur.  The third region (>1999) is
> #intended for vendor specific coded character sets.

Hmm, pity they chose such big numbers. I had hoped there would be far
fewer than 255 charsets. For example, if there had been 94 or fewer, each
could have been indicated by the single printable ASCII character with
code 32+number, i.e. ! for the first one, " for the second ... up to }
for the 93rd and ~ for the 94th.
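A minimal sketch of that one-character-per-charset idea, assuming a hypothetical compact numbering 1..94 (the real MIBenum values are much larger, which is exactly the problem). Each number maps to the printable ASCII character with code 32+number. The function names are illustrative only, not part of any registry or protocol.

```python
def charset_char(n: int) -> str:
    """Encode a hypothetical 1-based charset number as one printable ASCII char."""
    if not 1 <= n <= 94:
        raise ValueError("only 94 printable non-space ASCII characters exist")
    return chr(32 + n)  # 1 -> '!', 2 -> '"', 93 -> '}', 94 -> '~'


def charset_number(c: str) -> int:
    """Decode a single character back to its hypothetical charset number."""
    n = ord(c) - 32
    if not 1 <= n <= 94:
        raise ValueError(f"{c!r} is not in the range '!'..'~'")
    return n


def encode_charsets(numbers) -> str:
    """One character per accepted charset, kept in the order given."""
    return "".join(charset_char(n) for n in numbers)


def decode_charsets(s: str):
    """Inverse of encode_charsets."""
    return [charset_number(c) for c in s]
```

With this scheme a list of three accepted charsets costs three characters, and if the characters were kept in order of preference, the order itself could stand in for q.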

Even taking just the first region (3-999), that is still 125 bytes when
expressed as a bitmask (one bit for each value up to 999).
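A rough sketch of the bitmask alternative, assuming one bit per MIBenum value from 0 to 999, stored least-significant-bit first within each byte (the bit order and the helper names are assumptions, not part of any spec). Bits 0-2 simply go unused because the first region starts at 3. The mask length works out to 125 bytes, and like any plain bitmask it carries no q values.

```python
from typing import Iterable, List

MAX_MIB = 999                          # top of the first registry region
MASK_BYTES = (MAX_MIB + 1 + 7) // 8    # 1000 bits -> 125 bytes


def mib_bitmask(mibenums: Iterable[int]) -> bytes:
    """Set one bit per accepted MIBenum (assumed LSB-first within each byte)."""
    mask = bytearray(MASK_BYTES)
    for mib in mibenums:
        if not 3 <= mib <= MAX_MIB:
            raise ValueError(f"MIBenum {mib} is outside the first region (3-999)")
        mask[mib // 8] |= 1 << (mib % 8)
    return bytes(mask)


def mibs_from_bitmask(mask: bytes) -> List[int]:
    """Recover the accepted MIBenums in ascending order (any preference order is lost)."""
    return [i for i in range(len(mask) * 8) if (mask[i // 8] >> (i % 8)) & 1]
```

For example, accepting MIBenums 3 (US-ASCII) and 4 (ISO-8859-1) sets only bits 3 and 4 of the first byte, yet the full 125-byte mask is still sent.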

> > > This
> > > would omit the "q" parameter, but I'm not sure this is needed in the
> > > Accept-Charset case anyway.
> >
> > Since this would be a new representation, one could also add a
> > requirement that the charsets in such a binary representation are
> > sorted in decreasing order of q.
>
> Do we really need the "q" in the Accept-Charset case? What does it mean?

Well, a charset with a high q factor would be preferred; for example, it
might be the native charset on the browser's platform and require no fancy
processing. A low q factor would indicate "I can accept this if it's all
you have got, but then I need to futz with all those escape sequences and
convert the document to another form internally."
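To make the q discussion concrete, here is a sketch of how a server might parse an Accept-Charset value, order it by decreasing q (the ordering suggested earlier for a binary form), and choose a charset. It is a simplification under stated assumptions: the `*` wildcard and malformed q values are not handled, and `parse_accept_charset` and `pick_charset` are hypothetical names, not an existing API.

```python
from typing import Iterable, List, Optional, Tuple


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Return (charset, q) pairs sorted by decreasing q; missing q means 1.0."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for p in params:
            key, _, value = p.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        prefs.append((name.lower(), q))
    # Python's sort is stable, so equal-q entries keep their header order.
    prefs.sort(key=lambda pair: pair[1], reverse=True)
    return prefs


def pick_charset(header: str, available: Iterable[str]) -> Optional[str]:
    """Pick the highest-q charset the server can produce; q=0 means 'not acceptable'."""
    have = {name.lower() for name in available}
    for name, q in parse_accept_charset(header):
        if q > 0 and name in have:
            return name
    return None
```

For instance, a browser whose native charset is Shift_JIS might send `iso-2022-jp;q=0.3, shift_jis, euc-jp;q=0.7`; a server holding only EUC-JP and ISO-2022-JP versions would then choose `euc-jp`, sparing the browser the escape-sequence handling that ISO-2022-JP needs.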

-- 
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA,  Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 (0)4 93 65 79 87       06902 Sophia Antipolis Cedex, France

Received on Wednesday, 4 December 1996 16:46:17 UTC