- From: <lee@sq.com>
- Date: Mon, 12 Aug 96 21:38:52 EDT
- To: www-font@w3.org, hoefler@typography.com
> Also, I don't think it's mathematically possible to dissect even the most
> rudimentary English into enough digraphs to no longer require the
> presence of discrete glyphs.

It is possible in principle, but in practice unworkable:

Suppose there are no more than 300 characters. Or, say we are red-necked American Ascii-users :-) and have only 96 visible characters. There are 96 times 96 combinations, giving 9216 glyphs. Note that `w' (for example) occurs 96 times as the 1st in a pair, and 96 times as the 2nd, so is transmitted 192 times instead of once in the very worst case.

Now, it turns out that the top 30 to 40 digraphs are by far the most common in English. I just did a count on a small (50 MBytes) database of magazine articles that has a vocabulary of only about 60,000 words, and found that there were approximately 1,281 digraphs used. Different material has different results -- with a single web page, for example http://www.sq.com/, there were 93 digraphs. That's still a lot more to transmit than the actual document.

Of course, this assumes that one of the characters in a digraph can be `space'. If you discount that, then you can't represent the word "boy" with digraphs alone, because it needs either (bo)y or b(oy), so you have to have a single glyph for the leftover letter. If you allow trigraphs as well as digraphs, you can do it, but "a boy or 3 boys" involves sending (a) as a glyph, (boy) as a logotype, and (bo)(ys) as two digraphs -- clearly inefficient.

As J.H. pointed out, a document like a b c d e f and so on requires the whole normal character set anyway.

There is some merit in using logotypes for "the" and "and", because they are so common, but it'd have to be a fairly large document before you saved very much space that way.

Note also that what I was suggesting with the encodings was a way of making the font moderately secure on its way to the printer -- the actual web page would still be in ISO 8859-1 (Latin 1) as now.
Strictly speaking it'd be possible to write software to reverse-engineer the encoding, given the plain text document and the `encrypted' version. But it would also be possible to write a `malicious' web browser that saved unencrypted fonts. All you can hope to do is be secure against most people, not all people.

The FUSE `loss of faith' font (I think that's the one) is an example of what you can do like this...

Lee
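
P.S. For anyone who wants to repeat the digraph counts on their own text, here is a rough Python sketch. It is not the program used for the figures above; it assumes a digraph is any two adjacent characters (overlapping pairs, with space counted as a character), and the names digraph_counts and worst_case_pairs are made up for illustration.

```python
from collections import Counter


def digraph_counts(text):
    """Count every overlapping pair of adjacent characters in text.

    Assumption: space counts as an ordinary character, and pairs
    overlap, so "boy" yields "bo" and "oy".
    """
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def worst_case_pairs(alphabet_size):
    """Number of ordered pairs over an alphabet: every 1st times every 2nd."""
    return alphabet_size * alphabet_size


if __name__ == "__main__":
    sample = "a boy or 3 boys"
    counts = digraph_counts(sample)

    print("worst case for 96 characters:", worst_case_pairs(96))
    print("distinct characters in sample:", len(set(sample)))
    print("distinct digraphs in sample:", len(counts))
    print("most common digraphs:", counts.most_common(3))
```

On the sample phrase this finds more distinct digraphs (11) than distinct characters (8), which is the same effect as with the web page, only smaller.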
Received on Monday, 12 August 1996 21:39:38 UTC