Fonts and character matching from Ernest Cline on 2003-04-08 (www-style@w3.org from April 2003)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Tue, 08 Apr 2003 15:06:37 -0400
To: www-style@w3.org
Message-ID: <3E92E57D.24683.1221B37@localhost>

Right now, the standard calls for determining whether a given piece of 
text can be rendered in a given font on a per-character basis. This is 
a reasonable default behavior, but there are times when another 
behavior might be desired.

The following example is deliberately contrived to keep the number of 
character references small, but suppose that we have the following 
HTML fragment:
  <q>Come on, Tonto.<br>Hi&#x2010;ho, Silver!<br>Away&#x203C;</q>
where the computed value of font-family is:
  HexCalc, FancyCap, LatinaUno, Unicodia
Now suppose that HexCalc only contains glyphs for 0-9, a-f, and A-F;
FancyCap contains A-Z and various punctuation characters, including
' ', '.', '!', '-' (U+2010), '!!' (U+203C), and the quote marks;
LatinaUno has glyphs only for the Latin-1 character set;
and Unicodia has glyphs for everything in Unicode 3.2.
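The current rules call for different parts of the text to be rendered 
in three different fonts (HexCalc, FancyCap, and LatinaUno); even the 
word 'Come' would mix HexCalc for 'C' and 'e' with LatinaUno for 'o' 
and 'm'. This happens despite the presence of a fourth font, Unicodia, 
that could display all of the characters.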
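To make the current rule concrete, here is a minimal Python sketch of 
per-character matching applied to this example. The glyph-coverage 
functions are hypothetical stand-ins for the four imaginary fonts, the 
<br> elements are modeled as breaks between strings, and the quote 
marks generated by <q> are assumed to be plain ASCII '"' characters.

```python
# Hypothetical glyph coverage for the four imaginary fonts, in font-family order.
FONTS = [
    ("HexCalc", lambda c: c in "0123456789abcdefABCDEF"),
    ("FancyCap", lambda c: "A" <= c <= "Z" or c in ' .!-"\u2010\u203c'),
    ("LatinaUno", lambda c: ord(c) <= 0xFF),  # Latin-1: U+0000..U+00FF
    ("Unicodia", lambda c: True),             # everything in Unicode 3.2
]

# The quoted text, one string per line; the quotes from <q> are assumed ASCII.
LINES = ['"Come on, Tonto.', "Hi\u2010ho, Silver!", 'Away\u203c"']


def first_font(text):
    """Return the first font that can display every character of text."""
    for name, covers in FONTS:
        if all(covers(c) for c in text):
            return name
    return None


def per_character(text):
    """Current behavior: each character independently picks its font."""
    return [(c, first_font(c)) for c in text]


used = {font for line in LINES for _, font in per_character(line)}
print(sorted(used))                # ['FancyCap', 'HexCalc', 'LatinaUno']
print(first_font("".join(LINES)))  # Unicodia
print([f for _, f in per_character("Come")])
# ['HexCalc', 'LatinaUno', 'LatinaUno', 'HexCalc']
```

Unicodia never appears in the per-character result, even though it is 
the only font that could render the whole quotation by itself.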

I think that a property allowing font selection to be done on some 
basis other than individual characters would be useful. Here is the 
proposed property, along with its proposed values and a description of 
what each one does.

font-match:
  A space-separated list of values. If matching cannot be done by the 
method specified by the first value, then the user agent tries the 
method specified by the next value. If none of the listed methods 
work, then matching is done as if 'character' were the value. That 
method is guaranteed to always work and is the current behavior.

While not required, it is recommended that multiple values be listed 
in the order: auto, all, element, line, word, character. This is 
because if a given method fails, every method that precedes it in that 
ordering will also fail.
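The fallback rule can be sketched in Python as below. The succeeds 
callback is hypothetical; it stands in for whatever test a user agent 
would apply for each method, with 'character' as the implicit final 
fallback.

```python
# Recommended ordering from the proposal; 'character' always succeeds.
RECOMMENDED_ORDER = ["auto", "all", "element", "line", "word", "character"]


def resolve(values, succeeds):
    """Return the first font-match value whose method succeeds.

    `succeeds` is a hypothetical callback deciding whether a given method
    works for the element; 'character' is the implicit final fallback.
    """
    for value in values:
        if value == "character" or succeeds(value):
            return value
    return "character"


def in_recommended_order(values):
    """True if the values follow auto, all, element, line, word, character."""
    positions = [RECOMMENDED_ORDER.index(v) for v in values]
    return positions == sorted(positions)


print(resolve(["all", "line", "word"], lambda m: m == "word"))  # word
print(resolve(["all"], lambda m: False))                        # character
print(in_recommended_order(["auto", "element", "word"]))        # True
print(in_recommended_order(["word", "line"]))                   # False
```

Listing values out of order gains nothing: with 'word line', if 'word' 
fails then 'line' must fail too, so the result is the same as 'word' 
alone.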

  auto
    This is the default value in my proposal. It was chosen instead of 
'character' to facilitate use of the 'all' value. If font-match is not 
used by the CSS of a given document, 'auto' still results in the 
current behavior, since without an 'all' somewhere above it every 
'auto' fails and falls back to 'character'.
    Unless the parent element's font-match value is 'auto' or 'all', 
this method is considered to fail and the next value should be tried.
    If the parent element's font-match value is 'all', then this 
element's content is part of the set of characters that the parent 
must match. If that 'all' fails for the parent, then the 'auto' method 
also fails for this element, and the next value for this element 
should be tried.
    If the parent element's font-match value is 'auto', then this 
element's content is considered part of its parent for the purpose of 
font matching. If 'auto' fails in the parent, then this element acts 
as if the parent's next value had always been the parent's value.

  all
    All of the characters in the content of the current element, and in 
the content of every descendant whose font-match value is 'auto' and 
that has no closer ancestor (between it and the current element) with 
a font-match value other than 'auto', must be displayable in the same 
font. If not, this method fails.

  element
    Matching is done on a per-element basis. This is similar to 'all', 
except that even if a child element's font-match value is 'auto', its 
content is not considered part of this element for the purpose of 
font matching.

  line
    Matching is done on a per-line basis. User agents MAY consider a 
line to be either a displayed line or a hard-coded line. In the example 
above, '"Come on, Tonto.' would be rendered in LatinaUno and 'Away!!"' 
would be rendered in Unicodia. 'Hi-ho, Silver!' could always be 
rendered in Unicodia; however, if a user agent placed 'Hi-ho,' and 
'Silver!' on different displayed lines, it could, but would not be 
required to, render 'Silver!' in LatinaUno instead.
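A Python sketch of the 'line' method under the example's assumptions 
(hypothetical coverage functions for the four imaginary fonts, ASCII 
quote marks from <q>). Each line is matched as a unit; the second call 
models a user agent that wraps 'Hi-ho, Silver!' into two displayed 
lines.

```python
# Hypothetical glyph coverage for the four imaginary fonts, in font-family order.
FONTS = [
    ("HexCalc", lambda c: c in "0123456789abcdefABCDEF"),
    ("FancyCap", lambda c: "A" <= c <= "Z" or c in ' .!-"\u2010\u203c'),
    ("LatinaUno", lambda c: ord(c) <= 0xFF),  # Latin-1
    ("Unicodia", lambda c: True),             # everything
]


def first_font(text):
    """Return the first font that can display every character of text."""
    for name, covers in FONTS:
        if all(covers(c) for c in text):
            return name
    return None


def per_line(lines):
    """'line' method: every line must be displayable in a single font."""
    return [first_font(line) for line in lines]


# Hard-coded lines, split at each <br>.
print(per_line(['"Come on, Tonto.', "Hi\u2010ho, Silver!", 'Away\u203c"']))
# ['LatinaUno', 'Unicodia', 'Unicodia']

# A user agent that wraps the middle line into two displayed lines:
print(per_line(["Hi\u2010ho,", "Silver!"]))
# ['Unicodia', 'LatinaUno']
```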

(Question: Should this value be changed so that the basis is always one 
of the two choices of what a line is, instead of being at the 
discretion of the user agent?)
(Question: If some but not all lines can find a font-family that works, 
should those lines use that font-family, or, if any line cannot, should 
none of them do so?)

  word
    Matching is done for individual whitespace characters and for 
groups of characters separated by whitespace. In the example above, 
this would cause the spaces to be rendered in FancyCap; 'Come', 'on,', 
'Tonto.', and 'Silver!' to be rendered in LatinaUno; and 'Hi-ho,' and 
'Away!!' to be rendered in Unicodia.
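The 'word' method can be sketched by splitting each line into single 
whitespace characters and runs of non-whitespace characters, then 
matching each token as a unit. The coverage functions are hypothetical; 
in this sketch the generated opening quote attaches to 'Come', which 
makes no difference as long as it is an ASCII '"'.

```python
import re

# Hypothetical glyph coverage for the four imaginary fonts, in font-family order.
FONTS = [
    ("HexCalc", lambda c: c in "0123456789abcdefABCDEF"),
    ("FancyCap", lambda c: "A" <= c <= "Z" or c in ' .!-"\u2010\u203c'),
    ("LatinaUno", lambda c: ord(c) <= 0xFF),  # Latin-1
    ("Unicodia", lambda c: True),             # everything
]


def first_font(text):
    """Return the first font that can display every character of text."""
    for name, covers in FONTS:
        if all(covers(c) for c in text):
            return name
    return None


def per_word(line):
    """'word' method: match each whitespace character and each run of
    non-whitespace characters to a single font."""
    return [(tok, first_font(tok)) for tok in re.findall(r"\s|\S+", line)]


for line in ['"Come on, Tonto.', "Hi\u2010ho, Silver!", 'Away\u203c"']:
    print(per_word(line))
```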

(Question: If some but not all words can find a font-family that works, 
should those words use that font-family, or, if any word cannot, should 
none of them do so?)

  character
    This is the current behavior, and it always works. Because every 
list of values falls back to it when all of the listed methods fail, it 
need not be given as part of a list of values; it only needs to be 
specified by itself when the default value 'auto' is not desired.


Here is another example to demonstrate what the values 'all', 'auto' 
and 'element' do in more detail:

  <e1 style="font-match: all">
    <e2 style="font-match: auto all">
      <e3 style="font-match: auto word"></e3>
      <e4></e4>
      <e5 style="font-match: character"></e5>
    </e2>
    <e6 style="font-match: all line word">
      <e7 style="font-match: auto element">
        <e8 style="font-match: auto line"></e8>
      </e7>
    </e6>
  </e1>

When <e1> tries to font-match, it tries to find a font-family that can 
display all content that is directly contained in <e1>, <e2>, <e3>, and 
<e4>. It does not try to match the content of <e5> and <e6>, because 
they have a value other than 'auto' for font-match. It does not try to 
match <e7> or <e8>, because they have an intervening ancestor (<e6>) 
that does not have a value of 'auto' for font-match. If <e1> can find 
no font-family that works for all the characters in the content of 
<e1>, <e2>, <e3>, and <e4>, then it matches its own content to a 
font-family on a per-character basis and leaves its children to do 
their font matching on their own.

If <e2> must try to font-match on its own (because <e1> failed), it 
tries to match all directly contained content in <e2>, <e3>, and <e4>.

<e3> will try font matching using the 'word' method only if <e1> and 
<e2> are both unable to get a match for all of their content.

While <e6> will consider the content of both <e7> and its child element 
<e8> when trying to match, if <e6> fails to match all of that content, 
<e7> falls back to 'element' and so will not consider the content of 
<e8> when trying to match.
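The walk-through above can be checked with a small Python model of one 
reading of the 'auto', 'all', and 'element' rules. Everything here is 
hypothetical: each element's directly contained text is made up (with a 
non-Latin-1 character placed in <e4> and <e8> so that both 'all' 
attempts fail), only a single Latin-1 font is assumed to exist, and 
'line', 'word', and 'character' are simply assumed to succeed.

```python
from dataclasses import dataclass, field


@dataclass
class El:
    name: str
    values: list          # font-match values; the default would be ["auto"]
    text: str = ""        # hypothetical character data directly contained
    children: list = field(default_factory=list)


def can_match(text):
    # Hypothetical: a single Latin-1 font is the only font available.
    return all(ord(c) <= 0xFF for c in text)


def all_scope(el):
    """el plus descendants reached through an unbroken chain of 'auto'."""
    scope = [el]
    for child in el.children:
        if child.values[0] == "auto":
            scope += all_scope(child)
    return scope


def succeeds(value, el, parent_method):
    if value == "auto":
        # Succeeds only if the parent ended up matching as a group.
        return parent_method in ("auto", "all")
    if value == "all":
        return can_match("".join(e.text for e in all_scope(el)))
    if value == "element":
        return can_match(el.text)
    # 'line', 'word' and 'character' are not modeled; assume they succeed.
    return True


def resolve(el, parent_method=None, out=None):
    """Map each element name to the font-match method it ends up using."""
    out = {} if out is None else out
    method = next((v for v in el.values if succeeds(v, el, parent_method)),
                  "character")
    out[el.name] = method
    for child in el.children:
        resolve(child, method, out)
    return out


e8 = El("e8", ["auto", "line"], "\u203c")
e7 = El("e7", ["auto", "element"], "f", [e8])
e6 = El("e6", ["all", "line", "word"], "e", [e7])
e3 = El("e3", ["auto", "word"], "c")
e4 = El("e4", ["auto"], "\u2010")
e5 = El("e5", ["character"], "d")
e2 = El("e2", ["auto", "all"], "b", [e3, e4, e5])
e1 = El("e1", ["all"], "a", [e2, e6])

print([e.name for e in all_scope(e1)])  # ['e1', 'e2', 'e3', 'e4']
print([e.name for e in all_scope(e6)])  # ['e6', 'e7', 'e8']
print(resolve(e1))
```

In this model <e3> reaches 'word' only after both <e1> and <e2> fail, 
and <e7> falls back to 'element', which covers only its own text and 
not that of <e8>.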


Font matching can be a computationally expensive process that delays 
rendering. A user who wishes to turn off complex font matching and use 
the current behavior need only include the following rule in their 
user stylesheet:
  * {font-match:character !important}

Authors are cautioned against using the 'all' method unless necessary, 
for this same reason. In particular, elements with computed widths may 
have delays in rendering caused by having to wait to find out which 
font will be used.
Received on Tuesday, 8 April 2003 15:06:23 UTC