Re: [css3-lists] glyphs in single string from Peter Moulder on 2011-11-23 (www-style@w3.org from November 2011)

From: Peter Moulder <peter.moulder@monash.edu>
Date: Wed, 23 Nov 2011 19:49:04 +1100
To: www-style@w3.org
Message-id: <20111123084904.GA16590@bowman.infotech.monash.edu.au>
On Tue, Nov 22, 2011 at 08:25:55AM -0800, Tab Atkins Jr. wrote:
> On Tue, Nov 22, 2011 at 1:18 AM, Peter Moulder <peter.moulder@monash.edu> wrote:
> > On Mon, Nov 21, 2011 at 01:39:36PM +0100, Håkon Wium Lie wrote:
> >
> >>  - Issue: Is it possible to find a syntax for several list markers to be
> >>    written in one string? One possible solution is:
> >>
> >>     @counter-style lower-norwegian {
> >>       type: alphabetic;
> >>       glyphs: 'abcdefghijklmnopqrstuvwxyzæøå';
> >>     }
> >
> > With the above, the specification of where one entry ends and the next begins
> > is important, particularly considering characters in decomposed form.
> >
> > Would introducing space separation 'a b c ...' be acceptable, or do we
> > want to try specifying some other division of a string into entries?
> >
> > Spacing, while longer, would at least allow easy expansion to the odd
> > multi-character entry like Greek "στ", and might give less surprising
> > results for borderline cases like Αι or ij that are sometimes considered
> > a single character.

As to how often such cases arise in practice: The situation sometimes
comes up in languages that have several scripts, or that have changed
from one script to another.  An example of this currently in the
css3-lists spec is Oromo when written in Qubee (latin) script, which
has counters aa, ee, ii, oo, uu, ch, dh, kh, ny, ph, sh.

The persian-abjad counter style provides a different sort of example,
where one of the counters is terminated by U+200D ZWJ.
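
To make the entry-boundary question concrete, here is a minimal Python
sketch comparing a naive one-entry-per-code-point reading of the string
with a space-separated reading.  The function names and sample strings
are illustrative assumptions, not taken from the css3-lists draft; in
particular the ZWJ entry is only a stand-in for the persian-abjad case.

```python
# Hypothetical helpers: two ways a single glyphs string could be divided.

def split_by_code_point(value):
    """Treat every Unicode code point as its own entry."""
    return list(value)

def split_by_space(value):
    """Treat runs of whitespace as entry separators."""
    return value.split()

decomposed_a_ring = "a\u030a"   # 'a' + COMBINING RING ABOVE, renders like 'å'
digraph = "ch"                  # one Oromo (Qubee) counter, two letters
zwj_entry = "\u0647\u200d"      # illustrative: a letter followed by U+200D ZWJ

entries = [decomposed_a_ring, digraph, zwj_entry]
unspaced = "".join(entries)
spaced = " ".join(entries)

# Code-point splitting breaks every one of the three intended entries apart.
print(len(split_by_code_point(unspaced)))   # 6 pieces for 3 intended entries
# Space splitting recovers exactly the entries the author wrote.
print(split_by_space(spaced) == entries)
```

Under these assumptions only the spaced form keeps all three entries
intact; a grapheme-based rule would sit somewhere in between.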

http://en.wikipedia.org/wiki/Digraph_%28orthography%29 mentions the
existence of languages that have digraphs as part of their alphabets
(giving the example of Czech ch, along with some examples that probably
don't count for our purposes), though I don't know how many languages
use such digraphs in list/chapter/table/... numbering.

It's discomforting that any errors of this kind can easily go unnoticed
by the stylesheet author if they occur after the first few items.
On the other hand, one could counter that at least the error won't often
get rendered if it occurs after the first few items.  (I can't say that
this counter-argument makes me feel much better, but it still has some
weight.)

I was expecting the spaced option to be the most readable of the
three options, because each item value is cleanly separated both from
other items and from quotation marks.  However, for alphabetic and
numeric counter styles, the lack of separation shouldn't usually be a
problem, and in fact the value can be more legible without the spaces,
as was the case for the lower-norwegian example below.

Reading is probably much more important than writing for counter-style
declarations.  In typing difficulty, the spaced version falls between
the other two.  If the counter style is an alphabetic one for the script
that one usually types in, and each item is a single keystroke, then
adding spaces is, to my surprise, more than twice as hard as typing the
spaceless version.  Typing the full syntax also has a surprise for the
alphabet case: typing the three characters "' '" between item values
is mentally much less comfortable than either of the shorthand options,
perhaps because it's much more of a distraction to thinking what the next
letter is.  These differences will be much less noticeable for
non-alphabetic counter styles.


Regarding the criticism of space separation that it makes it almost like
the full syntax: on one hand it does double the length (in monospaced
font, when written without using hex escapes) of the written value
compared to the spaceless option, but on the other hand it halves the
length compared to the full syntax.  For a sequence of around 26 items,
either the spaced or the spaceless option is likely to fit comfortably
in a single line (even with an indent of 8 and a long keyword), while
the full syntax probably won't fit in a line.
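
The length comparison can be checked with a small Python sketch.  The
helper name `value_lengths` is hypothetical, and the count assumes one
character per item and single-quoted strings, as in the examples in this
message.

```python
import string

def value_lengths(items):
    """Return (spaceless, spaced, full) lengths of the written value."""
    spaceless = "'" + "".join(items) + "'"                # 'abc'
    spaced = "'" + " ".join(items) + "'"                  # 'a b c'
    full = " ".join("'" + item + "'" for item in items)   # 'a' 'b' 'c'
    return len(spaceless), len(spaced), len(full)

# 26 single-character items, e.g. the basic Latin alphabet.
spaceless_len, spaced_len, full_len = value_lengths(list(string.ascii_lowercase))
print(spaceless_len, spaced_len, full_len)   # 28 53 103
```

For n single-character items the three lengths are n + 2, 2n + 1 and
4n - 1, which is where the "roughly double" and "roughly half" figures
come from.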


Let's try a couple of common uses of @counter-style in all three options
to get a feel for how they differ visually:

@counter-style lower-norwegian {
  type: alphabetic;
  glyphs: 'abcdefghijklmnopqrstuvwxyzæøå';
}

@counter-style lower-norwegian {
  type: alphabetic;
  glyphs: 'a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å';
}

@counter-style lower-norwegian {
  type: alphabetic;
  glyphs: 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o'
          'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' 'æ' 'ø' 'å';
}

@counter-style lower-norwegian {
  type: alphabetic;
  glyphs: 'a'
          'b'
          'c'
          'd'
          'e'
          'f'
          'g'
          'h'
          'i'
          'j'
          'k'
          'l'
          'm'
          'n'
          'o'
          'p'
          'q'
          'r'
          's'
          't'
          'u'
          'v'
          'w'
          'x'
          'y'
          'z'
          'æ'
          'ø'
          'å' ;
}


@counter-style daggers {
  type: symbolic;
  glyphs: '*†‡§‖¶';
}

@counter-style daggers {
  type: symbolic;
  glyphs: '* † ‡ § ‖ ¶';
}

@counter-style daggers {
  type: symbolic;
  glyphs: '*' '†' '‡' '§' '‖' '¶';
}


@counter-style daggers {
  type: symbolic;
  glyphs: '\2020\2021\a7\2016\b6';
}

@counter-style daggers {
  type: symbolic;
  glyphs: '\2020 \2021 \a7 \2016 \b6';
}

@counter-style daggers {
  type: symbolic;
  glyphs: '\2020' '\2021' '\a7' '\2016' '\b6';
}

Those examples are interesting in that the lower-norwegian example is
actually easier to read in unspaced form than spaced: the spaces make it
easier to lose one's place.  Both full syntax versions are actually
pretty good for reading, whereas I was expecting the quotes to get in the
way more.

Whereas for the dagger example, the spaced version is easiest to read,
as I'd have expected.

For hex escapes, I find the spaced shorthand easiest to read, though the
full syntax is also quite good for hex escapes if we did go with the
unspaced option.
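
One detail worth checking for the hex-escape variants: in CSS strings, a
single whitespace character directly after a hexadecimal escape is
consumed as part of the escape.  The Python sketch below is a simplified
decoder written for illustration only (the name `css_unescape_hex` is
hypothetical, and it handles nothing but hex escapes); it suggests that
the single-spaced hex form decodes to the same text as the unspaced one.

```python
import re

# A backslash, 1-6 hex digits, then at most one optional whitespace
# character (CR LF counts as one) that belongs to the escape itself.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")

def css_unescape_hex(body):
    """Decode hex escapes in the body of a CSS string (simplified)."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)

unspaced = css_unescape_hex(r"\2020\2021\a7\2016\b6")
single_spaced = css_unescape_hex(r"\2020 \2021 \a7 \2016 \b6")
double_spaced = css_unescape_hex(r"\2020  \2021  \a7  \2016  \b6")

print(unspaced == single_spaced)    # the single spaces were swallowed
print(double_spaced.split(" "))     # five separate items survive
```

If that reading is right, a space-separated shorthand written with hex
escapes would need two spaces between items for a separator to survive.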


> > A vaguely related issue with the above syntax is distinguishing between
> > glyphs:'abc...' and glyphs:'•'.  Should one of the keywords be changed,
> > say to glyphs-string?  Or do we want to guess based on length of the
> > string (or presence of spaces) ?
> 
> Guessing is definitely out.  In the issue, I presented it with a
> disambiguating keyword.

"Guess" was perhaps an unfair choice of word on my part.  If the
criterion is written in the specification, then in an absolute sense
it isn't really a guess any more than any other part of the syntax.

On the other hand, on a quantitative level we can ask how likely a given
choice of syntax is to behave differently from what a writer or reader
intends or expects.

One could similarly comment that "grapheme cluster" may be
programmatically unambiguous (once one has distinguished between
"legacy grapheme cluster" and "extended grapheme cluster", and linked to
the ~9 pages in the relevant Unicode annexe that define these terms);
but when considering a human writer or reader of a stylesheet, one might
talk of language choices such as "grapheme cluster boundary" for
item separator in terms of how good a "guess" it is as to matching
author expectations, or how often it would "mis-guess".
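
As a rough illustration of how a cluster-based separator can "mis-guess",
the Python sketch below groups each base character with any following
combining marks or ZWJ.  This is a crude stand-in written for this
message, not the Unicode grapheme cluster algorithm, and the function
name is hypothetical.

```python
import unicodedata

ZWJ = "\u200d"

def approx_clusters(value):
    """Group each character with following combining marks or ZWJ.

    A deliberately simplified approximation; real grapheme cluster
    segmentation has many more rules than this.
    """
    clusters = []
    for ch in value:
        if clusters and (unicodedata.combining(ch) != 0 or ch == ZWJ):
            clusters[-1] += ch      # attach to the preceding cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# Decomposed 'å' stays together, but the digraph 'ch' is still split.
result = approx_clusters("a\u030ach")
print(len(result))   # 3 clusters where the author intended 2 entries
```

Even a rule that handles decomposed characters well has no way to know
that an author meant "ch" as one counter.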

In both of the above questions, my own tendency is to prefer the option
that's longer but harder to misunderstand (which showed through in my use
of the word "guess" in the previous message); but I suspect that that
preference comes from my background in software development, where
reliability concerns differ considerably from those in stylesheet
development.

pjrm.
Received on Wednesday, 23 November 2011 08:49:35 UTC