- From: Peter Moulder <peter.moulder@monash.edu>
- Date: Wed, 23 Nov 2011 19:49:04 +1100
- To: www-style@w3.org
On Tue, Nov 22, 2011 at 08:25:55AM -0800, Tab Atkins Jr. wrote:
> On Tue, Nov 22, 2011 at 1:18 AM, Peter Moulder <peter.moulder@monash.edu> wrote:
> > On Mon, Nov 21, 2011 at 01:39:36PM +0100, Håkon Wium Lie wrote:
> >
> >> - Issue: Is it possible to find a syntax for several list markers to be
> >>   written in one string? One possible solution is:
> >>
> >>   @counter-style lower-norwegian {
> >>     type: alphabetic;
> >>     glyphs: 'abcdefghijklmnopqrstuvwxyzæøå';
> >>   }
> >
> > With the above, the specification of where one entry ends and the next begins
> > is important, particularly considering characters in decomposed form.
> >
> > Would introducing space separation 'a b c ...' be acceptable, or do we
> > want to try specifying some other division of a string into entries?
> >
> > Spacing, while longer, would at least allow easy expansion to the odd
> > multi-character entry like Greek "στ", and might give less surprising
> > results for borderline cases like Αι or ij that are sometimes considered
> > a single character.

As to how often such cases arise in practice:

The situation sometimes comes up in languages that have several scripts, or
that have changed from one script to another. An example of this currently
in the css3-lists spec is Oromo when written in Qubee (Latin) script, which
has counters aa, ee, ii, oo, uu, ch, dh, kh, ny, ph, sh. The persian-abjad
counter style provides a different sort of example, where one of the counters
is terminated by U+200D ZWJ.

http://en.wikipedia.org/wiki/Digraph_%28orthography%29 mentions the existence
of languages that have digraphs as part of their alphabets (giving the example
of Czech ch, along with some examples that probably don't count for our
purposes), though I don't know how many languages use such digraphs in
list/chapter/table/... numbering.

It's discomforting that any errors of this kind can easily go unnoticed by the
stylesheet author if they occur after the first few items.
On the other hand, one could counter that at least the error won't often get
rendered if it occurs after the first few items. (I can't say that this
counter-argument makes me feel much better, but it still has some weight.)

I was expecting the spaced option to be the most readable of the three
options, because each item value is cleanly separated both from other items
and from quotation marks. However, for alphabetic and numeric counter styles,
the lack of separation shouldn't usually be a problem, and in fact the value
can be more legible without the spaces, as was the case for the lower-norwegian
example below. Reading is probably much more important than writing for
counter-style declarations.

In typing difficulty, the spaced version falls between the other two. If the
counter style is an alphabetic one for the script that one usually types in,
and each item is a single keystroke, then I'm surprised to find that adding
spaces is more than twice as hard as the spaceless version. Typing the full
syntax also has a surprise for the alphabet case: typing the three characters
"' '" between item values is mentally much less comfortable than either of the
shorthand options, perhaps because it's much more of a distraction from
thinking what the next letter is. These differences will be much less
noticeable for non-alphabetic counter styles.

Regarding the criticism of space separation that it makes it almost like the
full syntax: on one hand it does double the length (in monospaced font, when
written without using hex escapes) of the written value compared to the
spaceless option, but on the other hand it halves the length compared to the
full syntax. For a sequence of around 26 items, either the spaced or the
spaceless option is likely to fit comfortably in a single line (even with an
indent of 8 and a long keyword), while the full syntax probably won't fit in
a line.
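To make the entry-boundary concern concrete, here is a rough Python sketch.
It is only an illustration: the two splitting rules and the function names
are assumptions made up for this example, not anything defined in the
css3-lists draft. It compares a naive "one entry per code point" reading of
the spaceless string with whitespace separation, using a decomposed 'å' and
the Oromo digraphs mentioned earlier.

```python
import unicodedata


def split_code_points(glyphs: str) -> list[str]:
    """Hypothetical spaceless rule: every code point is its own entry."""
    return list(glyphs)


def split_on_spaces(glyphs: str) -> list[str]:
    """Hypothetical spaced rule: entries are separated by whitespace."""
    return glyphs.split()


# In decomposed form (NFD), 'å' is 'a' followed by U+030A COMBINING RING ABOVE,
# so a per-code-point split yields one entry more than the author intended.
tail = unicodedata.normalize("NFD", "æøå")
print(len(split_code_points(tail)))       # 4 entries, not 3

# With an explicit separator the combining mark stays attached to its base.
spaced_tail = unicodedata.normalize("NFD", "æ ø å")
print(len(split_on_spaces(spaced_tail)))  # 3 entries

# Multi-character entries such as digraphs need an explicit separator too.
print(split_on_spaces("aa ee ii oo uu ch dh kh ny ph sh"))
```

A grapheme-cluster rule (not shown here) would keep the decomposed 'å'
together, but it would still split digraphs such as "ch" into two entries.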
Let's try a couple of common uses of @counter-style in all three options to
get a feel for how they differ visually:

    @counter-style lower-norwegian {
        type: alphabetic;
        glyphs: 'abcdefghijklmnopqrstuvwxyzæøå';
    }

    @counter-style lower-norwegian {
        type: alphabetic;
        glyphs: 'a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å';
    }

    @counter-style lower-norwegian {
        type: alphabetic;
        glyphs: 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' 'æ' 'ø' 'å';
    }

    @counter-style lower-norwegian {
        type: alphabetic;
        glyphs: 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n'
                'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' 'æ' 'ø' 'å'
                ;
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '*†‡§‖¶';
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '* † ‡ § ‖ ¶';
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '*' '†' '‡' '§' '‖' '¶';
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '\2020\2021\a7\2016\b6';
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '\2020 \2021 \a7 \2016 \b6';
    }

    @counter-style daggers {
        type: symbolic;
        glyphs: '\2020' '\2021' '\a7' '\2016' '\b6';
    }

Those examples are interesting in that the lower-norwegian example is actually
easier to read in unspaced form than spaced: the spaces make it easier to lose
one's place. Both full syntax versions are actually pretty good for reading,
whereas I was expecting the quotes to get in the way more.

Whereas for the dagger example, the spaced version is easiest to read, as I'd
have expected.

For hex escapes, I find the spaced shorthand easiest to read, though the full
syntax is also quite good for hex escapes if we did go with the unspaced
option.

> > A vaguely related issue with the above syntax is distinguishing between
> > glyphs:'abc...' and glyphs:'•'. Should one of the keywords be changed,
> > say to glyphs-string? Or do we want to guess based on length of the
> > string (or presence of spaces) ?
>
> Guessing is definitely out.
> In the issue, I presented it with a
> disambiguating keyword.

"Guess" was perhaps an unfair choice of word on my part. If the criterion is
written in the specification, then in an absolute sense it isn't really a
guess any more than any other part of the syntax. On the other hand, on a
quantitative level we can ask how likely a given choice of syntax is to behave
differently from what a writer or reader intends or expects.

One could similarly comment that "grapheme cluster" may be programmatically
unambiguous (once one has distinguished between "legacy grapheme cluster" and
"extended grapheme cluster", and linked to the ~9 pages in the relevant
Unicode annex that define these terms); but when considering a human writer
or reader of a stylesheet, one might talk of language choices such as
"grapheme cluster boundary" for item separator in terms of how good a "guess"
it is as to matching author expectations, or how often it would "mis-guess".

In both of the above questions, my own tendency is to prefer the option that's
longer but harder to misunderstand (which showed through in my use of the word
"guess" in the previous message); but I suspect that that preference comes
from my background in software development, where reliability concerns differ
considerably from those in stylesheet development.

pjrm.
Received on Wednesday, 23 November 2011 08:49:35 UTC