- From: Christoph Päper <christoph.paeper@crissov.de>
- Date: Mon, 5 Dec 2011 14:52:19 +0100
- To: www-style list <www-style@w3.org>
Florian Rivoal:
>> Christoph Päper:
>>
>>> scope: [ phrase || word || [ partial | [ initial || medial || final]# ]
>>>        || character || base || diacritic ]#

There should be fewer ‘||’ and more ‘|’ instead, probably.

> I think I like this, at least if we leave diacritics out of the
> discussion for a moment. That said, while the meaning of initial,
> medial and final is fairly obvious, I wouldn't mind an explanation
> of the other values.

  convert: "a b" to "c d";

applied to “ab a b a ba” gives

  ‘phrase’:  “ab c d a ba” – spaces have no special meaning in the strings
  ‘word’:    “ab c d c ba” – spaces are token boundaries, tokens are words
  ‘partial’: “cd c d c dc” – spaces are token boundaries
  ‘initial’: “cb c d c da”
  ‘medial’:  “ab a b a ba” – no change, because no 3-letter string
  ‘final’:   “ad c d c bc”
  ‘char…’:   “cd c d c dc” – like ‘partial’ with space replacement
  ‘base’:    “cd c d c dc” – like ‘character’, here

  convert: "ab" to "ud";

applied to “ab a b äb” gives

  ‘char…’:   “od o d äd” – spaces are optional
  ‘base’:    “od o d üd”

So, ‘partial’ is a shortcut for ‘initial, medial, final’. Are the following shortcuts useful?

  ‘non-initial’ / ‘tail’ = ‘medial, final’
  ‘non-medial’  / ‘rim’  = ‘initial, final’
  ‘non-final’   / ‘head’ = ‘initial, medial’

I thought a ‘diacritic’ scope would be useful for things like

  convert: "¨" to "\0308"; /* “ÄÖÜäöü”, no (visible) change */
  convert: "¨" to "\0364"; /* “AͤOͤUͤaͤoͤuͤ” */
  convert: "¨" to "e";     /* “AeOeUeaeoeue” */
  convert: "¨" to "";      /* “AOUaou” */

with standalone U+00A8 instead of combining U+0308, which should work in any case. It perhaps isn’t.

>> ‘convert’ is not an optimal choice.
>
> Any suggestion?

Not really: ‘conversion’, ‘transform’ / ‘transformation’, ‘substitution’, ‘change’.

Another problem I came across when I added the ISO transliteration example to the wiki page is that you run into text layout issues easily, e.g. when you try to do a Hebrew or Arabic romanization.
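For readers who want to check the scope examples by hand, here is a minimal Python sketch of one possible reading of them. It is not part of the proposal: the function names `convert` and `convert_diacritic` are hypothetical, the positional scopes assume single-character tokens (as in the "a b" to "c d" example), a one-letter word is treated as both initial and final (which is what the example rows imply), and the diacritic case is modelled by decomposing to NFD and replacing the combining diaeresis U+0308.

```python
import unicodedata


def convert(text, src, dst, scope):
    """Hypothetical model of 'convert: src to dst' under one scope keyword."""
    src_tokens, dst_tokens = src.split(), dst.split()
    words = text.split(" ")

    if scope == "phrase":
        # Replace the whole token sequence, matched on word boundaries.
        out, i = [], 0
        while i < len(words):
            if words[i:i + len(src_tokens)] == src_tokens:
                out.extend(dst_tokens)
                i += len(src_tokens)
            else:
                out.append(words[i])
                i += 1
        return " ".join(out)

    # Assumption: tokens are paired by position, so "a b" -> "c d"
    # means a -> c and b -> d.
    mapping = dict(zip(src_tokens, dst_tokens))

    def fix(word):
        if scope == "word":
            # Only whole words that equal a token are replaced.
            return mapping.get(word, word)
        chars = list(word)
        last = len(chars) - 1
        for i, ch in enumerate(chars):
            in_scope = (
                scope == "partial"
                or (scope == "initial" and i == 0)
                or (scope == "final" and i == last)
                or (scope == "medial" and 0 < i < last)
            )
            if in_scope and ch in mapping:
                chars[i] = mapping[ch]
        return "".join(chars)

    return " ".join(fix(w) for w in words)


def convert_diacritic(text, replacement):
    """Hypothetical 'diacritic' scope: replace the combining diaeresis."""
    # NFD splits a precomposed letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.replace("\u0308", replacement)


if __name__ == "__main__":
    for scope in ("phrase", "word", "partial", "initial", "medial", "final"):
        print(scope, convert("ab a b a ba", "a b", "c d", scope))
    print(convert_diacritic("ÄÖÜäöü", "e"))
```

Under these assumptions, `convert("ab a b a ba", "a b", "c d", "final")` reproduces the “ad c d c bc” row, and `convert_diacritic("ÄÖÜäöü", "")` gives “AOUaou”. The ‘character’ and ‘base’ scopes are left out because the message does not pin down their behaviour precisely enough to model.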
Received on Monday, 5 December 2011 13:52:48 UTC