ACTION 2023-01-10-c: Steven to expand the tests in PR 169

> ACTION 2023-01-10-c: Steven to expand the tests in PR 169.


I went through several iterations of this, but found it hard to manage the input data and to keep track of what the output should be.


In the end, I rewrote the test in order to manage both.


The input is any number of lines, with the class name first followed by a string of characters in that class (I'm not sure if the data below will make it intact through the mail system).


Every line in the input has at least one example of that class (which can be expanded as we wish), with the exception of Cs: I haven't yet succeeded in pasting a surrogate character into the data.
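One plausible reason the surrogate is hard to paste, sketched in Python (not part of the test; the `unicodedata` module and the choice of U+D800 are assumptions made for illustration): a lone surrogate has class Cs, but it has no UTF-8 encoding, so it cannot survive in a UTF-8 text file or mail message.

```python
import unicodedata

# U+D800 is the first high surrogate; picked here only as an example.
lone_surrogate = "\ud800"

# Its Unicode general category is Cs.
category = unicodedata.category(lone_surrogate)

# UTF-8 cannot represent a lone surrogate, so encoding raises an error.
try:
    lone_surrogate.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(category, encodable)
```

If that is indeed the cause, a Cs example would have to be written in the grammar or test harness by code point rather than pasted as a literal character.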


The nice property of this version of the test is that you don't have to check the output: if an implementation has misclassified a character, the input will not parse. (Strictly speaking, the test doesn't even need to produce any output; maybe it shouldn't, since I suppose some of the characters in there aren't legal XML.)
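The same self-checking idea can be used to cross-check the input data independently of any ixml processor. The following is a rough sketch in Python, not part of the test itself: `check_line` is a hypothetical helper, and it relies on Python's `unicodedata` module, whose Unicode version may differ from the one an ixml implementation uses.

```python
import unicodedata

def check_line(line):
    """Return the characters in `line` that are NOT in the class it names.

    A line has the form "<class name> <characters>", as in the test input.
    An empty result means every character is classified as expected.
    """
    name, _, chars = line.partition(" ")
    misclassified = []
    for ch in chars:
        category = unicodedata.category(ch)
        if name == "LC":
            # LC (cased letter) is the union of Lu, Ll and Lt.
            ok = category in ("Lu", "Ll", "Lt")
        elif len(name) == 1:
            # A one-letter class covers every category starting with it.
            ok = category.startswith(name)
        else:
            ok = category == name
        if not ok:
            misclassified.append(ch)
    return misclassified

print(check_line("Lu AÀ"))   # no offenders expected
print(check_line("Lu Aa"))   # "a" is Ll, not Lu
```

Where the grammar simply fails to parse, this sketch reports the offending characters, which may help when working out whether the data or the implementation is at fault.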


I had to update my implementation: now even Cn works! (There are no examples of Cn in the Unicode spec: a character has class Cn if it is not in any other class.)
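To illustrate Cn as the fallback class, here is a small Python sketch. It assumes Python's `unicodedata` module and uses U+0378, a code point in the Greek block that is unassigned in the Unicode versions I know of; it says nothing about how my implementation does it.

```python
import unicodedata

# U+0378 is assumed to be unassigned; a future Unicode version could in
# principle assign it, in which case this example would need another code point.
unassigned = chr(0x0378)
assigned = "a"

# Python reports Cn for any code point absent from its Unicode database,
# i.e. any character that is in no other class.
cn_category = unicodedata.category(unassigned)
letter_category = unicodedata.category(assigned)

print(cn_category, letter_category)
```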


Steven

ixml
classes: line+.
-line: ( C; Cc; Cf; Cn; Co; Cs; L; LC; Ll; Lm; Lo; Lt; Lu; M; Mc; Me; Mn; N; Nd; Nl; No; P; Pc; Pd; Pe; Pf; Pi; Po; Ps; S; Sc; Sk; Sm; So; Z; Zl; Zp; Zs), newline.
-newline: (-#a; -#d)+.

  C: -"C ", [C]*.
  L: -"L ", [L]*.
  M: -"M ", [M]*.
  N: -"N ", [N]*.
  P: -"P ", [P]*.
  S: -"S ", [S]*.
  Z: -"Z ", [Z]*.

  Cc: -"Cc ", [Cc]*.
  Cf: -"Cf ", [Cf]*.
  Cn: -"Cn ", [Cn]*.
  Co: -"Co ", [Co]*.
  Cs: -"Cs ", [Cs]*.
  LC: -"LC ", [LC]*.
  Ll: -"Ll ", [Ll]*.
  Lm: -"Lm ", [Lm]*.
  Lo: -"Lo ", [Lo]*.
  Lt: -"Lt ", [Lt]*.
  Lu: -"Lu ", [Lu]*.
  Mc: -"Mc ", [Mc]*.
  Me: -"Me ", [Me]*.
  Mn: -"Mn ", [Mn]*.
  Nd: -"Nd ", [Nd]*.
  Nl: -"Nl ", [Nl]*.
  No: -"No ", [No]*.
  Pc: -"Pc ", [Pc]*.
  Pd: -"Pd ", [Pd]*.
  Pe: -"Pe ", [Pe]*.
  Pf: -"Pf ", [Pf]*.
  Pi: -"Pi ", [Pi]*.
  Po: -"Po ", [Po]*.
  Ps: -"Ps ", [Ps]*.
  Sc: -"Sc ", [Sc]*.
  Sk: -"Sk ", [Sk]*.
  Sm: -"Sm ", [Sm]*.
  So: -"So ", [So]*.
  Zl: -"Zl ", [Zl]*.
  Zp: -"Zp ", [Zp]*.
  Zs: -"Zs ", [Zs]*.

input
     1 Lm ʰ
     2 Lo ªאتܐޓߊࠀࡀऄঅਅઅଅஅఅಅഅඅกກༀကᄀሀᐁᚁᚠᜀᜠᝀᝠកᠠᢰᤁᥐᦀᨀᨠᬅᮃᯀᰀᱚᳩℵⴰⶀぁァㄅ智取威虎山
     3 Ll aàdžµßαϐюաდᏸℊℹⰰⲁa𐐨𐓘𐖗𐳀𑣁ðþ
     4 Lu AÀDŽ
     5 Lt Dž
     6 LC aàdžAÀDŽDžΘϢЮԱႠᎠℂÅⰁⲀA𝐀𝓐𝕲
     7 L aàdžAÀDŽDžʰªאتܐޓߊࠀࡀऄঅਅઅଅஅఅಅഅඅกກༀကᄀሀᐁᚁᚠᜀᜠᝀᝠកᠠᢰᤁᥐᦀᨀᨠᬅᮃᯀᰀᱚᳩℵⴰⶀぁァㄅ智取威虎山
     8 Mc ऻ
     9 Me ҈
    10 Mn ̀
    11 M ऻ҈
    12 Nd 0٩۲߀०০੦૦
    13 Nl Ⅻⅻ
    14 No ²½
    15 N 0٩۲߀०০੦૦Ⅻⅻ²½
    16 Pc _‿⁀⁔︳︴﹍﹎﹏_
    17 Pd -
    18 Pe )]}
    19 Pf »’”›
    20 Pi «‘‛“
    21 Po !"#%&'*,./:;?@\¡§¶·
    22 Ps ([{
    23 P _‿⁀⁔︳︴﹍﹎﹏_-)]}»’”›«‘‛“!"#%&'*,./:;?@\¡§¶·([{
    24 Sc $¢£¤¥€
    25 Sk ^`¨¯´
    26 Sm +<=>|~¬→
    27 So ¦©®°
    28 S $¢£¤¥€^`¨¯´+<=>|~¬→¦©®°
    29 Zl 

    30 Zp 

    31 Zs   
    32 Z 

  
    33 Co 
    34 Cc ‚
    35 Cf ­
    36 Cs 
    37 Cn ͸
    38 C ‚­͸

result
<classes>
   <Lm>ʰ</Lm>
   <Lo>ªאتܐޓߊࠀࡀऄঅਅઅଅஅఅಅഅඅกກༀကᄀሀᐁᚁᚠᜀᜠᝀᝠកᠠᢰᤁᥐᦀᨀᨠᬅᮃᯀᰀᱚᳩℵⴰⶀぁァㄅ智取威虎山</Lo>
   <Ll>aàdžµßαϐюաდᏸℊℹⰰⲁa𐐨𐓘𐖗𐳀𑣁ðþ</Ll>
   <Lu>AÀDŽ</Lu>
   <Lt>Dž</Lt>
   <LC>aàdžAÀDŽDžΘϢЮԱႠᎠℂÅⰁⲀA𝐀𝓐𝕲</LC>
   <L>aàdžAÀDŽDžʰªאتܐޓߊࠀࡀऄঅਅઅଅஅఅಅഅඅกກༀကᄀሀᐁᚁᚠᜀᜠᝀᝠកᠠᢰᤁᥐᦀᨀᨠᬅᮃᯀᰀᱚᳩℵⴰⶀぁァㄅ智取威虎山</L>
   <Mc>ऻ</Mc>
   <Me>҈</Me>
   <Mn>̀</Mn>
   <M>ऻ҈</M>
   <Nd>0٩۲߀०০੦૦</Nd>
   <Nl>Ⅻⅻ</Nl>
   <No>²½</No>
   <N>0٩۲߀०০੦૦Ⅻⅻ²½</N>
   <Pc>_‿⁀⁔︳︴﹍﹎﹏_</Pc>
   <Pd>-</Pd>
   <Pe>)]}</Pe>
   <Pf>»’”›</Pf>
   <Pi>«‘‛“</Pi>
   <Po>!"#%&amp;'*,./:;?@\¡§¶·</Po>
   <Ps>([{</Ps>
   <P>_‿⁀⁔︳︴﹍﹎﹏_-)]}»’”›«‘‛“!"#%&amp;'*,./:;?@\¡§¶·([{</P>
   <Sc>$¢£¤¥€</Sc>
   <Sk>^`¨¯´</Sk>
   <Sm>+&lt;=>|~¬→</Sm>
   <So>¦©®°</So>
   <S>$¢£¤¥€^`¨¯´+&lt;=>|~¬→¦©®°</S>
   <Zl>
</Zl>
   <Zp>
</Zp>
   <Zs>  </Zs>
   <Z>

  </Z>
   <Co></Co>
   <Cc>‚</Cc>
   <Cf>­</Cf>
   <Cs/>
   <Cn>͸</Cn>
   <C>‚­͸</C>
</classes>

end

Received on Tuesday, 7 February 2023 11:34:14 UTC