Re: ixampl goes Unicode

 > But from the wording of your mail, one might get the impression that you
 > are constructing your data structures by hand (I'm not sure, but to me
 > it sounded that way); I hope that in fact you are extracting it
 > automatically from the UCD.
Observation first, then (automatic) construction. For instance, if a range 
is interrupted by unassigned codepoints, I simply include them in the 
range and am done with it.
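
A minimal sketch of that merging rule, in Python rather than ABC and not 
the actual ixampl code. The names merge_over_gaps and is_assigned, the 
(first, last, property) tuple layout, and the toy data are all 
assumptions made for illustration.

```python
def merge_over_gaps(ranges, is_assigned):
    """Merge sorted (first, last, prop) ranges that share a property when
    every code point in the gap between them is unassigned."""
    merged = []
    for first, last, prop in ranges:
        if merged:
            prev_first, prev_last, prev_prop = merged[-1]
            gap = range(prev_last + 1, first)
            if prev_prop == prop and not any(is_assigned(cp) for cp in gap):
                # Swallow the unassigned gap into one larger range.
                merged[-1] = (prev_first, last, prop)
                continue
        merged.append((first, last, prop))
    return merged


# Toy data (hypothetical, not taken from the UCD): two "L" ranges
# separated by the unassigned code points 0x20 and 0x21.
toy_ranges = [(0x10, 0x1F, "L"), (0x22, 0x2F, "L"), (0x30, 0x39, "N")]
toy_assigned = {cp for first, last, _ in toy_ranges
                for cp in range(first, last + 1)}
toy_merged = merge_over_gaps(toy_ranges, lambda cp: cp in toy_assigned)
```

With the toy data, the two "L" ranges collapse into one and the "N" range 
is left alone.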

 > Thinking about it a bit just now, I think you may do better with a
 > simpler data structure just listing the ranges. 

A good analysis. What you didn't know was that ABC uses B-trees for its 
data structures, so I get most of what you describe for free anyway. That 
is one of the reasons I continue to program mostly in ABC; it has other 
performance advantages as well, such as assignment being O(1).
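
For readers without ABC at hand, the "list of ranges" idea can be 
sketched in Python with a sorted list and binary search standing in for 
the ordered lookup. This is only an illustration under assumptions: the 
names RANGES, STARTS and lookup are hypothetical, and the three ASCII 
ranges are sample data, not the real table.

```python
from bisect import bisect_right

# Sorted, non-overlapping (first, last, property) ranges.
RANGES = [
    (0x0030, 0x0039, "digit"),
    (0x0041, 0x005A, "upper"),
    (0x0061, 0x007A, "lower"),
]
STARTS = [first for first, _, _ in RANGES]


def lookup(cp, default=None):
    """Return the property of the range containing cp, else default."""
    # Index of the last range whose start is <= cp.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        first, last, prop = RANGES[i]
        if cp <= last:
            return prop
    return default
```

Each lookup costs O(log n) comparisons in the number of ranges, which is 
the behaviour an ordered structure such as a B-tree also provides.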

 > In practice, the code points of the Universal Character Set are probably
 > not equiprobable in your input; 


Quite. I'm assuming ASCII will occur far more frequently than anything 
else, so in fact I treat ASCII characters first before going on to deal 
with characters higher up.
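
A rough Python sketch of such an ASCII-first check, not the ABC code 
itself. The names ASCII_TABLE and category are assumptions, and Python's 
unicodedata module stands in for whatever general lookup handles the 
characters higher up.

```python
import unicodedata

# Precomputed general categories for the 128 ASCII code points.
ASCII_TABLE = [unicodedata.category(chr(cp)) for cp in range(128)]


def category(cp):
    """General category of code point cp, trying ASCII first."""
    if cp < 128:
        # Common case: a single list index, no search needed.
        return ASCII_TABLE[cp]
    # Rare case: fall back to the general (slower) lookup.
    return unicodedata.category(chr(cp))
```

The common case then costs one comparison and one index operation, 
however large the table for the rest of the code space becomes.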

Steven

Received on Friday, 19 August 2022 09:50:20 UTC