- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Fri, 19 Aug 2022 09:50:00 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
> But from the wording of your mail, one might get the impression that you
> are constructing your data structures by hand (I'm not sure, but to me
> it sounded that way); I hope that in fact you are extracting it
> automatically from the UCD.

First observation, then (automatic) construction. For instance, if a range is interrupted by unassigned codepoints, then I just include them in the range and be done with it.

> Thinking about it a bit just now, I think you may do better with a
> simpler data structure just listing the ranges.

A good analysis. What you didn't know was that ABC uses B Trees for its data structures, so what you describe I get mostly for free anyway (one of the reasons I continue to program mostly in ABC. It has other performance advantages as well, such as assignment being O(1)).

> In practice, the code points of the Universal Character Set are probably
> not equiprobable in your input;

Quite. I'm assuming ASCII will occur far more frequently than anything else, so in fact I treat those first before going on to deal with characters higher up.

Steven
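
For readers following the thread, here is a minimal sketch of the three ideas discussed above: a plain list of ranges, absorbing unassigned code points into a surrounding range, and handling ASCII before anything else. It is written in Python rather than ABC, and it substitutes a sorted list with binary search for ABC's B-trees, so it is an illustration and not the implementation described in this message. The names `merge_across_gaps` and `RangeSet`, and the sample ranges, are assumptions made up for the example; real ranges would be extracted from the UCD.

```python
from bisect import bisect_right


def merge_across_gaps(ranges, unassigned):
    """Merge inclusive (lo, hi) ranges whose only separation is unassigned code points.

    `unassigned` is a set of code points assumed to be unassigned; in practice
    it would be derived from the UCD, here it is supplied by the caller.
    """
    merged = []
    for lo, hi in sorted(ranges):
        # The gap is every code point strictly between the previous range and this one.
        if merged and all(cp in unassigned for cp in range(merged[-1][1] + 1, lo)):
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


class RangeSet:
    """Membership test over sorted inclusive ranges, with an ASCII fast path."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]
        # Precompute the answer for the 128 ASCII code points, on the
        # assumption (from the message) that they dominate real input.
        self.ascii = [self._search(cp) for cp in range(128)]

    def _search(self, cp):
        # Find the last range starting at or before cp, then check its end.
        i = bisect_right(self.starts, cp) - 1
        return i >= 0 and cp <= self.ranges[i][1]

    def __contains__(self, ch):
        cp = ord(ch)
        if cp < 128:
            return self.ascii[cp]  # common case: one table lookup
        return self._search(cp)    # rare case: binary search over ranges
```

With illustrative input such as `[(0x41, 0x5A), (0x61, 0x7A), (0x391, 0x3A1), (0x3A3, 0x3A9)]` and `{0x3A2}` as the unassigned set, the two Greek ranges collapse into one, while the ASCII ranges stay separate because the code points between them are assigned.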
Received on Friday, 19 August 2022 09:50:20 UTC