- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 04 Jan 2006 13:18:06 +0100
- To: Jeremy Carroll <jjc@hpl.hp.com>
- Cc: "www-international@w3.org" <www-international@w3.org>
* Jeremy Carroll wrote:
>For B, my code does an initial pass of the characters in each component,
>looking for problematic characters e.g. "--" in the host, or "/./" in
>the path. If it finds such problematic characters it may trigger more
>expensive processing (e.g. IDNA syntax checking). What are the
>characters I should be looking for in the component? i.e. please suggest
>a set of characters such that if none of these characters is in the
>IRI then it is necessarily in NFKC? An example would be the set
>[^\x20-\x7F] which would at least allow me to avoid NFKC checking for
>URIs. Again I am expecting an answer in terms of some table from
>unicode.org. e.g. if each character is neither a compatibility character
>nor a composing character then the component is in NFKC.

http://www.unicode.org/unicode/reports/tr15/ has a quickCheck function
for that. I guess libraries like ICU already offer something like it.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
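As a rough illustration of the two-stage check discussed in this message (a cheap first pass, then the normalization test only when needed), here is a minimal Python sketch. It is not code from either poster. The function name is_nfkc is hypothetical, and the sketch assumes Python 3.8 or later, where the standard library provides unicodedata.is_normalized. The ASCII shortcut relies on the fact that ASCII-only text is unchanged by NFKC.

```python
import unicodedata


def is_nfkc(component: str) -> bool:
    """Report whether an IRI component is already in NFKC.

    Hypothetical helper: the name and the two-stage layout are
    illustrative assumptions, not taken from the original thread.
    """
    # Cheap first pass: ASCII-only components are always in NFKC,
    # so plain URIs never reach the more expensive test.
    if component.isascii():
        return True
    # Otherwise defer to the standard library's normalization test
    # (unicodedata.is_normalized, available since Python 3.8).
    return unicodedata.is_normalized("NFKC", component)
```

A precomposed "é" (U+00E9) passes, while "e" followed by U+0301 (combining acute accent) or the ligature U+FB01 does not, which matches the intuition in the quoted question about composing and compatibility characters.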
Received on Wednesday, 4 January 2006 15:04:24 UTC