Re: question about IRI spec

* Jeremy Carroll wrote:
>For B, my code does an initial pass of the characters in each component, 
>looking for problematic characters e.g. "--" in the host, or "/./" in 
>the path. If it finds such problematic characters it may trigger more 
>expensive processing (e.g. IDNA syntax checking). What are the 
>characters I should be looking for in the component? i.e. please suggest
>a set of characters such that if none of these characters is in the
>IRI then it is necessarily in NFKC? An example would be the set
>[^\x20-\x7F] which would at least allow me to avoid NFKC checking for
>URIs. Again I am expecting an answer in terms of some table from
>unicode.org. e.g. if each character is neither a compatibility character
>nor a composing character then the component is in NFKC.

http://www.unicode.org/unicode/reports/tr15/ (UAX #15) has a quickCheck
function for that, driven by the NFKC_Quick_Check property in the Unicode
Character Database. I guess libraries like ICU already offer something
like it.
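A minimal Python sketch of that approach, assuming the NFKC_QC values are loaded from DerivedNormalizationProps.txt (the format is "code(s) ; NFKC_QC; N|M # comment", and code points not listed default to Yes). It first takes the all-ASCII fast path from the question, since ASCII characters have combining class 0 and NFKC_QC=Yes. It then runs the UAX #15 quickCheck loop and only falls back to full normalization on MAYBE. The names parse_nfkc_qc, quick_check_nfkc and is_nfkc are hypothetical, and the three sample data lines are hand-written for illustration, not an excerpt you should rely on.

```python
import unicodedata

YES, NO, MAYBE = "YES", "NO", "MAYBE"

def parse_nfkc_qc(lines):
    """Map code point -> 'N' or 'M' from DerivedNormalizationProps.txt-style lines.
    Code points not listed are treated as NFKC_QC=Yes."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3 or fields[1] != "NFKC_QC":
            continue  # other properties share the same file
        start, _, end = fields[0].partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        for cp in range(lo, hi + 1):
            table[cp] = fields[2]
    return table

def quick_check_nfkc(s, qc_table):
    """UAX #15 style quick check: returns YES, NO or MAYBE."""
    if all(ord(c) < 0x80 for c in s):
        return YES  # fast path: ASCII is always NFKC
    last_ccc = 0
    result = YES
    for ch in s:  # Python str iterates by code point
        ccc = unicodedata.combining(ch)  # canonical combining class
        if last_ccc > ccc and ccc != 0:
            return NO  # combining marks not in canonical order
        check = qc_table.get(ord(ch), "Y")
        if check == "N":
            return NO
        if check == "M":
            result = MAYBE
        last_ccc = ccc
    return result

def is_nfkc(s, qc_table):
    """Only pay for full normalization when the quick check says MAYBE."""
    result = quick_check_nfkc(s, qc_table)
    if result == MAYBE:
        return unicodedata.normalize("NFKC", s) == s
    return result == YES

# Hand-written sample lines in the data-file format (illustration only).
SAMPLE = """\
00A0          ; NFKC_QC; N # Zs       NO-BREAK SPACE
0300..0304    ; NFKC_QC; M # Mn   [5] COMBINING GRAVE ACCENT..COMBINING MACRON
FB01          ; NFKC_QC; N # Ll       LATIN SMALL LIGATURE FI
""".splitlines()
```

For example, with table = parse_nfkc_qc(SAMPLE), "cafe\u0301" (e plus combining acute) gives MAYBE and then fails the full check because NFKC composes it to "caf\u00e9", while "\ufb01le" is rejected immediately as NO. With an incomplete table the quick check can wrongly answer YES, so in practice the whole file must be loaded. Python 3.8 and later also provide unicodedata.is_normalized("NFKC", s) for the same question.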
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Wednesday, 4 January 2006 15:04:24 UTC