> write(re.sub(r'([\x80-\xff])',
>    lambda m: '&#x%02X;' % ord(m.group(1)), str(s[i:])))
> [...]
> <con_Street>K&#xC3;&#xA4;mnerintie 4 A 22</con_Street>

Heh, whoops; the following bit of code encodes the characters as proper Unicode code points rather than just encoding the bytes:-

   write(re.sub(ur'([\u0080-\uffff])',
      lambda m: '&#x%02X;' % ord(m.group(1)), unicode(s[i:], 'utf-8')))

output:-

   <con_Street>K&#xE4;mnerintie 4 A 22</con_Street>

Sorry 'bout that.

Whilst I'm writing again, I should note that the test case causes another Unicode conversion error when converting N3 to N3 (i.e. just running it through the pretty printer). The traceback reveals:-

   File "/home/2000/10/swap/notation3.py", in strconst
      ustr = ustr + str[j:i]
   UnicodeError: ASCII decoding error: ordinal not in range(128)

When I changed:-

   ustr = u"" # Empty unicode string

to:-

   ustr = "" # Empty string

it gave this output:-

   :con_Street "K\u00c3\u00a4mnerintie 4 A 22" .

which is still wrong. You want either the UTF-8 encoded output with the bytes present, or the proper code for the character, which is \u00e4. As it is, it's just converting each byte into a separate Unicode character (um... a bit like I mistakenly did with the code in my last email).

I managed to rig up a very silly and complex fix for this problem, but I'm sure there's a better way, so I'll just bring it to your attention instead.

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .

Received on Friday, 5 July 2002 17:36:20 GMT
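A minimal Python 3 sketch of the byte-versus-character distinction discussed in this message. It is not the code in notation3.py and not the fix mentioned above; the function names xml_escape_bytes, xml_escape_chars and n3_escape are hypothetical, and the input is assumed to arrive as UTF-8 bytes. Decoding the bytes as Latin-1 reproduces both wrong outputs (one reference or escape per byte), while decoding as UTF-8 first gives one per character:

```python
import re

# Assumption: the literal arrives as UTF-8 encoded bytes, as read from a file.
RAW = "Kämnerintie 4 A 22".encode("utf-8")


def xml_escape_bytes(raw: bytes) -> str:
    """The mistaken approach: one character reference per UTF-8 byte."""
    # Latin-1 maps each byte 0x80-0xFF to exactly one character,
    # so "ä" (bytes C3 A4) becomes two references.
    text = raw.decode("latin-1")
    return re.sub(r"[\x80-\xff]", lambda m: "&#x%02X;" % ord(m.group(0)), text)


def xml_escape_chars(raw: bytes) -> str:
    """One character reference per Unicode code point."""
    # Decode UTF-8 first, so "ä" is the single code point U+00E4.
    text = raw.decode("utf-8")
    return re.sub(r"[^\x00-\x7f]", lambda m: "&#x%02X;" % ord(m.group(0)), text)


def n3_escape(text: str) -> str:
    """Escape a decoded string for use inside an N3 "..." literal."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in '\\"':
            out.append("\\" + ch)          # backslash and double quote
        elif ch == "\n":
            out.append("\\n")
        elif cp < 0x80:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)     # Basic Multilingual Plane
        else:
            out.append("\\U%08x" % cp)     # code points above U+FFFF
    return "".join(out)


if __name__ == "__main__":
    print(xml_escape_bytes(RAW))             # K&#xC3;&#xA4;mnerintie 4 A 22
    print(xml_escape_chars(RAW))             # K&#xE4;mnerintie 4 A 22
    print(n3_escape(RAW.decode("latin-1")))  # K\u00c3\u00a4mnerintie 4 A 22
    print(n3_escape(RAW.decode("utf-8")))    # K\u00e4mnerintie 4 A 22
```

The point of the sketch is that the escaping functions are the same in both cases; only the decoding step decides whether "ä" comes out as \u00e4 or as the two-byte \u00c3\u00a4. Decoding once, explicitly, at the input boundary also avoids the implicit ASCII decode that raised the UnicodeError in Python 2.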