- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Fri, 5 Jul 2002 23:13:39 +0100
- To: "Dan Connolly" <connolly@w3.org>, "Sampo Syreeni" <decoy@iki.fi>
- Cc: "Tim Berners-Lee" <timbl@w3.org>, <www-archive+n3bugs@w3.org>
> it gave this output:-
>
> :con_Street "K\u00c3\u00a4mnerintie 4 A 22" .
>
> which is still wrong.

I found that the problem was in the "stringToN3" function in notation3.py, which assumed that its input was a Python unicode string when in fact it was being passed a UTF-8 encoded string. As a result, each of the two UTF-8 bytes of "ä" (0xC3 0xA4) was escaped as a separate character. The function needed to be updated anyway, so I've completely rewritten it:-

[[[
import re

def stringToN3(s):
    Escapes = {'\a': '\\a', '\b': '\\b', '\f': '\\f', '\r': '\\r',
               '\v': '\\v', '"': '\\"'}

    # if this is not a unicode string, make it so
    if type(s) is type(''):
        s = unicode(s, 'utf-8')

    # long strings that contain a quote or newline (and don't end in a
    # quote) use the triple-quoted form; everything else uses "..."
    literal = '"""%s"""'
    if not ((len(s) > 20) and (s[-1] != '"') and
            (('"' in s) or ('\n' in s))):
        Escapes['\n'] = '\\n'
        Escapes['\t'] = '\\t'
        literal = '"%s"'

    # escape backslashes first, then the other special characters
    s = s.replace('\\', '\\\\')
    for k in Escapes.keys():
        s = s.replace(k, Escapes[k])

    # to just UTF-8 encode: s = s.encode('utf-8')
    # but we'll convert them into \uXXXX codes
    s = re.sub(ur'([\u0080-\uffff])',
               lambda m: '\\u%04X' % ord(m.group(1)), s)
    return literal % s
]]]

Now it gives the following output for the utf8lit.n3 test case:-

:con_Street "K\u00E4mnerintie 4 A 22" .

and it should also be a bit quicker.

--
Kindest Regards,
Sean B. Palmer

@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .
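The rewritten function above is Python 2 code (it relies on `unicode()` and the `ur''` prefix). As a sketch only, here is how the same logic might look in Python 3; it is a hypothetical port, not part of notation3.py, and the names `string_to_n3` and `ESCAPES` are illustrative. It assumes, like the fix above, that byte-string input is UTF-8. It also reproduces the reported symptom by decoding the UTF-8 bytes of the street name as Latin-1, which treats each byte as its own code point.

```python
import re

# Same escape table as the Python 2 stringToN3 above.
ESCAPES = {'\a': '\\a', '\b': '\\b', '\f': '\\f', '\r': '\\r',
           '\v': '\\v', '"': '\\"'}


def string_to_n3(s):
    """Hypothetical Python 3 port of stringToN3: return s as an N3 string literal."""
    # Assumption: byte strings are UTF-8, as in the Python 2 fix.
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    escapes = dict(ESCAPES)
    literal = '"""%s"""'
    # Triple-quote only long strings that contain a quote or newline and
    # do not end in a quote; everything else uses "..." with \n, \t escaped.
    if not (len(s) > 20 and s[-1] != '"' and ('"' in s or '\n' in s)):
        escapes['\n'] = '\\n'
        escapes['\t'] = '\\t'
        literal = '"%s"'
    # Escape backslashes first so the escapes added below are not doubled.
    s = s.replace('\\', '\\\\')
    for char, escaped in escapes.items():
        s = s.replace(char, escaped)
    # Turn every non-ASCII character in U+0080..U+FFFF into \uXXXX.
    s = re.sub('[\u0080-\uffff]', lambda m: '\\u%04X' % ord(m.group(0)), s)
    return literal % s


if __name__ == '__main__':
    street = 'Kämnerintie 4 A 22'
    # The reported bug: UTF-8 bytes misread as one code point each.
    print(string_to_n3(street.encode('utf-8').decode('latin-1')))
    # prints "K\u00C3\u00A4mnerintie 4 A 22"
    # The fix: decode the UTF-8 bytes before escaping.
    print(string_to_n3(street.encode('utf-8')))
    # prints "K\u00E4mnerintie 4 A 22"
```

As in the Python 2 version, only characters in U+0080..U+FFFF are escaped; in this sketch, characters outside the Basic Multilingual Plane pass through unescaped.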
Received on Friday, 5 July 2002 18:13:51 UTC