CWM bug + RFE: N3 to XML RDF fails on non-trivial Unicode

I'm not sure what the proper procedure is for submitting CWM bugs,
so I'm imitating whatever Google turned up...

When running "cwm decoy.n3 -rdf", CWM barfs on a non-ASCII Unicode character
in the file. I'm using CWM 1.82, and the N3 file can be found in . The problem doesn't occur
when converting to NTriples or N3, because there the characters get escaped. The trace:

Traceback (most recent call last):
  File "", line 598, in ?
  File "", line 581, in doCommand
    _store.dumpNested(workingContext, _outSink)
  File "", line 1132, in dumpNested
    self.dumpNestedStatements(context, sink)
  File "", line 1145, in dumpNestedStatements
    self._dumpSubject(currentSubject, context, sink, sorting, statements)
  File "", line 1233, in _dumpSubject
    self.dumpStatement(sink, s.triple)
  File "", line 1293, in dumpStatement
    self._outputStatement(sink, triple)
  File "", line 877, in _outputStatement
  File "", line 847, in makeStatement[1])
  File "", line 1101, in data
    xmldata(o.write, str, self.dataEsc)
  File "", line 1116, in xmldata
  File "c:\python22\lib\", line 137, in write
    data, consumed = self.encode(object, self.errors)
UnicodeError: ASCII decoding error: ordinal not in range(128)

For some reason, removing the UTF-8 encoder from ToRDF (that is,
changing "self._wr = XMLWriter(encWriter(outFp))" to "self._wr =
XMLWriter(outFp)") bypasses the problem. I'm guessing the strings are now
held internally as plain 8-bit strings rather than Unicode objects, so the
UTF-8 bytes read from the input trip the default ASCII codec when the
encoder tries to convert them on output (hence "ASCII decoding error").
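A minimal sketch of that mismatch, written for Python 3 rather than the Python 2.2 in the trace. In Python 2 the encoder first decodes a byte string implicitly as ASCII, which gives the UnicodeError above; Python 3 has no implicit decode, so the same mistake surfaces as a TypeError. The helper name `safe_write` is hypothetical and not part of CWM, and it assumes any byte string it sees already holds UTF-8, as the N3 input does here.

```python
import codecs
import io


def safe_write(writer, data):
    """Write data to a codecs StreamWriter, decoding byte strings first.

    Hypothetical helper, not CWM code: assumes byte strings are UTF-8.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    writer.write(data)


out = io.BytesIO()
writer = codecs.getwriter("utf-8")(out)  # plays the role of encWriter(outFp)

raw = "Syreeni \u00e4".encode("utf-8")  # UTF-8 bytes, as read from the input file

try:
    writer.write(raw)  # encoding bytes that are already encoded fails
except TypeError as exc:
    print("direct write failed:", type(exc).__name__)

safe_write(writer, raw)  # decode once, then let the writer encode once
print(out.getvalue())
```

Dropping the encoder works for the same reason: the UTF-8 bytes then pass through untouched. The cleaner fix is probably to keep text as Unicode internally and encode only at the output boundary, which is what an encoding writer expects.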

Also, it'd be nice to have a way to --think without adding the rules to
the store, that is, something like --rules but repeated until the
store freezes (no further triples get added). That sort of thing would
really help when one needs to let CWM --think on data which will eventually
be served publicly without the rules used to derive it.
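To make the requested behaviour concrete, here is a toy sketch of "repeat the rules until the store freezes" with the rules kept outside the store. It is not how CWM works internally: triples are plain tuples, rules are ordinary Python functions, and the names `think_without_rules` and `transitive_ancestor` are hypothetical, chosen for illustration only.

```python
from typing import Callable, Iterable, Iterator, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def think_without_rules(data: Set[Triple], rules: Sequence[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until no new triples appear (a fixpoint).

    The rules never enter the store, so the result holds only the
    original data plus what was inferred from it.
    """
    store = set(data)
    while True:
        new: Set[Triple] = set()
        for rule in rules:
            new.update(rule(store))
        new -= store
        if not new:  # the store has "frozen"
            return store
        store |= new


def transitive_ancestor(store: Set[Triple]) -> Iterator[Triple]:
    # Hypothetical rule: if a ancestor b and b ancestor d, then a ancestor d.
    for (a, p, b) in store:
        if p == "ancestor":
            for (c, q, d) in store:
                if q == "ancestor" and c == b:
                    yield (a, "ancestor", d)


facts = {
    ("alice", "ancestor", "bob"),
    ("bob", "ancestor", "carol"),
    ("carol", "ancestor", "dave"),
}
result = think_without_rules(facts, [transitive_ancestor])
print(sorted(result))
```

Because the rules live outside the store, the output can be published as-is; only the inferred triples (here, such as alice ancestor dave) are added to the original data.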

Sampo Syreeni, aka decoy -, tel:+358-50-5756111
student/math+cs/helsinki university,
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Monday, 24 June 2002 10:20:18 UTC