
CWM bug + RFE: N3 to XML RDF fails on non-trivial Unicode

From: Sampo Syreeni <decoy@iki.fi>
Date: Mon, 24 Jun 2002 17:20:13 +0300 (EEST)
To: Dan Connolly <connolly@w3.org>
cc: Tim Berners-Lee <timbl@w3.org>, <www-archive+n3bugs@w3.org>
Message-ID: <Pine.SOL.4.30.0206241436200.26257-100000@kruuna.Helsinki.FI>

I'm not exactly sure what the proper procedure is for submitting CWM bugs,
so I'm imitating what Google turns up...

When running "cwm decoy.n3 -rdf", CWM barfs on a non-ASCII Unicode character
in the file. I'm using CWM 1.82, and the N3 file can be found at
http://www.iki.fi/~decoy/shared/meta/decoy.n3 . The problem doesn't occur
when the output is NTriples or N3; there the characters get escaped. The trace:

Traceback (most recent call last):
  File "cwm.py", line 598, in ?
    doCommand()
  File "cwm.py", line 581, in doCommand
    _store.dumpNested(workingContext, _outSink)
  File "llyn.py", line 1132, in dumpNested
    self.dumpNestedStatements(context, sink)
  File "llyn.py", line 1145, in dumpNestedStatements
    self._dumpSubject(currentSubject, context, sink, sorting, statements)
  File "llyn.py", line 1233, in _dumpSubject
    self.dumpStatement(sink, s.triple)
  File "llyn.py", line 1293, in dumpStatement
    self._outputStatement(sink, triple)
  File "llyn.py", line 877, in _outputStatement
    sink.makeStatement(self.extern(triple))
  File "notation3.py", line 847, in makeStatement
    self._wr.data(obj[1])
  File "notation3.py", line 1101, in data
    xmldata(o.write, str, self.dataEsc)
  File "notation3.py", line 1116, in xmldata
    write(str[i:])
  File "c:\python22\lib\codecs.py", line 137, in write
    data, consumed = self.encode(object, self.errors)
UnicodeError: ASCII decoding error: ordinal not in range(128)

For one reason or another, removing the UTF-8 encoder from ToRDF (that is,
changing "self._wr = XMLWriter(encWriter(outFp))" to "self._wr =
XMLWriter(outFp)") bypasses the problem. I'm guessing the internal string
representation is now ASCII, and the imported UTF-8 characters kill the
default string encoder upon output.
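
A minimal sketch of one possible reading of the trace, not a confirmed
diagnosis of CWM: if the string that reaches the encoding writer is already a
UTF-8 byte string, the writer must first decode it, and a decode with the
ASCII default fails with "ordinal not in range(128)". The sketch is written
for present-day Python rather than Python 2.2, so the implicit decode is
spelled out explicitly; write_utf8 is a hypothetical helper, not part of CWM.

```python
import codecs
import io


def write_utf8(out, data):
    """Write data to a binary stream as UTF-8.

    Hypothetical helper: byte strings are assumed to hold UTF-8 already
    and are decoded explicitly, so ASCII is never used as a fallback.
    """
    writer = codecs.getwriter("utf-8")(out)
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    writer.write(data)


# A non-ASCII character as UTF-8 bytes (two bytes, both above 127).
raw = "\u00e4".encode("utf-8")

# Decoding those bytes as ASCII is the step that fails.
try:
    raw.decode("ascii")
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True

# Decoding as UTF-8 before handing the text to the writer works.
buf = io.BytesIO()
write_utf8(buf, raw)
write_utf8(buf, "\u00f6")
```

If that reading is right, dropping the encoder only passes the bytes through
untouched; decoding the input to Unicode on the way in would keep the encoder
usable.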

Also, it'd be nice to have a way to --think without adding the rules to
the store, that is, something like --rules but repeated until the store
stops changing. That would really help when one needs to let CWM --think
on data which will eventually be served publicly, without the rules used
to derive it.
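
To make the request concrete, here is a rough sketch of the behaviour I mean,
not of how CWM works internally: rules are kept apart from the data and
applied over and over until nothing new is derived. Triples are plain tuples,
and think, parent_is_ancestor and ancestor_transitive are all made-up names
for illustration.

```python
def think(store, rules):
    """Apply rules until a fixed point; the rules never enter the store."""
    facts = set(store)  # work on a copy, leave the input untouched
    while True:
        derived = set()
        for rule in rules:
            derived |= set(rule(facts))
        new = derived - facts
        if not new:  # nothing new: the store has stopped changing
            return facts
        facts |= new


def parent_is_ancestor(facts):
    # Every parent link is also an ancestor link.
    return {(s, "ancestor", o) for (s, p, o) in facts if p == "parent"}


def ancestor_transitive(facts):
    # Chain ancestor links: a->b and b->c give a->c.
    return {
        (a, "ancestor", c)
        for (a, p, b) in facts if p == "ancestor"
        for (b2, q, c) in facts if q == "ancestor" and b2 == b
    }


data = {("a", "parent", "b"), ("b", "parent", "c"), ("c", "parent", "d")}
closed = think(data, [parent_is_ancestor, ancestor_transitive])
```

The result holds only data and conclusions, so it could be published as is.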

Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Monday, 24 June 2002 10:20:18 GMT
