
Re: CWM bug + RFE: N3 to XML RDF fails on non-trivial Unicode

From: Dan Connolly <connolly@w3.org>
Date: 28 Jun 2002 11:22:02 -0500
To: Sampo Syreeni <decoy@iki.fi>
Cc: Tim Berners-Lee <timbl@w3.org>, www-archive+n3bugs@w3.org
Message-Id: <1025281323.22220.282.camel@dirk>

On Mon, 2002-06-24 at 09:20, Sampo Syreeni wrote:
> I'm not exactly sure what the proper procedure is for submitting CWM bugs,
> so I'm monkeying what Google serves me with...

You made a very good guess, then! Mail like this is the
preferred way to report cwm bugs.

> When doing "cwm decoy.n3 -rdf", CWM barfs on a non-ASCII Unicode character
> in the file.

Yes, I have reproduced the problem.

> I'm using CWM 1.82, and the N3 file can be found in
> http://www.iki.fi/~decoy/shared/meta/decoy.n3 . The problem doesn't recur
> when going to NTriples or N3;

It does for me. Hmm...

I reduced your decoy.n3 file to a small test case:

  $Id: utf8lit.n3,v 1.1 2002/06/28 16:15:39 connolly Exp $

It won't be in the regression suite (swap/test/retest.sh)
until we fix the bug.

I'm not sure I have time to hunt it down soon;
let me know if you find the bug or a patch.


> For one reason or another, removing the UTF-8 encoder from ToRDF (that is,
> changing "self._wr = XMLWriter(encWriter(outFp))" to "self._wr =
> XMLWriter(outFp)") bypasses the problem. I'm guessing the internal string
> representation is now ASCII, and the imported UTF-8 characters kill the
> default string encoder upon output.

Hmm... that doesn't sound right.
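
For reference, a minimal sketch of one way a UTF-8 encoding writer can
fail on non-ASCII input. This is an assumption about the general
mechanism, not a diagnosis of the cwm bug: enc_writer below is a
hypothetical stand-in, not cwm's encWriter, and it is written for
modern Python. In Python 2, handing a byte string to an encoder made
it decode that string as ASCII first, implicitly; the sketch spells
that step out.

```python
import io


def enc_writer(raw):
    """Hypothetical stand-in for a UTF-8 encoding wrapper (not cwm's encWriter)."""
    def write(text):
        # Python 2 performed this ASCII decode implicitly when a byte
        # string reached an encoder; here it is made explicit.
        if isinstance(text, bytes):
            text = text.decode("ascii")
        raw.write(text.encode("utf-8"))
    return write


out = io.BytesIO()
write = enc_writer(out)

# A genuine Unicode string goes through the encoder without trouble.
write("caf\u00e9")

# The same text, already encoded as UTF-8 bytes, fails at the ASCII decode.
try:
    write("caf\u00e9".encode("utf-8"))
    failed = False
except UnicodeDecodeError:
    failed = True
```

If something like this is what is happening, the literal would be
reaching the writer as undecoded UTF-8 bytes rather than as a Unicode
string, which would also explain why dropping the encoder hides the
symptom.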

> Also, [...]

You were doing so great on bug-reporting-protocol until here. 1/2 ;-)

Each bug report and feature request belongs in its own message.

I'll reply to the feature request separately.

Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Friday, 28 June 2002 12:21:33 UTC
