Re: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 7: ordinal not in range(128)

From: Dan Connolly <connolly@w3.org>
Date: Fri, 21 Apr 2006 10:47:28 -0500
To: Elias Torres <elias@torrez.us>
Cc: www-rdf-calendar@w3.org
Message-Id: <1145634448.27608.647.camel@dirk.w3.org>
On Fri, 2006-04-21 at 11:11 -0400, Elias Torres wrote:
> I have a user from Argentina using my service (http://torrez.us/ics2rdf)
> based on the toIcal.py scripts. However, he has non-ascii characters and
> the script is failing. I just wanted to report the bug.

I'm not able to reproduce the failure you report. The diagnostic
I get suggests the data itself is bad:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

What version of fromIcal.py are you using?

I'm using:
$Id: fromIcal.py,v 2.31 2006/04/11 20:29:00 connolly Exp $

I'm attaching a CVS log with dates so you can perhaps see which
version you grabbed.

(I'd rather use a public version control history... but...
sigh... long story...)

connolly@dirk:~/Desktop$ python2.4 -i ~/w3ccvs/WWW/2002/12/cal/fromIcal.py basic.ics
Traceback (most recent call last):
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 825, in ?
    main()
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 99, in main
    interpret(sx, codecs.open(sys.argv[1], 'r', 'utf-8'), base, suppressed)
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 135, in interpret
    findComponents(lines, v, calendars)
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 801, in findComponents
    findComponents(lines, v, subs)
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 784, in findComponents
    n, p, v = parseLine(lines.next(), downcase=False)
  File "/home/connolly/w3ccvs/WWW/2002/12/cal/icslex.py", line 165, in unbreak
    s = lines.next().rstrip(CRLF)
  File "/usr/lib/python2.4/codecs.py", line 494, in next
    return self.reader.next()
  File "/usr/lib/python2.4/codecs.py", line 431, in next
    line = self.readline()
  File "/usr/lib/python2.4/codecs.py", line 346, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python2.4/codecs.py", line 293, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
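The transcript above reads the file through codecs.open(sys.argv[1], 'r', 'utf-8'), so decoding stops at the first byte sequence that is not valid UTF-8. A minimal Python 3 sketch of a more forgiving reader follows; the helper name read_ics_text and the Latin-1 fallback are assumptions for illustration, not what fromIcal.py does. Latin-1 maps every byte to a character, so the fallback never raises, but it will silently mis-decode input that is in some other legacy encoding.

```python
import os
import tempfile
from typing import Optional, Tuple


def read_ics_text(path: str) -> Tuple[str, str, Optional[int]]:
    """Return (text, encoding_used, offset_of_first_bad_byte).

    Hypothetical helper: strict UTF-8 first, as codecs.open(..., 'utf-8')
    would do, then a Latin-1 fallback that accepts any byte sequence.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8"), "utf-8", None
    except UnicodeDecodeError as e:
        # e.start is the byte offset where UTF-8 decoding gave up.
        return raw.decode("latin-1"), "latin-1", e.start


# Usage: a hypothetical Latin-1 encoded line ("Reunión", byte 0xf3 for "ó").
fd, path = tempfile.mkstemp(suffix=".ics")
with os.fdopen(fd, "wb") as f:
    f.write("SUMMARY:Reuni\u00f3n\r\n".encode("latin-1"))
text, encoding, bad_at = read_ics_text(path)
os.remove(path)
print(encoding, bad_at, repr(text))
```

Reporting the offset as well as falling back keeps the "your data is not UTF-8" diagnostic available to whoever runs the converter.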



> 
> -Elias
> 
>   File "index.py", line 24, in ?
>     main()
>   File "index.py", line 15, in main
>     fromIcal.interpret(sx, ical, url, ['X-'])
>   File "/_ics2rdf/fromIcal.py", line 142, in interpret
>     doComponents(sx, calendars, iCalendarDefs, suppressed = suppressed)
>   File "/_ics2rdf/fromIcal.py", line 350, in doComponents
>     doComponents(sx, subs, subDecls, 'component', suppressed = suppressed)
>   File "/_ics2rdf/fromIcal.py", line 345, in doComponents
>     doProperties(sx, '', props, propDecls, suppressed = suppressed)
>   File "/_ics2rdf/fromIcal.py", line 467, in doProperties
>     doCalAddress(sx, elt, params, val)
>   File "/_ics2rdf/fromIcal.py", line 662, in doCalAddress
>     sx.characters(pv, 0, len(pv))
>   File "/_ics2rdf/XMLWriter.py", line 79, in characters
>     doChars(o, ch, start, length)
>   File "/_ics2rdf/XMLWriter.py", line 92, in doChars
>     o.write(ch[i:])
>   File "/usr/lib/python2.4/codecs.py", line 178, in write
>     data, consumed = self.encode(object, self.errors)
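Elias's traceback stops inside codecs.py's write(), at the call to self.encode. One plausible reading, not confirmed in this thread: in Python 2, handing a byte string rather than a unicode object to a UTF-8 stream writer makes Python first decode it with the default 'ascii' codec, which fails on any byte of 0x80 or above, such as the 0xe1 (Latin-1 "á") named in the subject line. That would mean the text reached XMLWriter undecoded. The Python 3 sketch below makes that hidden ASCII step explicit; the sample bytes are hypothetical.

```python
# Hypothetical input: "está" encoded as Latin-1, so "á" is the single byte 0xe1.
raw = "est\u00e1".encode("latin-1")

# Python 2 performed this ASCII decode implicitly when a byte str was
# passed to a codec's encode(); Python 3 makes you spell it out.
try:
    raw.decode("ascii")
except UnicodeDecodeError as e:
    error_message = str(e)
    print(error_message)

# Decoding once, at the input boundary, with the encoding the data is
# actually in gives real text that then encodes cleanly as UTF-8.
text = raw.decode("latin-1")
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```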

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E


RCS file: /w3ccvs/WWW/2002/12/cal/fromIcal.py,v
Working file: fromIcal.py
head: 2.31
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 32;	selected revisions: 32
description:
----------------------------
revision 2.31
date: 2006/04/11 20:29:00;  author: connolly;  state: Exp;  lines: +103 -72
finished factoring out icslex stuff: unbreak, parseLine
findComponents is now more straightforwardly recursive
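Revision 2.31 says findComponents is now more straightforwardly recursive. The Python 3 sketch below illustrates that style of recursion over BEGIN/END pairs; it is not the fromIcal.py code, and find_components, its (name, value) input format and its dict output are all hypothetical. The subtle point is that every recursive call must draw from the same iterator, so a nested component consumes its own lines up to its END before control returns to the parent.

```python
from typing import Dict, Iterator, List, Tuple

Component = Dict[str, object]


def find_components(lines: Iterator[Tuple[str, str]], name: str) -> Component:
    """Collect properties and nested components up to the matching END:<name>.

    `lines` must be an iterator (not a list) of (property-name, value) pairs,
    so that recursive calls consume lines from the same stream.
    BEGIN/END names are compared case-insensitively.
    """
    props: List[Tuple[str, str]] = []
    subs: List[Component] = []
    for prop, value in lines:
        keyword = prop.upper()
        if keyword == "BEGIN":
            # Recurse: the child reads up to and including its own END line.
            subs.append(find_components(lines, value.upper()))
        elif keyword == "END":
            if value.upper() != name:
                raise ValueError(f"expected END:{name}, got END:{value}")
            return {"name": name, "props": props, "subs": subs}
        else:
            props.append((prop, value))
    raise ValueError(f"input ended before END:{name}")


# Usage with hypothetical, already-unfolded and already-split lines.
sample = iter([
    ("BEGIN", "VCALENDAR"),
    ("VERSION", "2.0"),
    ("BEGIN", "VEVENT"),
    ("SUMMARY", "Reunion"),
    ("BEGIN", "VALARM"),
    ("TRIGGER", "-PT15M"),
    ("end", "valarm"),
    ("END", "VEVENT"),
    ("END", "VCALENDAR"),
])
_, outer = next(sample)  # consume the outermost BEGIN line
calendar = find_components(sample, outer.upper())
print(calendar)
```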
----------------------------
revision 2.30
date: 2006/04/09 06:02:39;  author: connolly;  state: Exp;  lines: +41 -95
changeset:   7:5f8c551b2de38fb115789dfe7cbca0288a978f61
tag:         tip
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 01:01:32 2006 -0500
files:       icslex.py
description:
add bymonthday to recurlex


changeset:   6:32c567b22753c64f71c8de298adb87bad91ef567
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 00:54:59 2006 -0500
files:       icsxml.py
description:
use utf-8 to read files; kludge a couple more fields that the template assumes


changeset:   5:12370cd5ad97cd5cea04e7ed4d5f6b55c0ac39ff
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 00:54:13 2006 -0500
files:       icslex.py
description:
make interval explicit; use utf-8 to read files


changeset:   4:0f319182ea4d6ee8a8b7f2ef042683323b75658d
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 00:37:07 2006 -0500
files:       icsxml.py
description:
works in one case, with a couple kludges


changeset:   3:3e542292c8040d0dab310748ef07ffbce0a15b4a
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 00:36:43 2006 -0500
files:       icslex.py
description:
date, recur lex details


changeset:   2:c2881393d0156b9263d760e98953ece6ba7591a6
user:        Dan Connolly <connolly@w3.org>
date:        Sun Apr  9 00:01:33 2006 -0500
files:       icslex.py
description:
- parsing collections of properties as a dict/JSON object works
- names are downcased by default
- formatted docs per rst/epydoc


changeset:   1:ecc1ad118fc61abb55e9634d15921483134f3328
user:        Dan Connolly <connolly@w3.org>
date:        Sat Apr  8 22:06:28 2006 -0500
files:       icslex.py
description:
unbreak works


changeset:   0:ec6eb270779b1ae046b9dd04be92034375392722
user:        Dan Connolly <connolly@w3.org>
date:        Sat Apr  8 21:50:45 2006 -0500
files:       icslex.py
description:
parseLine tests pass
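Changesets 0 and 1 record that parseLine and unbreak work. unbreak is the step that undoes RFC 2445 line folding (section 4.1): a long content line may be split by inserting CRLF followed by a single space or tab, and unfolding removes that CRLF plus the one whitespace character. A minimal Python 3 sketch, with the hypothetical name unfold, using the folding example from the RFC:

```python
from typing import Iterable, Iterator, Optional


def unfold(physical_lines: Iterable[str]) -> Iterator[str]:
    """Yield logical content lines from folded physical lines.

    A line starting with one space or tab continues the previous line;
    exactly that one leading whitespace character is removed.
    """
    current: Optional[str] = None
    for line in physical_lines:
        line = line.rstrip("\r\n")
        if current is not None and line[:1] in (" ", "\t"):
            current += line[1:]
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current


folded = [
    "DESCRIPTION:This is a lo\r\n",
    " ng description\r\n",
    "SUMMARY:Meeting\r\n",
]
print(list(unfold(folded)))
```

Because it is a generator, unfolding can run lazily over a file object, one logical line at a time, which is also how a decoding error surfaces partway through the input.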
----------------------------
revision 2.29
date: 2005/11/09 23:10:48;  author: connolly;  state: Exp;  lines: +30 -9
- changed the way duration values are modelled
    The iCalendar DURATION value type is actually more than just a
    XMLSchema.duration; it also has a RELATED parameter.
    So for
      TRIGGER;VALUE=DURATION;RELATED=START:-PT15M
    we'll write
      { ?E cal:trigger [ rdf:value "-PT15M"^^xsdt:duration;
                         cal:related "START"] }

- fixed test data to have rdf:datatype on integer
  values, to match the schema (which matches the RFC)

- fixed schema to show DATE-TIME properties (dtstart, ...)
  as DatatypeProperties
  (there are little/no tests for PERIOD; beware)

- scraped more details about property parameters (e.g. partstat, cn,
  cutype, ...) and rrule parts (freq, interval, ...) from the RFC so
  that they show up as links in the hypertext version and as RDF
  properties in the schema.  likewise timezone components (standard,
  daylight)
 - side effect: added some whitespace in rfc2445.html

- demoted x- properties
 - removed x- properties from .rdf versions of test data
   this allows the round-trip tests to pass
 - fromIcal.py doesn't output them unless you give the --x option

- added Makefile support for consistency checking with pellet

- demoted blank line diagnostic in fromIcal.py to a comment

- silenced some left-over debug diagnostics in slurpIcalSpec.py

- fixed test/test-created.rdf; added it to fromIcalTest.py list
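The duration change in revision 2.29 keeps the RELATED parameter next to the duration by making the trigger a structured value. As a hypothetical Python 3 sketch (trigger_to_n3 is not a fromIcal.py function, and the cal:, rdf: and xsdt: prefixes are assumed to be declared elsewhere), an already-parsed TRIGGER line can be rendered into the N3 pattern shown in that log entry; RELATED defaults to START when absent, as in RFC 2445.

```python
from typing import Dict


def trigger_to_n3(subject: str, params: Dict[str, str], value: str) -> str:
    """Render a duration-valued TRIGGER as an N3 formula with a structured value."""
    # TRIGGER defaults to VALUE=DURATION; DATE-TIME triggers are out of scope here.
    if params.get("VALUE", "DURATION").upper() != "DURATION":
        raise ValueError("only duration-valued triggers are handled in this sketch")
    # RFC 2445: RELATED defaults to START when the parameter is absent.
    related = params.get("RELATED", "START").upper()
    return (
        f'{{ {subject} cal:trigger [ rdf:value "{value}"^^xsdt:duration; '
        f'cal:related "{related}" ] }}'
    )


# TRIGGER;VALUE=DURATION;RELATED=START:-PT15M, already split into pieces.
print(trigger_to_n3("?E", {"VALUE": "DURATION", "RELATED": "START"}, "-PT15M"))
```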
----------------------------
revision 2.28
date: 2005/09/08 00:43:49;  author: connolly;  state: Exp;  lines: +10 -2
avoid double hashes in ID
----------------------------
revision 2.27
date: 2005/04/22 14:16:56;  author: connolly;  state: Exp;  lines: +15 -6
fix problems found when converting all the timezone files
in evolution-data-server_1.0.4-1_i386.deb:
- handle RDATE
- handle multiple OlsonPfxs
----------------------------
revision 2.26
date: 2005/04/04 21:17:14;  author: connolly;  state: Exp;  lines: +5 -2
fix initialization of iCalendar namespace
----------------------------
revision 2.25
date: 2005/03/30 15:35:21;  author: connolly;  state: Exp;  lines: +31 -2
new namespace for timezones-as-datatypes design: icaltzd
----------------------------
revision 2.24
date: 2005/02/26 03:20:47;  author: connolly;  state: Exp;  lines: +63 -64
fromIcal.py
- revert the uid: trick; back to uids as fragids
- timezones as datatypes in dates, dateTimes
- Valarm supported in Vtodo as well as Vevent
  (@@need test smaller than MozMulipleVcalendars.ics)
- re-indented Vtodo decls while I was at it
- case-fold END:xyz

fromIcalTest.py
- base in http space
- new tag-bug case

test/*.rdf
- base in http space
- timezones as datatypes

test/cal-regression.n3
- moved tests that don't use X- first
- got rid of initRDF

test/cal-retest.py
- replace ical2rdf.pl with fromIcal.py
- base in http space

test/cal-spec-examples.n3 new

test/graphCompare.n3 oops; extra debug crud
----------------------------
revision 2.23
date: 2005/02/10 21:39:00;  author: timbl;  state: Exp;  lines: +30 -7
COUNT, LANGUAGE, X-UID, QUOTED-PRINTABLE under DanC's supervision
----------------------------
revision 2.22
date: 2005/02/02 21:54:45;  author: timbl;  state: Exp;  lines: +4 -2
added --noalarm option - kindofa hack - take 2
----------------------------
revision 2.21
date: 2005/02/02 21:51:46;  author: timbl;  state: Exp;  lines: +21 -15
added --noalarm option - kindofa hack
----------------------------
revision 2.20
date: 2005/02/02 21:39:20;  author: timbl;  state: Exp;  lines: +20 -16
sync
----------------------------
revision 2.19
date: 2005/02/01 15:29:43;  author: timbl;  state: Exp;  lines: +5 -2
hack to CREATED to add default type DATE-TIME.
----------------------------
revision 2.18
date: 2005/02/01 15:26:53;  author: timbl;  state: Exp;  lines: +5 -2
pre hack to CREATED  default type.
----------------------------
revision 2.17
date: 2005/01/28 04:07:49;  author: timbl;  state: Exp;  lines: +35 -14
Event URIs now absolute. Added --noprotocol and --help options
----------------------------
revision 2.16
date: 2004/09/30 14:16:01;  author: connolly;  state: Exp;  lines: +21 -9
parseLine was buggy in the case of ; in values
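The bug fixed in revision 2.16 comes from the shape of a content line, name *(";" param) ":" value: only the text before the first colon outside double quotes should be split on ';', and the value may itself contain ';' (GEO, RRULE) or ':' (MAILTO: addresses). A hypothetical Python 3 parser that respects both rules follows; parse_content_line is illustrative and is not icslex.parseLine.

```python
from typing import Dict, List, Tuple


def _split_unquoted(text: str, sep: str) -> List[str]:
    """Split on `sep`, ignoring separators inside double-quoted strings."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_content_line(line: str) -> Tuple[str, Dict[str, List[str]], str]:
    """Split a content line into (NAME, {PARAM: [values]}, value).

    Names are upper-cased in this sketch; the value is returned untouched.
    """
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            head, value = line[:i], line[i + 1:]
            break
    else:
        raise ValueError(f"no ':' before the value in {line!r}")

    name, *raw_params = _split_unquoted(head, ";")
    params: Dict[str, List[str]] = {}
    for raw in raw_params:
        key, _, pvalue = raw.partition("=")
        # Parameter values may be comma-separated lists; strip quotes from each.
        params[key.upper()] = [v.strip('"') for v in _split_unquoted(pvalue, ",")]
    return name.upper(), params, value


print(parse_content_line("GEO:37.386013;-122.082932"))
print(parse_content_line('ATTENDEE;CN="Doe; John";ROLE=CHAIR:MAILTO:jdoe@example.com'))
```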
----------------------------
revision 2.15
date: 2004/04/14 21:31:26;  author: connolly;  state: Exp;  lines: +21 -4
added --base support so we can test with fragids
----------------------------
revision 2.14
date: 2004/04/14 21:12:13;  author: connolly;  state: Exp;  lines: +105 -29

- revamped doDateTime: use datatypes for dateTime values
  - added __getattr__ to Namespace class
- make well-known tzids into URIs in 2002/12/cal space
- make UID into fragid
- make local tzid into fragid
----------------------------
revision 2.13
date: 2004/04/08 14:09:11;  author: connolly;  state: Exp;  lines: +5 -2
priority on VEVENT fixed
----------------------------
revision 2.12
date: 2004/04/07 18:27:17;  author: connolly;  state: Exp;  lines: +10 -3
use real datatypes for list of floats, i.e. geo
----------------------------
revision 2.11
date: 2004/04/07 18:10:22;  author: connolly;  state: Exp;  lines: +41 -2
convert list of float, as in GEO
----------------------------
revision 2.10
date: 2004/03/25 04:00:59;  author: connolly;  state: Exp;  lines: +9 -2
allow recurrenceId wherever rrule can go
handle WKST in recur values
----------------------------
revision 2.9
date: 2004/03/25 03:45:09;  author: connolly;  state: Exp;  lines: +3 -0
handle UNTIL in rrule
added EXDATE to compDecls wherever RRULE occurs
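Revisions 2.9 and 2.10 (UNTIL, WKST) and the bymonthday changeset above all extend recurrence parsing. A RECUR value is a ';'-separated list of NAME=VALUE parts, and the BY* parts take comma-separated lists. A hypothetical Python 3 sketch (parse_recur is not a fromIcal.py or icslex.py function), using an RRULE of the kind found in the RFC 2445 examples:

```python
from typing import Dict, List, Union

RecurPart = Union[str, int, List[str]]


def parse_recur(value: str) -> Dict[str, RecurPart]:
    """Split the value of an RRULE (or EXRULE) into a dict of its parts.

    COUNT and INTERVAL become ints, BY* parts (BYDAY, BYMONTHDAY, ...)
    become lists, and FREQ, UNTIL and WKST are kept as strings.
    """
    parts: Dict[str, RecurPart] = {}
    for item in value.split(";"):
        key, _, val = item.partition("=")
        key = key.upper()
        if key in ("COUNT", "INTERVAL"):
            parts[key] = int(val)
        elif key.startswith("BY"):
            parts[key] = val.split(",")
        else:
            parts[key] = val
    return parts


print(parse_recur("FREQ=WEEKLY;UNTIL=19971007T000000Z;WKST=SU;BYDAY=TU,TH"))
print(parse_recur("FREQ=MONTHLY;INTERVAL=2;BYMONTHDAY=1,-1"))
```

UNTIL is left as a string here; it would go through the same DATE/DATE-TIME conversion as other date values.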
----------------------------
revision 2.8
date: 2004/03/25 03:43:48;  author: connolly;  state: Exp;  lines: +9 -1
handle exdate ala rrule
----------------------------
revision 2.7
date: 2004/03/23 14:59:28;  author: connolly;  state: Exp;  lines: +10 -3
allow missing PRODID
----------------------------
revision 2.6
date: 2004/03/10 21:59:31;  author: connolly;  state: Exp;  lines: +11 -2
calendar schema is now generated from the RFC
----------------------------
revision 2.5
date: 2004/02/29 14:52:11;  author: connolly;  state: Exp;  lines: +49 -10
todo support in fromIcal; value type label in schema
----------------------------
revision 2.4
date: 2004/02/12 07:17:05;  author: connolly;  state: Exp;  lines: +23 -6
- handle URI value type
- a few more default value type declarations
----------------------------
revision 2.3
date: 2004/02/12 06:30:48;  author: connolly;  state: Exp;  lines: +54 -11
- doText unescapes text values per rfc2445#sec4.3.11
- LAST-MODIFIED applies to VEVENT (fixed typo in RFC)
- default type added for COMMENT
- disabled UID->fragid conversion cuz it interferes with graph comparison
- handle DIR parameter on CAL-ADDRESS value type
----------------------------
revision 2.2
date: 2004/02/11 22:04:10;  author: connolly;  state: Exp;  lines: +49 -21
slightly nicer XML writer
----------------------------
revision 2.1
date: 2004/02/11 16:40:23;  author: connolly;  state: Exp;  lines: +7 -4
finish renaming icalWebize.py to fromIcal.py
----------------------------
revision 2.0
date: 2004/02/11 16:37:48;  author: connolly;  state: Exp;
copied from icalWebize.py
=============================================================================
Received on Friday, 21 April 2006 15:47:50 UTC