Re: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 7: ordinal not in range(128)

Thanks, Dan, for the suggested fix to my script; fromIcal.py itself was
working correctly.

For the rest:

I wasn't reading the input passed to fromIcal.py with an encoding that
matched the data. I ended up using codecs.EncodedFile:

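 import codecs, sys, urllib
 import fromIcal, XMLWriter

 up = urllib.urlopen(url)
 # charset comes from the Content-Type header (iso8859-1 if absent)
 ical = codecs.EncodedFile(up, charset)
 sx = XMLWriter.T(codecs.getwriter('utf-8')(sys.stdout))
 fromIcal.interpret(sx, ical, url, ['X-'])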
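
To make the failure mode concrete, here is a small sketch in present-day
Python 3 (not the Python 2.4 used above). The sample line and the helper
name try_decode are made up for illustration. The point is only that the
same bytes fail under the ascii and utf-8 codecs but decode cleanly as
iso8859-1, and that a decoding reader from codecs.getreader turns a byte
stream into text.

```python
import codecs
import io

# Hypothetical sample: an iCalendar-style line encoded as ISO-8859-1,
# where byte 0xe1 is the letter "á".
raw = b"CN:Germ\xe1n\r\n"


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err


ascii_result = try_decode(raw, "ascii")       # fails: 0xe1 is outside ASCII
utf8_result = try_decode(raw, "utf-8")        # fails: 0xe1 is not followed by continuation bytes
latin1_result = try_decode(raw, "iso8859-1")  # succeeds

# Wrapping the byte stream in a decoding reader yields text line by line.
reader = codecs.getreader("iso8859-1")(io.BytesIO(raw))
line = reader.readline()

print(repr(ascii_result))
print(repr(utf8_result))
print(repr(latin1_result), repr(line))
```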

The charset defaults to iso8859-1 if none is specified in the
Content-Type header; otherwise the charset=x value from the header is
passed to codecs.EncodedFile.
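
The charset selection described above could look roughly like this in
present-day Python 3. This is a sketch, not the code running behind the
service: charset_from_content_type and decoded_lines are hypothetical
names, and the standard library's email.message.Message is used here
only as a convenient header parser.

```python
import codecs
import io
from email.message import Message


def charset_from_content_type(content_type, default="iso8859-1"):
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns failobj
    # when the header or its charset parameter is missing.
    return msg.get_content_charset(failobj=default)


def decoded_lines(stream, content_type):
    """Decode a byte stream with the charset chosen from the header value."""
    charset = charset_from_content_type(content_type)
    reader = codecs.getreader(charset)(stream)
    return [line.rstrip("\r\n") for line in reader]


# Hypothetical inputs: one UTF-8 body with an explicit charset,
# one ISO-8859-1 body with no charset parameter.
utf8_body = io.BytesIO("SUMMARY:Reuni\u00f3n\r\n".encode("utf-8"))
latin1_body = io.BytesIO("SUMMARY:Reuni\u00f3n\r\n".encode("iso8859-1"))

print(decoded_lines(utf8_body, "text/calendar; charset=UTF-8"))
print(decoded_lines(latin1_body, "text/calendar"))
```

In Python 3 the headers object on a urllib.request response is itself a
Message subclass, so its get_content_charset() method can be called
directly instead of building a Message by hand.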

Regards,

-Elias


Dan Connolly wrote:
> On Fri, 2006-04-21 at 11:11 -0400, Elias Torres wrote:
>> I have a user from Argentina using my service (http://torrez.us/ics2rdf)
>> based on the toIcal.py scripts. However, his calendar has non-ASCII
>> characters and the script is failing. I just wanted to report the bug.
> 
> I'm not able to reproduce the script failure. The diagnostic
> I get suggests the data is bad:
> 
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
> 
> What version of fromIcal.py are you using?
> 
> I'm using:
> $Id: fromIcal.py,v 2.31 2006/04/11 20:29:00 connolly Exp $
> 
> I'm attaching a CVS log with dates so you can perhaps see which
> version you grabbed.
> 
> (I'd rather use a public version control history... but...
> sigh... long story...)
> 
> connolly@dirk:~/Desktop$ python2.4 -i ~/w3ccvs/WWW/2002/12/cal/fromIcal.py basic.ics
> Traceback (most recent call last):
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 825, in ?
>     main()
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 99, in main
>     interpret(sx, codecs.open(sys.argv[1], 'r', 'utf-8'), base, suppressed)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 135, in interpret
>     findComponents(lines, v, calendars)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 801, in findComponents
>     findComponents(lines, v, subs)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 784, in findComponents
>     n, p, v = parseLine(lines.next(), downcase=False)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/icslex.py", line 165, in unbreak
>     s = lines.next().rstrip(CRLF)
>   File "/usr/lib/python2.4/codecs.py", line 494, in next
>     return self.reader.next()
>   File "/usr/lib/python2.4/codecs.py", line 431, in next
>     line = self.readline()
>   File "/usr/lib/python2.4/codecs.py", line 346, in readline
>     data = self.read(readsize, firstline=True)
>   File "/usr/lib/python2.4/codecs.py", line 293, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
> 
> 
> 
>> -Elias
>>
>>   File "index.py", line 24, in ?
>>     main()
>>   File "index.py", line 15, in main
>>     fromIcal.interpret(sx, ical, url, ['X-'])
>>   File "/_ics2rdf/fromIcal.py", line 142, in interpret
>>     doComponents(sx, calendars, iCalendarDefs, suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 350, in doComponents
>>     doComponents(sx, subs, subDecls, 'component', suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 345, in doComponents
>>     doProperties(sx, '', props, propDecls, suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 467, in doProperties
>>     doCalAddress(sx, elt, params, val)
>>   File "/_ics2rdf/fromIcal.py", line 662, in doCalAddress
>>     sx.characters(pv, 0, len(pv))
>>   File "/_ics2rdf/XMLWriter.py", line 79, in characters
>>     doChars(o, ch, start, length)
>>   File "/_ics2rdf/XMLWriter.py", line 92, in doChars
>>     o.write(ch[i:])
>>   File "/usr/lib/python2.4/codecs.py", line 178, in write
>>     data, consumed = self.encode(object, self.errors)
> 
> 
> ------------------------------------------------------------------------
> 
> 
> RCS file: /w3ccvs/WWW/2002/12/cal/fromIcal.py,v
> Working file: fromIcal.py
> head: 2.31
> branch:
> locks: strict
> access list:
> symbolic names:
> keyword substitution: kv
> total revisions: 32;	selected revisions: 32
> description:
> ----------------------------
> revision 2.31
> date: 2006/04/11 20:29:00;  author: connolly;  state: Exp;  lines: +103 -72
> finished factoring out icslex stuff: unbreak, parseLine
> findComponents is now more straightforwardly recursive
> ----------------------------
> revision 2.30
> date: 2006/04/09 06:02:39;  author: connolly;  state: Exp;  lines: +41 -95
> changeset:   7:5f8c551b2de38fb115789dfe7cbca0288a978f61
> tag:         tip
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 01:01:32 2006 -0500
> files:       icslex.py
> description:
> add bymonthday to recurlex
> 
> 
> changeset:   6:32c567b22753c64f71c8de298adb87bad91ef567
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 00:54:59 2006 -0500
> files:       icsxml.py
> description:
> use utf-8 to read files; kludge a couple more fields that the template assumes
> 
> 
> changeset:   5:12370cd5ad97cd5cea04e7ed4d5f6b55c0ac39ff
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 00:54:13 2006 -0500
> files:       icslex.py
> description:
> make interval explict; use utf-8 to read files
> 
> 
> changeset:   4:0f319182ea4d6ee8a8b7f2ef042683323b75658d
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 00:37:07 2006 -0500
> files:       icsxml.py
> description:
> works in one case, with a couple kludges
> 
> 
> changeset:   3:3e542292c8040d0dab310748ef07ffbce0a15b4a
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 00:36:43 2006 -0500
> files:       icslex.py
> description:
> date, recur lex details
> 
> 
> changeset:   2:c2881393d0156b9263d760e98953ece6ba7591a6
> user:        Dan Connolly <connolly@w3.org>
> date:        Sun Apr  9 00:01:33 2006 -0500
> files:       icslex.py
> description:
> - parsing collections of properties as a dict/JSON object works
> - names are downcased by default
> - formatted docs per rst/epydoc
> 
> 
> changeset:   1:ecc1ad118fc61abb55e9634d15921483134f3328
> user:        Dan Connolly <connolly@w3.org>
> date:        Sat Apr  8 22:06:28 2006 -0500
> files:       icslex.py
> description:
> unbreak works
> 
> 
> changeset:   0:ec6eb270779b1ae046b9dd04be92034375392722
> user:        Dan Connolly <connolly@w3.org>
> date:        Sat Apr  8 21:50:45 2006 -0500
> files:       icslex.py
> description:
> parseLine tests pass
> ----------------------------
> revision 2.29
> date: 2005/11/09 23:10:48;  author: connolly;  state: Exp;  lines: +30 -9
> - changed the way duration values are modelled
>     The iCalendar DURATION value type is actually more than just a
>     XMLSchema.duration; it also has a RELATED parameter.
>     So for
>       TRIGGER;VALUE=DURATION;RELATED=START:-PT15M
>     we'll write
>       { ?E cal:trigger [ rdf:value "-PT15M"^^xsdt:duration;
>                          cal:related "START"] }
> 
> - fixed test data to have rdf:datatype on integer
>   values, to match the schema (which matches the RFC)
> 
> - fixed schema to show DATE-TIME properties (dtstart, ...)
>   as DatatypeProperties
>   (there are little/no tests for PERIOD; beware)
> 
> - scraped more details about property parameters (e.g. partstat, cn,
>   cutype, ...) and rrule parts (freq, interval, ...) from the RFC so
>   that they show up as links in the hypertext version and as RDF
>   properties in the schema.  likewise timezone components (standard,
>   daylight)
>  - side effect: added some whitespace in rfc2445.html
> 
> - demoted x- properties
>  - removed x- properties from .rdf versions of test data
>    this allows the round-trip tests to pass
>  - fromIcal.py doesn't output them unless you give the --x option
> 
> - added Makefile support for consistency checking with pellet
> 
> - demoted blank line diagnostic in fromIcal.py to a comment
> 
> - silenced some left-over debug diagnostics in slurpIcalSpec.py
> 
> - fixed test/test-created.rdf; added it to fromIcalTest.py list
> ----------------------------
> revision 2.28
> date: 2005/09/08 00:43:49;  author: connolly;  state: Exp;  lines: +10 -2
> avoid double hashes in ID
> ----------------------------
> revision 2.27
> date: 2005/04/22 14:16:56;  author: connolly;  state: Exp;  lines: +15 -6
> fix problems found when converting all the timezone files
> in evolution-data-server_1.0.4-1_i386.deb:
> - handle RDATE
> - handle multiple OlsonPfxs
> ----------------------------
> revision 2.26
> date: 2005/04/04 21:17:14;  author: connolly;  state: Exp;  lines: +5 -2
> fix initialization of iCalendar namespace
> ----------------------------
> revision 2.25
> date: 2005/03/30 15:35:21;  author: connolly;  state: Exp;  lines: +31 -2
> new namespace for timezones-as-datatypes design: icaltzd
> ----------------------------
> revision 2.24
> date: 2005/02/26 03:20:47;  author: connolly;  state: Exp;  lines: +63 -64
> fromIcal.py
> - revert the uid: trick; back to uids as fragids
> - timezones as datatypes in dates, dateTimes
> - Valarm supported in Vtodo as well as Vevent
>   (@@need test smaller than MozMulipleVcalendars.ics)
> - re-indented Vtodo decls while I was at it
> - case-fold END:xyz
> 
> fromIcalTest.py
> - base in http space
> - new tag-bug case
> 
> test/*.rdf
> - base in http space
> - timezones as datatypes
> 
> test/cal-regression.n3
> - moved tests that don't use X- first
> - got rid of initRDF
> 
> test/cal-retest.py
> - replace ical2rdf.pl with fromIcal.py
> - base in http space
> 
> test/cal-spec-examples.n3 new
> 
> test/graphCompare.n3 oops; extra debug crud
> ----------------------------
> revision 2.23
> date: 2005/02/10 21:39:00;  author: timbl;  state: Exp;  lines: +30 -7
> COUNT, LANGUAGE, X-UID, QUOTED-PRINTABLE under DanC's supervision
> ----------------------------
> revision 2.22
> date: 2005/02/02 21:54:45;  author: timbl;  state: Exp;  lines: +4 -2
> added --noalarm option - kindofa hack - take 2
> ----------------------------
> revision 2.21
> date: 2005/02/02 21:51:46;  author: timbl;  state: Exp;  lines: +21 -15
> added --noalarm option - kindofa hack
> ----------------------------
> revision 2.20
> date: 2005/02/02 21:39:20;  author: timbl;  state: Exp;  lines: +20 -16
> sync
> ----------------------------
> revision 2.19
> date: 2005/02/01 15:29:43;  author: timbl;  state: Exp;  lines: +5 -2
> hack to CREATED to add default type DATE-TIME.
> ----------------------------
> revision 2.18
> date: 2005/02/01 15:26:53;  author: timbl;  state: Exp;  lines: +5 -2
> pre hack to CREATED  default type.
> ----------------------------
> revision 2.17
> date: 2005/01/28 04:07:49;  author: timbl;  state: Exp;  lines: +35 -14
> Event URIs now absolute. Added --noprotocol and --help options
> ----------------------------
> revision 2.16
> date: 2004/09/30 14:16:01;  author: connolly;  state: Exp;  lines: +21 -9
> parseLine was buggy in the case of ; in values
> ----------------------------
> revision 2.15
> date: 2004/04/14 21:31:26;  author: connolly;  state: Exp;  lines: +21 -4
> added --base support so we can test with fragids
> ----------------------------
> revision 2.14
> date: 2004/04/14 21:12:13;  author: connolly;  state: Exp;  lines: +105 -29
> 
> - revamped doDateTime: use datatypes for dateTime values
>   - added __getattr__ to Namespace class
> - make well-known tzids into URIs in 2002/12/cal space
> - make UID into fragid
> - make local tzid into fragid
> ----------------------------
> revision 2.13
> date: 2004/04/08 14:09:11;  author: connolly;  state: Exp;  lines: +5 -2
> priority on VEVENT fixed
> ----------------------------
> revision 2.12
> date: 2004/04/07 18:27:17;  author: connolly;  state: Exp;  lines: +10 -3
> use real datatypes for list of floats, i.e. geo
> ----------------------------
> revision 2.11
> date: 2004/04/07 18:10:22;  author: connolly;  state: Exp;  lines: +41 -2
> convert list of float, as in GEO
> ----------------------------
> revision 2.10
> date: 2004/03/25 04:00:59;  author: connolly;  state: Exp;  lines: +9 -2
> allow recurrenceId wherever rrule can go
> handle WKST in recur values
> ----------------------------
> revision 2.9
> date: 2004/03/25 03:45:09;  author: connolly;  state: Exp;  lines: +3 -0
> handle UNTIL in rrule
> added EXDATE to compDecls wherever RRULE occurs
> ----------------------------
> revision 2.8
> date: 2004/03/25 03:43:48;  author: connolly;  state: Exp;  lines: +9 -1
> handle exdate ala rrule
> ----------------------------
> revision 2.7
> date: 2004/03/23 14:59:28;  author: connolly;  state: Exp;  lines: +10 -3
> allow missing PRODID
> ----------------------------
> revision 2.6
> date: 2004/03/10 21:59:31;  author: connolly;  state: Exp;  lines: +11 -2
> calendar schema is now generated from the RFC
> ----------------------------
> revision 2.5
> date: 2004/02/29 14:52:11;  author: connolly;  state: Exp;  lines: +49 -10
> todo support in fromIcal; value type label in schema
> ----------------------------
> revision 2.4
> date: 2004/02/12 07:17:05;  author: connolly;  state: Exp;  lines: +23 -6
> - handle URI value type
> - a few more default value type declarations
> ----------------------------
> revision 2.3
> date: 2004/02/12 06:30:48;  author: connolly;  state: Exp;  lines: +54 -11
> - doText unescapes text values per rfc2445#sec4.3.11
> - LAST-MODIFIED applies to VEVENT (fixed typo in RFC)
> - default type added for COMMENT
> - disabled UID->fragid conversion cuz it interferes with graph comparison
> - handle DIR parameter on CAL-ADDRESS value type
> ----------------------------
> revision 2.2
> date: 2004/02/11 22:04:10;  author: connolly;  state: Exp;  lines: +49 -21
> slightly nicer XML writer
> ----------------------------
> revision 2.1
> date: 2004/02/11 16:40:23;  author: connolly;  state: Exp;  lines: +7 -4
> finish renaming icalWebize.py to fromIcal.py
> ----------------------------
> revision 2.0
> date: 2004/02/11 16:37:48;  author: connolly;  state: Exp;
> copied from icalWebize.py
> =============================================================================

Received on Friday, 21 April 2006 16:46:19 UTC