- From: Elias Torres <elias@torrez.us>
- Date: Fri, 21 Apr 2006 12:45:59 -0400
- To: www-rdf-calendar@w3.org
Thanks, Dan, for the suggested fix to my script; fromIcal.py was working
correctly. For the rest: I wasn't reading the input given to fromIcal.py
with a matching encoding. I ended up using codecs.EncodedFile:

    up = urllib.urlopen(url)
    ical = codecs.EncodedFile(up, charset)
    sx = XMLWriter.T(codecs.getwriter('utf-8')(sys.stdout))
    fromIcal.interpret(sx, ical, url, ['X-'])

The charset defaults to iso8859-1 if none is specified in the Content-Type
header; otherwise the charset=x value from the header is passed to
codecs.EncodedFile.

Regards,

-Elias

Dan Connolly wrote:
> On Fri, 2006-04-21 at 11:11 -0400, Elias Torres wrote:
>> I have a user from Argentina using my service (http://torrez.us/ics2rdf)
>> based on the toIcal.py scripts. However, he has non-ASCII characters and
>> the script is failing. I just wanted to report the bug.
>
> I'm not able to reproduce a failure of the script. The diagnostic
> I get suggests the data is bad:
>
>   UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
>
> What version of fromIcal.py are you using?
>
> I'm using:
> $Id: fromIcal.py,v 2.31 2006/04/11 20:29:00 connolly Exp $
>
> I'm attaching a CVS log with dates so you can perhaps see which
> version you grabbed.
>
> (I'd rather use a public version control history... but...
> sigh... long story...)
>
> connolly@dirk:~/Desktop$ python2.4 -i ~/w3ccvs/WWW/2002/12/cal/fromIcal.py basic.ics
> Traceback (most recent call last):
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 825, in ?
>     main()
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 99, in main
>     interpret(sx, codecs.open(sys.argv[1], 'r', 'utf-8'), base, suppressed)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 135, in interpret
>     findComponents(lines, v, calendars)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 801, in findComponents
>     findComponents(lines, v, subs)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 784, in findComponents
>     n, p, v = parseLine(lines.next(), downcase=False)
>   File "/home/connolly/w3ccvs/WWW/2002/12/cal/icslex.py", line 165, in unbreak
>     s = lines.next().rstrip(CRLF)
>   File "/usr/lib/python2.4/codecs.py", line 494, in next
>     return self.reader.next()
>   File "/usr/lib/python2.4/codecs.py", line 431, in next
>     line = self.readline()
>   File "/usr/lib/python2.4/codecs.py", line 346, in readline
>     data = self.read(readsize, firstline=True)
>   File "/usr/lib/python2.4/codecs.py", line 293, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
>
>> -Elias
>>
>>   File "index.py", line 24, in ?
>>     main()
>>   File "index.py", line 15, in main
>>     fromIcal.interpret(sx, ical, url, ['X-'])
>>   File "/_ics2rdf/fromIcal.py", line 142, in interpret
>>     doComponents(sx, calendars, iCalendarDefs, suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 350, in doComponents
>>     doComponents(sx, subs, subDecls, 'component', suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 345, in doComponents
>>     doProperties(sx, '', props, propDecls, suppressed = suppressed)
>>   File "/_ics2rdf/fromIcal.py", line 467, in doProperties
>>     doCalAddress(sx, elt, params, val)
>>   File "/_ics2rdf/fromIcal.py", line 662, in doCalAddress
>>     sx.characters(pv, 0, len(pv))
>>   File "/_ics2rdf/XMLWriter.py", line 79, in characters
>>     doChars(o, ch, start, length)
>>   File "/_ics2rdf/XMLWriter.py", line 92, in doChars
>>     o.write(ch[i:])
>>   File "/usr/lib/python2.4/codecs.py", line 178, in write
>>     data, consumed = self.encode(object, self.errors)
>
>
> ------------------------------------------------------------------------
>
>
> RCS file: /w3ccvs/WWW/2002/12/cal/fromIcal.py,v
> Working file: fromIcal.py
> head: 2.31
> branch:
> locks: strict
> access list:
> symbolic names:
> keyword substitution: kv
> total revisions: 32; selected revisions: 32
> description:
> ----------------------------
> revision 2.31
> date: 2006/04/11 20:29:00; author: connolly; state: Exp; lines: +103 -72
> finished factoring out icslex stuff: unbreak, parseLine
> findComponents is now more straightforwardly recursive
> ----------------------------
> revision 2.30
> date: 2006/04/09 06:02:39; author: connolly; state: Exp; lines: +41 -95
> changeset: 7:5f8c551b2de38fb115789dfe7cbca0288a978f61
> tag: tip
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 01:01:32 2006 -0500
> files: icslex.py
> description:
> add bymonthday to recurlex
>
>
> changeset: 6:32c567b22753c64f71c8de298adb87bad91ef567
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:54:59 2006 -0500
> files: icsxml.py
> description:
> use utf-8 to read files; kludge a couple more fields that the template assumes
>
>
> changeset: 5:12370cd5ad97cd5cea04e7ed4d5f6b55c0ac39ff
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:54:13 2006 -0500
> files: icslex.py
> description:
> make interval explict; use utf-8 to read files
>
>
> changeset: 4:0f319182ea4d6ee8a8b7f2ef042683323b75658d
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:37:07 2006 -0500
> files: icsxml.py
> description:
> works in one case, with a couple kludges
>
>
> changeset: 3:3e542292c8040d0dab310748ef07ffbce0a15b4a
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:36:43 2006 -0500
> files: icslex.py
> description:
> date, recur lex details
>
>
> changeset: 2:c2881393d0156b9263d760e98953ece6ba7591a6
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:01:33 2006 -0500
> files: icslex.py
> description:
> - parsing collections of properties as a dict/JSON object works
> - names are downcased by default
> - formatted docs per rst/epydoc
>
>
> changeset: 1:ecc1ad118fc61abb55e9634d15921483134f3328
> user: Dan Connolly <connolly@w3.org>
> date: Sat Apr 8 22:06:28 2006 -0500
> files: icslex.py
> description:
> unbreak works
>
>
> changeset: 0:ec6eb270779b1ae046b9dd04be92034375392722
> user: Dan Connolly <connolly@w3.org>
> date: Sat Apr 8 21:50:45 2006 -0500
> files: icslex.py
> description:
> parseLine tests pass
> ----------------------------
> revision 2.29
> date: 2005/11/09 23:10:48; author: connolly; state: Exp; lines: +30 -9
> - changed the way duration values are modelled
> The iCalendar DURATION value type is actually more than just a
> XMLSchema.duration; it also has a RELATED parameter.
> So for
> TRIGGER;VALUE=DURATION;RELATED=START:-PT15M
> we'll write
> { ?E cal:trigger [ rdf:value "-PT15M"^^xsdt:duration;
> cal:related "START"] }
>
> - fixed test data to have rdf:datatype on integer
> values, to match the schema (which matches the RFC)
>
> - fixed schema to show DATE-TIME properties (dtstart, ...)
> as DatatypeProperties
> (there are little/no tests for PERIOD; beware)
>
> - scraped more details about property parameters (e.g. partstat, cn,
> cutype, ...) and rrule parts (freq, interval, ...) from the RFC so
> that they show up as links in the hypertext version and as RDF
> properties in the schema. likewise timezone components (standard,
> daylight)
> - side effect: added some whitespace in rfc2445.html
>
> - demoted x- properties
> - removed x- properties from .rdf versions of test data
> this allows the round-trip tests to pass
> - fromIcal.py doesn't output them unless you give the --x option
>
> - added Makefile support for consistency checking with pellet
>
> - demoted blank line diagnostic in fromIcal.py to a comment
>
> - silenced some left-over debug diagnostics in slurpIcalSpec.py
>
> - fixed test/test-created.rdf; added it to fromIcalTest.py list
> ----------------------------
> revision 2.28
> date: 2005/09/08 00:43:49; author: connolly; state: Exp; lines: +10 -2
> avoid double hashes in ID
> ----------------------------
> revision 2.27
> date: 2005/04/22 14:16:56; author: connolly; state: Exp; lines: +15 -6
> fix problems found when converting all the timezone files
> in evolution-data-server_1.0.4-1_i386.deb:
> - handle RDATE
> - handle multiple OlsonPfxs
> ----------------------------
> revision 2.26
> date: 2005/04/04 21:17:14; author: connolly; state: Exp; lines: +5 -2
> fix initialization of iCalendar namespace
> ----------------------------
> revision 2.25
> date: 2005/03/30 15:35:21; author: connolly; state: Exp; lines: +31 -2
> new namespace for timezones-as-datatypes design: icaltzd
> ----------------------------
> revision 2.24
> date: 2005/02/26 03:20:47; author: connolly; state: Exp; lines: +63 -64
> fromIcal.py
> - revert the uid: trick; back to uids as fragids
> - timezones as datatypes in dates, dateTimes
> - Valarm supported in Vtodo as well as Vevent
> (@@need test smaller than MozMulipleVcalendars.ics)
> - re-indented Vtodo decls while I was at it
> - case-fold END:xyz
>
> fromIcalTest.py
> - base in http space
> - new tag-bug case
>
> test/*.rdf
> - base in http space
> - timezones as datatypes
>
> test/cal-regression.n3
> - moved tests that don't use X- first
> - got rid of initRDF
>
> test/cal-retest.py
> - replace ical2rdf.pl with fromIcal.py
> - base in http space
>
> test/cal-spec-examples.n3 new
>
> test/graphCompare.n3 oops; extra debug crud
> ----------------------------
> revision 2.23
> date: 2005/02/10 21:39:00; author: timbl; state: Exp; lines: +30 -7
> COUNT, LANGUAGE, X-UID, QUOTED-PRINTABLE under DanC's supervision
> ----------------------------
> revision 2.22
> date: 2005/02/02 21:54:45; author: timbl; state: Exp; lines: +4 -2
> added --noalarm option - kindofa hack - take 2
> ----------------------------
> revision 2.21
> date: 2005/02/02 21:51:46; author: timbl; state: Exp; lines: +21 -15
> added --noalarm option - kindofa hack
> ----------------------------
> revision 2.20
> date: 2005/02/02 21:39:20; author: timbl; state: Exp; lines: +20 -16
> sync
> ----------------------------
> revision 2.19
> date: 2005/02/01 15:29:43; author: timbl; state: Exp; lines: +5 -2
> hack to CREATED to add default type DATE-TIME.
> ----------------------------
> revision 2.18
> date: 2005/02/01 15:26:53; author: timbl; state: Exp; lines: +5 -2
> pre hack to CREATED default type.
> ----------------------------
> revision 2.17
> date: 2005/01/28 04:07:49; author: timbl; state: Exp; lines: +35 -14
> Event URIs now absolute. Added --noprotocol and --help options
> ----------------------------
> revision 2.16
> date: 2004/09/30 14:16:01; author: connolly; state: Exp; lines: +21 -9
> parseLine was buggy in the case of ; in values
> ----------------------------
> revision 2.15
> date: 2004/04/14 21:31:26; author: connolly; state: Exp; lines: +21 -4
> added --base support so we can test with fragids
> ----------------------------
> revision 2.14
> date: 2004/04/14 21:12:13; author: connolly; state: Exp; lines: +105 -29
>
> - revamped doDateTime: use datatypes for dateTime values
> - added __getattr__ to Namespace class
> - make well-known tzids into URIs in 2002/12/cal space
> - make UID into fragid
> - make local tzid into fragid
> ----------------------------
> revision 2.13
> date: 2004/04/08 14:09:11; author: connolly; state: Exp; lines: +5 -2
> priority on VEVENT fixed
> ----------------------------
> revision 2.12
> date: 2004/04/07 18:27:17; author: connolly; state: Exp; lines: +10 -3
> use real datatypes for list of floats, i.e. geo
> ----------------------------
> revision 2.11
> date: 2004/04/07 18:10:22; author: connolly; state: Exp; lines: +41 -2
> convert list of float, as in GEO
> ----------------------------
> revision 2.10
> date: 2004/03/25 04:00:59; author: connolly; state: Exp; lines: +9 -2
> allow recurrenceId wherever rrule can go
> handle WKST in recur values
> ----------------------------
> revision 2.9
> date: 2004/03/25 03:45:09; author: connolly; state: Exp; lines: +3 -0
> handle UNTIL in rrule
> added EXDATE to compDecls wherever RRULE occurs
> ----------------------------
> revision 2.8
> date: 2004/03/25 03:43:48; author: connolly; state: Exp; lines: +9 -1
> handle exdate ala rrule
> ----------------------------
> revision 2.7
> date: 2004/03/23 14:59:28; author: connolly; state: Exp; lines: +10 -3
> allow missing PRODID
> ----------------------------
> revision 2.6
> date: 2004/03/10 21:59:31; author: connolly; state: Exp; lines: +11 -2
> calendar schema is now generated from the RFC
> ----------------------------
> revision 2.5
> date: 2004/02/29 14:52:11; author: connolly; state: Exp; lines: +49 -10
> todo support in fromIcal; value type label in schema
> ----------------------------
> revision 2.4
> date: 2004/02/12 07:17:05; author: connolly; state: Exp; lines: +23 -6
> - handle URI value type
> - a few more default value type declarations
> ----------------------------
> revision 2.3
> date: 2004/02/12 06:30:48; author: connolly; state: Exp; lines: +54 -11
> - doText unescapes text values per rfc2445#sec4.3.11
> - LAST-MODIFIED applies to VEVENT (fixed typo in RFC)
> - default type added for COMMENT
> - disabled UID->fragid conversion cuz it interferes with graph comparison
> - handle DIR parameter on CAL-ADDRESS value type
> ----------------------------
> revision 2.2
> date: 2004/02/11 22:04:10; author: connolly; state: Exp; lines: +49 -21
> slightly nicer XML writer
> ----------------------------
> revision 2.1
> date: 2004/02/11 16:40:23; author: connolly; state: Exp; lines: +7 -4
> finish renaming icalWebize.py to fromIcal.py
> ----------------------------
> revision 2.0
> date: 2004/02/11 16:37:48; author: connolly; state: Exp;
> copied from icalWebize.py
> =============================================================================
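
The charset handling described at the top of this message (take charset=x from the Content-Type header, fall back to iso8859-1, and decode the input with it before handing it to the parser) might look roughly like the sketch below. It is written for current Python 3 rather than the Python 2.4 used in the thread, and it swaps in codecs.getreader for codecs.EncodedFile so that the stream yields decoded text directly. The function names charset_from_content_type and open_decoded are hypothetical, and the header parsing is deliberately naive (it does not handle a ";" inside a quoted parameter value).

```python
import codecs
import io

# Default named in the message above; used when the header gives no charset.
DEFAULT_CHARSET = "iso8859-1"


def charset_from_content_type(content_type, default=DEFAULT_CHARSET):
    """Return the charset parameter of a Content-Type value, or the default.

    Hypothetical helper: splits on ";" and looks for a "charset=" parameter.
    """
    for param in (content_type or "").split(";")[1:]:
        name, _, value = param.partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "charset" and value:
            return value
    return default


def open_decoded(byte_stream, content_type):
    """Wrap a binary stream so that reads return text decoded per the header.

    Hypothetical helper: codecs.getreader raises LookupError if the
    charset name is unknown to Python.
    """
    charset = charset_from_content_type(content_type)
    return codecs.getreader(charset)(byte_stream)
```

Assuming the HTTP response body is available as a binary file-like object, the result of open_decoded(body, content_type_header) could then be passed to a parser that expects text lines, with output encoded as UTF-8 on the way out as in the snippet at the top of the message.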
Received on Friday, 21 April 2006 16:46:19 UTC