- From: Elias Torres <elias@torrez.us>
- Date: Fri, 21 Apr 2006 12:45:59 -0400
- To: www-rdf-calendar@w3.org
Thanks, Dan, for the suggested fix to my script; fromIcal.py was working
correctly.
For the rest:
I wasn't reading the input passed to fromIcal.py with an encoding that
matched the data. I ended up using codecs.EncodedFile:
up = urllib.urlopen(url)
ical = codecs.EncodedFile(up,charset)
sx = XMLWriter.T(codecs.getwriter('utf-8')(sys.stdout))
fromIcal.interpret(sx, ical, url, ['X-'])
The charset defaults to iso8859-1 if none is specified in the
Content-Type header; otherwise the charset=x value from the header is
passed to codecs.EncodedFile.
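
The charset selection described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the code running in the service: the helper names charset_from_content_type and decode_ical are hypothetical, and the sketch decodes the raw bytes directly instead of wrapping the stream with codecs.EncodedFile.

```python
from email.message import Message

# Hypothetical helper: pick the charset named in a Content-Type header
# value, falling back to iso8859-1 when the header has no charset parameter.
def charset_from_content_type(content_type, default="iso8859-1"):
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset() or default

# Hypothetical helper: decode the raw iCalendar bytes with that charset.
def decode_ical(raw_bytes, content_type=None):
    return raw_bytes.decode(charset_from_content_type(content_type))

if __name__ == "__main__":
    raw = "SUMMARY:Reunión".encode("iso8859-1")
    # No charset in the header, so the iso8859-1 default is used.
    print(decode_ical(raw, "text/calendar"))
    # Decoding the same bytes with a mismatched charset fails,
    # which is the kind of error seen earlier in this thread.
    try:
        decode_ical(raw, "text/calendar; charset=utf-8")
    except UnicodeDecodeError as err:
        print("decode failed:", err.reason)
```

Once the input is decoded with the matching charset, the text can be handed on to the parser and written back out as UTF-8, as in the snippet above.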
Regards,
-Elias
Dan Connolly wrote:
> On Fri, 2006-04-21 at 11:11 -0400, Elias Torres wrote:
>> I have a user from Argentina using my service (http://torrez.us/ics2rdf)
>> based on the toIcal.py scripts. However, he has non-ascii characters and
>> the script is failing. I just wanted to report the bug.
>
> I'm not able to reproduce a failure of the script. The diagnostic
> I get suggests the data is bad:
>
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
> invalid data
>
> What version of fromIcal.py are you using?
>
> I'm using:
> $Id: fromIcal.py,v 2.31 2006/04/11 20:29:00 connolly Exp $
>
> I'm attaching a CVS log with dates so you can perhaps see which
> version you grabbed.
>
> (I'd rather use a public version control history... but...
> sigh... long story...)
>
> connolly@dirk:~/Desktop$ python2.4 -i
> ~/w3ccvs/WWW/2002/12/cal/fromIcal.py basic.ics
> Traceback (most recent call last):
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 825,
> in ?
> main()
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 99, in
> main
> interpret(sx, codecs.open(sys.argv[1], 'r', 'utf-8'), base,
> suppressed)
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 135, in
> interpret
> findComponents(lines, v, calendars)
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 801, in
> findComponents
> findComponents(lines, v, subs)
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/fromIcal.py", line 784, in
> findComponents
> n, p, v = parseLine(lines.next(), downcase=False)
> File "/home/connolly/w3ccvs/WWW/2002/12/cal/icslex.py", line 165, in
> unbreak
> s = lines.next().rstrip(CRLF)
> File "/usr/lib/python2.4/codecs.py", line 494, in next
> return self.reader.next()
> File "/usr/lib/python2.4/codecs.py", line 431, in next
> line = self.readline()
> File "/usr/lib/python2.4/codecs.py", line 346, in readline
> data = self.read(readsize, firstline=True)
> File "/usr/lib/python2.4/codecs.py", line 293, in read
> newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
> invalid data
>
>
>
>> -Elias
>>
>> File "index.py", line 24, in ?
>> main()
>> File "index.py", line 15, in main
>> fromIcal.interpret(sx, ical, url, ['X-'])
>> File "/_ics2rdf/fromIcal.py", line 142, in interpret
>> doComponents(sx, calendars, iCalendarDefs, suppressed = suppressed)
>> File "/_ics2rdf/fromIcal.py", line 350, in doComponents
>> doComponents(sx, subs, subDecls, 'component', suppressed = suppressed)
>> File "/_ics2rdf/fromIcal.py", line 345, in doComponents
>> doProperties(sx, '', props, propDecls, suppressed = suppressed)
>> File "/_ics2rdf/fromIcal.py", line 467, in doProperties
>> doCalAddress(sx, elt, params, val)
>> File "/_ics2rdf/fromIcal.py", line 662, in doCalAddress
>> sx.characters(pv, 0, len(pv))
>> File "/_ics2rdf/XMLWriter.py", line 79, in characters
>> doChars(o, ch, start, length)
>> File "/_ics2rdf/XMLWriter.py", line 92, in doChars
>> o.write(ch[i:])
>> File "/usr/lib/python2.4/codecs.py", line 178, in write
>> data, consumed = self.encode(object, self.errors)
>
>
> ------------------------------------------------------------------------
>
>
> RCS file: /w3ccvs/WWW/2002/12/cal/fromIcal.py,v
> Working file: fromIcal.py
> head: 2.31
> branch:
> locks: strict
> access list:
> symbolic names:
> keyword substitution: kv
> total revisions: 32; selected revisions: 32
> description:
> ----------------------------
> revision 2.31
> date: 2006/04/11 20:29:00; author: connolly; state: Exp; lines: +103 -72
> finished factoring out icslex stuff: unbreak, parseLine
> findComponents is now more straightforwardly recursive
> ----------------------------
> revision 2.30
> date: 2006/04/09 06:02:39; author: connolly; state: Exp; lines: +41 -95
> changeset: 7:5f8c551b2de38fb115789dfe7cbca0288a978f61
> tag: tip
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 01:01:32 2006 -0500
> files: icslex.py
> description:
> add bymonthday to recurlex
>
>
> changeset: 6:32c567b22753c64f71c8de298adb87bad91ef567
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:54:59 2006 -0500
> files: icsxml.py
> description:
> use utf-8 to read files; kludge a couple more fields that the template assumes
>
>
> changeset: 5:12370cd5ad97cd5cea04e7ed4d5f6b55c0ac39ff
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:54:13 2006 -0500
> files: icslex.py
> description:
> make interval explict; use utf-8 to read files
>
>
> changeset: 4:0f319182ea4d6ee8a8b7f2ef042683323b75658d
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:37:07 2006 -0500
> files: icsxml.py
> description:
> works in one case, with a couple kludges
>
>
> changeset: 3:3e542292c8040d0dab310748ef07ffbce0a15b4a
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:36:43 2006 -0500
> files: icslex.py
> description:
> date, recur lex details
>
>
> changeset: 2:c2881393d0156b9263d760e98953ece6ba7591a6
> user: Dan Connolly <connolly@w3.org>
> date: Sun Apr 9 00:01:33 2006 -0500
> files: icslex.py
> description:
> - parsing collections of properties as a dict/JSON object works
> - names are downcased by default
> - formatted docs per rst/epydoc
>
>
> changeset: 1:ecc1ad118fc61abb55e9634d15921483134f3328
> user: Dan Connolly <connolly@w3.org>
> date: Sat Apr 8 22:06:28 2006 -0500
> files: icslex.py
> description:
> unbreak works
>
>
> changeset: 0:ec6eb270779b1ae046b9dd04be92034375392722
> user: Dan Connolly <connolly@w3.org>
> date: Sat Apr 8 21:50:45 2006 -0500
> files: icslex.py
> description:
> parseLine tests pass
> ----------------------------
> revision 2.29
> date: 2005/11/09 23:10:48; author: connolly; state: Exp; lines: +30 -9
> - changed the way duration values are modelled
> The iCalendar DURATION value type is actually more than just a
> XMLSchema.duration; it also has a RELATED parameter.
> So for
> TRIGGER;VALUE=DURATION;RELATED=START:-PT15M
> we'll write
> { ?E cal:trigger [ rdf:value "-PT15M"^^xsdt:duration;
> cal:related "START"] }
>
> - fixed test data to have rdf:datatype on integer
> values, to match the schema (which matches the RFC)
>
> - fixed schema to show DATE-TIME properties (dtstart, ...)
> as DatatypeProperties
> (there are little/no tests for PERIOD; beware)
>
> - scraped more details about property parameters (e.g. partstat, cn,
> cutype, ...) and rrule parts (freq, interval, ...) from the RFC so
> that they show up as links in the hypertext version and as RDF
> properties in the schema. likewise timezone components (standard,
> daylight)
> - side effect: added some whitespace in rfc2445.html
>
> - demoted x- properties
> - removed x- properties from .rdf versions of test data
> this allows the round-trip tests to pass
> - fromIcal.py doesn't output them unless you give the --x option
>
> - added Makefile support for consistency checking with pellet
>
> - demoted blank line diagnostic in fromIcal.py to a comment
>
> - silenced some left-over debug diagnostics in slurpIcalSpec.py
>
> - fixed test/test-created.rdf; added it to fromIcalTest.py list
> ----------------------------
> revision 2.28
> date: 2005/09/08 00:43:49; author: connolly; state: Exp; lines: +10 -2
> avoid double hashes in ID
> ----------------------------
> revision 2.27
> date: 2005/04/22 14:16:56; author: connolly; state: Exp; lines: +15 -6
> fix problems found when converting all the timezone files
> in evolution-data-server_1.0.4-1_i386.deb:
> - handle RDATE
> - handle multiple OlsonPfxs
> ----------------------------
> revision 2.26
> date: 2005/04/04 21:17:14; author: connolly; state: Exp; lines: +5 -2
> fix initialization of iCalendar namespace
> ----------------------------
> revision 2.25
> date: 2005/03/30 15:35:21; author: connolly; state: Exp; lines: +31 -2
> new namespace for timezones-as-datatypes design: icaltzd
> ----------------------------
> revision 2.24
> date: 2005/02/26 03:20:47; author: connolly; state: Exp; lines: +63 -64
> fromIcal.py
> - revert the uid: trick; back to uids as fragids
> - timezones as datatypes in dates, dateTimes
> - Valarm supported in Vtodo as well as Vevent
> (@@need test smaller than MozMulipleVcalendars.ics)
> - re-indented Vtodo decls while I was at it
> - case-fold END:xyz
>
> fromIcalTest.py
> - base in http space
> - new tag-bug case
>
> test/*.rdf
> - base in http space
> - timezones as datatypes
>
> test/cal-regression.n3
> - moved tests that don't use X- first
> - got rid of initRDF
>
> test/cal-retest.py
> - replace ical2rdf.pl with fromIcal.py
> - base in http space
>
> test/cal-spec-examples.n3 new
>
> test/graphCompare.n3 oops; extra debug crud
> ----------------------------
> revision 2.23
> date: 2005/02/10 21:39:00; author: timbl; state: Exp; lines: +30 -7
> COUNT, LANGUAGE, X-UID, QUOTED-PRINTABLE under DanC's supervision
> ----------------------------
> revision 2.22
> date: 2005/02/02 21:54:45; author: timbl; state: Exp; lines: +4 -2
> added --noalarm option - kindofa hack - take 2
> ----------------------------
> revision 2.21
> date: 2005/02/02 21:51:46; author: timbl; state: Exp; lines: +21 -15
> added --noalarm option - kindofa hack
> ----------------------------
> revision 2.20
> date: 2005/02/02 21:39:20; author: timbl; state: Exp; lines: +20 -16
> sync
> ----------------------------
> revision 2.19
> date: 2005/02/01 15:29:43; author: timbl; state: Exp; lines: +5 -2
> hack to CREATED to add default type DATE-TIME.
> ----------------------------
> revision 2.18
> date: 2005/02/01 15:26:53; author: timbl; state: Exp; lines: +5 -2
> pre hack to CREATED default type.
> ----------------------------
> revision 2.17
> date: 2005/01/28 04:07:49; author: timbl; state: Exp; lines: +35 -14
> Event URIs now absolute. Added --noprotocol and --help options
> ----------------------------
> revision 2.16
> date: 2004/09/30 14:16:01; author: connolly; state: Exp; lines: +21 -9
> parseLine was buggy in the case of ; in values
> ----------------------------
> revision 2.15
> date: 2004/04/14 21:31:26; author: connolly; state: Exp; lines: +21 -4
> added --base support so we can test with fragids
> ----------------------------
> revision 2.14
> date: 2004/04/14 21:12:13; author: connolly; state: Exp; lines: +105 -29
>
> - revamped doDateTime: use datatypes for dateTime values
> - added __getattr__ to Namespace class
> - make well-known tzids into URIs in 2002/12/cal space
> - make UID into fragid
> - make local tzid into fragid
> ----------------------------
> revision 2.13
> date: 2004/04/08 14:09:11; author: connolly; state: Exp; lines: +5 -2
> priority on VEVENT fixed
> ----------------------------
> revision 2.12
> date: 2004/04/07 18:27:17; author: connolly; state: Exp; lines: +10 -3
> use real datatypes for list of floats, i.e. geo
> ----------------------------
> revision 2.11
> date: 2004/04/07 18:10:22; author: connolly; state: Exp; lines: +41 -2
> convert list of float, as in GEO
> ----------------------------
> revision 2.10
> date: 2004/03/25 04:00:59; author: connolly; state: Exp; lines: +9 -2
> allow recurrenceId wherever rrule can go
> handle WKST in recur values
> ----------------------------
> revision 2.9
> date: 2004/03/25 03:45:09; author: connolly; state: Exp; lines: +3 -0
> handle UNTIL in rrule
> added EXDATE to compDecls wherever RRULE occurs
> ----------------------------
> revision 2.8
> date: 2004/03/25 03:43:48; author: connolly; state: Exp; lines: +9 -1
> handle exdate ala rrule
> ----------------------------
> revision 2.7
> date: 2004/03/23 14:59:28; author: connolly; state: Exp; lines: +10 -3
> allow missing PRODID
> ----------------------------
> revision 2.6
> date: 2004/03/10 21:59:31; author: connolly; state: Exp; lines: +11 -2
> calendar schema is now generated from the RFC
> ----------------------------
> revision 2.5
> date: 2004/02/29 14:52:11; author: connolly; state: Exp; lines: +49 -10
> todo support in fromIcal; value type label in schema
> ----------------------------
> revision 2.4
> date: 2004/02/12 07:17:05; author: connolly; state: Exp; lines: +23 -6
> - handle URI value type
> - a few more default value type declarations
> ----------------------------
> revision 2.3
> date: 2004/02/12 06:30:48; author: connolly; state: Exp; lines: +54 -11
> - doText unescapes text values per rfc2445#sec4.3.11
> - LAST-MODIFIED applies to VEVENT (fixed typo in RFC)
> - default type added for COMMENT
> - disabled UID->fragid conversion cuz it interferes with graph comparison
> - handle DIR parameter on CAL-ADDRESS value type
> ----------------------------
> revision 2.2
> date: 2004/02/11 22:04:10; author: connolly; state: Exp; lines: +49 -21
> slightly nicer XML writer
> ----------------------------
> revision 2.1
> date: 2004/02/11 16:40:23; author: connolly; state: Exp; lines: +7 -4
> finish renaming icalWebize.py to fromIcal.py
> ----------------------------
> revision 2.0
> date: 2004/02/11 16:37:48; author: connolly; state: Exp;
> copied from icalWebize.py
> =============================================================================
Received on Friday, 21 April 2006 16:46:19 UTC