- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Sat, 7 Sep 2002 00:48:42 +0100
- To: "Aaron Swartz" <aswartz@swartzfam.com>
- Cc: <www-archive@w3.org>
I decided to write an RSS 3.0 [1] parser in Python for you. Here it is:- [dict(re.compile('(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(item)) for item in s.split('\n\n')] Where the variable s is the input. I kid you not; it works rather well:- >>> s = """title: RSS 3.0 News description: Latest updates on RSS 3.0. link: http://www.aaronsw.com/2002/rss30 creator: me@aaron...com Aaron Swartz errorsto: me@aaron...com Aaron Swartz title: Spec introduced created: 2002-09-06 guid: 00795648-C1E0-11D6-9AA6-003065F376B6 title: Zooko Likes It created: 2002-09-06 guid: 0894CB2F-C1E0-11D6-9649-003065F376B6 description: Zooko says he likes the spec.""" >>> [dict(re.compile('(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(i)) for i in s.split('\n\n')] [{'creator': 'me@aaron...com Aaron Swartz', 'link': 'http://www.aaronsw.com/2002/rss30', 'description': 'Latest updates on RSS 3.0.', 'errorsto': 'me@aaron...com Aaron Swartz', 'title': 'RSS 3.0 News'}, {'guid': '00795648-C1E0-11D6-9AA6-003065F376B6', 'created': '2002-09-06', 'title': 'Spec introduced'}, {'guid': '0894CB2F-C1E0-11D6-9649-003065F376B6', 'created': '2002-09-06', 'description': 'Zooko says he likes the spec.', 'title': 'Zooko Likes It'}] >>> It does assume, however, that fields must be unique; if they're repeated, it uses the last one. [1] http://www.aaronsw.com/2002/rss30 -- Kindest Regards, Sean B. Palmer @prefix : <http://purl.org/net/swn#> . :Sean :homepage <http://purl.org/net/sbp/> .
Received on Friday, 6 September 2002 19:48:48 UTC