I decided to write an RSS 3.0 [1] parser in Python for you. Here it is:-

    [dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(item))
     for item in s.split('\n\n')]

Where the variable s is the input. I kid you not; it works rather well:-

>>> import re
>>> s = """title: RSS 3.0 News
description: Latest updates on RSS 3.0.
link: http://www.aaronsw.com/2002/rss30
creator: me@aaron...com Aaron Swartz
errorsto: me@aaron...com Aaron Swartz

title: Spec introduced
created: 2002-09-06
guid: 00795648-C1E0-11D6-9AA6-003065F376B6

title: Zooko Likes It
created: 2002-09-06
guid: 0894CB2F-C1E0-11D6-9649-003065F376B6
description: Zooko says he likes the spec."""
>>> [dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(i))
...  for i in s.split('\n\n')]
[{'creator': 'me@aaron...com Aaron Swartz', 'link': 'http://www.aaronsw.com/2002/rss30', 'description': 'Latest updates on RSS 3.0.', 'errorsto': 'me@aaron...com Aaron Swartz', 'title': 'RSS 3.0 News'},
 {'guid': '00795648-C1E0-11D6-9AA6-003065F376B6', 'created': '2002-09-06', 'title': 'Spec introduced'},
 {'guid': '0894CB2F-C1E0-11D6-9649-003065F376B6', 'created': '2002-09-06', 'description': 'Zooko says he likes the spec.', 'title': 'Zooko Likes It'}]
>>>

It does assume, however, that each field name is unique within an item;
if a field is repeated, only the last value is kept.

[1] http://www.aaronsw.com/2002/rss30

-- 
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .

Received on Friday, 6 September 2002 19:48:48 GMT
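The message above notes that a repeated field overwrites earlier values, because dict() keeps only the last pair for each key. As a minimal sketch of one way around that, the variant below reuses the same regular expression but collects every value for a field into a list. The names parse_rss30, FIELD and the sample text are hypothetical, made up for illustration; they are not part of the original post or of the RSS 3.0 spec.

```python
import re
from collections import defaultdict

# Same pattern as the one-liner in the post, compiled once.
# A value runs until the next line that does not start with a space or tab.
FIELD = re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)')


def parse_rss30(s):
    """Hypothetical helper: return one {field name: [values]} dict per item."""
    items = []
    for block in s.split('\n\n'):  # items are separated by blank lines
        fields = defaultdict(list)
        for name, value in FIELD.findall(block):
            fields[name].append(value)  # keep repeats instead of overwriting
        items.append(dict(fields))
    return items


# Illustrative input only (not taken from the post).
sample = (
    "title: Example\n"
    "creator: a\n"
    "creator: b\n"
    "\n"
    "title: Item one\n"
    "description: first line\n"
    "  continued"
)

parsed = parse_rss30(sample)
print(parsed[0]['creator'])  # prints ['a', 'b']
```

Every value is a list here, even for fields that occur once, so callers do not have to check the type. Continuation lines are kept verbatim inside the value, newline and indentation included, exactly as the original pattern captures them.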