
RSS 3.0 Parser in Python

From: Sean B. Palmer <sean@mysterylights.com>
Date: Sat, 7 Sep 2002 00:48:42 +0100
Message-ID: <05fc01c255ff$f0827c60$8ab80150@localhost>
To: "Aaron Swartz" <aswartz@swartzfam.com>
Cc: <www-archive@w3.org>

I decided to write an RSS 3.0 [1] parser in Python for you. Here it is:-

import re
[dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(item))
   for item in s.split('\n\n')]

Here, the variable s holds the input text. I kid you not; it works rather well:-

>>> import re
>>> s = """title: RSS 3.0 News
description: Latest updates on RSS 3.0.
link: http://www.aaronsw.com/2002/rss30
creator: me@aaron...com Aaron Swartz
errorsto: me@aaron...com Aaron Swartz

title: Spec introduced
created: 2002-09-06
guid: 00795648-C1E0-11D6-9AA6-003065F376B6

title: Zooko Likes It
created: 2002-09-06
guid: 0894CB2F-C1E0-11D6-9649-003065F376B6
description: Zooko says he likes the spec."""
>>> [dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(i))
...  for i in s.split('\n\n')]
[{'creator': 'me@aaron...com Aaron Swartz',
  'link': 'http://www.aaronsw.com/2002/rss30',
  'description': 'Latest updates on RSS 3.0.',
  'errorsto': 'me@aaron...com Aaron Swartz',
  'title': 'RSS 3.0 News'},
 {'guid': '00795648-C1E0-11D6-9AA6-003065F376B6',
  'created': '2002-09-06',
  'title': 'Spec introduced'},
 {'guid': '0894CB2F-C1E0-11D6-9649-003065F376B6',
  'created': '2002-09-06',
  'description': 'Zooko says he likes the spec.',
  'title': 'Zooko Likes It'}]
>>>
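
For anyone who finds the one-liner a little dense, here is a sketch of the same pattern spelled out with re.VERBOSE and comments, written for current Python 3. The function name parse_rss30 and the sample input are mine, purely for illustration. The comments make one behaviour visible: because the lookahead only stops at a newline followed by a non-blank character, a line starting with a space or tab is treated as a continuation of the previous field, and its leading whitespace is kept in the value.

```python
import re

# Same pattern as the one-liner, spelled out with re.VERBOSE so each piece
# can carry a comment. VERBOSE ignores bare whitespace outside character
# classes, so the single literal space after the colon is written "[ ]".
FIELD = re.compile(r"""
    ([^\n:]+)        # field name: one or more characters, no newline or colon
    :[ ]             # the separator: a colon and exactly one space
    (.*?)            # the value, matched lazily; DOTALL lets it cross newlines
    (?=\n[^ \t]|\Z)  # stop before a newline that starts a non-indented line,
                     # or at the end of the item
""", re.DOTALL | re.VERBOSE)


def parse_rss30(s):
    """Split s on blank lines into items; map each field name to its value."""
    return [dict(FIELD.findall(item)) for item in s.split("\n\n")]


# Hypothetical input: the indented line continues the "description" field.
doc = "title: Folded example\ndescription: first line\n  continued here\n\ntitle: Second item"
for item in parse_rss30(doc):
    print(item)
# {'title': 'Folded example', 'description': 'first line\n  continued here'}
# {'title': 'Second item'}
```

Both versions split items on '\n\n', so input with Windows-style '\r\n' line endings would need normalising first.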

It does assume, however, that field names are unique within an item; if a
field is repeated, only its last value is kept.
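
If repeated fields matter (several creator lines, say), a small variant can collect every value into a list instead of letting dict() keep only the last one. This is a sketch, assuming list-valued fields are acceptable to whatever consumes the result; parse_last_wins, parse_all_values and the sample item are hypothetical names and data.

```python
import re

# The one-liner's pattern, as a raw string so "\Z" is not read as a string escape.
FIELD = re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)')


def parse_last_wins(s):
    # dict() keeps only the last value seen for each repeated field name.
    return [dict(FIELD.findall(item)) for item in s.split('\n\n')]


def parse_all_values(s):
    # Hypothetical variant: keep every value, in document order, as a list per field name.
    items = []
    for item in s.split('\n\n'):
        fields = {}
        for name, value in FIELD.findall(item):
            fields.setdefault(name, []).append(value)
        items.append(fields)
    return items


doc = "title: Two authors\ncreator: alice\ncreator: bob"
print(parse_last_wins(doc))   # [{'title': 'Two authors', 'creator': 'bob'}]
print(parse_all_values(doc))  # [{'title': ['Two authors'], 'creator': ['alice', 'bob']}]
```

The trade-off is that every value becomes a list, even for fields that appear once, so callers have to index into it.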

[1] http://www.aaronsw.com/2002/rss30

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .
Received on Friday, 6 September 2002 19:48:48 GMT
