- From: Anne van Kesteren <annevk@opera.com>
- Date: Tue, 03 Apr 2012 13:23:07 +0200
- To: www-archive@w3.org
- Message-ID: <op.wb6skt1v64w2qv@annevk-macbookpro.local>
Archiving progress.

big5urls.txt is a subset of the URLs from http://dotnetdotcom.org/ that likely represent resources encoded in big5 or big5-hkscs. Given that the http://dotnetdotcom.org/ data is encoded in utf-8, the resources had to be fetched again to be able to study the original byte sequences.

$ python big5urls.py
fetching 19 finance.people.com.cn/BIG5/67815/68059/5780219.html
fetching 155 yoyonet.biz/egoing/map/fasttrains.htm
writing 185 forum.timway.com/f/forumdisplay.php?fid=34&filter=digest
writing 186 forum.timway.com/f/viewthread.php?tid=212954&extra=&page=12
fetching 191 www.tw16.net/monographList.asp?m1No=12
writing 204 www.toysdaily.com/discuz/redirect.php?goto=findpost&ptid=70327
fetching 298 urbase.net/viewthread.php?action=printable&tid=5396
fetching 400 www.feverforum.com/forumdisplay.php?fid=17&filter=digest
fetching 428 env.people.com.cn/BIG5/5041235.html
fetching 508 urbase.net/viewthread.php?action=printable&tid=5429

-- 
Anne van Kesteren
http://annevankesteren.nl/
Attachments
- application/octet-stream attachment: big5urls.py
- text/plain attachment: big5urls.txt
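
The contents of big5urls.py are not reproduced in this archive. As an illustration only, here is a minimal sketch of how such a re-fetch could be done so that the original byte sequences survive. It is not the attached script. The function names (`fetch_bytes`, `archive`), the one-URL-per-line input format, the `http://` prefix for scheme-less URLs, the `N.bin` output naming, and the meaning given to the "fetching" and "writing" messages are all assumptions.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10):
    """Return the raw response body without decoding it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def archive(url_file, out_dir, fetch=fetch_bytes):
    """Fetch every URL listed in url_file and store the raw bytes in out_dir.

    Returns the line numbers of the URLs that were written successfully.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    lines = Path(url_file).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        url = line.strip()
        if not url:
            continue
        # Assumption: URLs are listed without a scheme, so default to HTTP.
        full_url = url if "://" in url else "http://" + url
        print("fetching", number, url)
        try:
            data = fetch(full_url)
        except Exception as error:  # broad on purpose: timeouts, DNS, HTTP errors
            print("failed", number, url, error)
            continue
        print("writing", number, url)
        # Binary mode keeps the byte sequences exactly as the server sent them.
        (out / f"{number}.bin").write_bytes(data)
        written.append(number)
    return written
```

Calling `archive("big5urls.txt", "big5-raw")` would then leave one undecoded file per reachable URL in the hypothetical `big5-raw` directory. The point of the design is that nothing is ever decoded to text, so byte sequences that a big5 or big5-hkscs decoder would reject or remap remain available for study.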
Received on Tuesday, 3 April 2012 11:23:45 UTC