big5 study; take 2; part 1

Archiving progress. big5urls.txt is a subset of the URLs from
http://dotnetdotcom.org/ that likely represent resources encoded in big5
or big5-hkscs. Because the http://dotnetdotcom.org/ data is encoded in
utf-8, the resources had to be fetched again in order to study their
original byte sequences.

$ python big5urls.py
fetching 19 finance.people.com.cn/BIG5/67815/68059/5780219.html
fetching 155 yoyonet.biz/egoing/map/fasttrains.htm
writing 185 forum.timway.com/f/forumdisplay.php?fid=34&filter=digest
writing 186 forum.timway.com/f/viewthread.php?tid=212954&extra=&page=12
fetching 191 www.tw16.net/monographList.asp?m1No=12
writing 204 www.toysdaily.com/discuz/redirect.php?goto=findpost&ptid=70327
fetching 298 urbase.net/viewthread.php?action=printable&tid=5396
fetching 400 www.feverforum.com/forumdisplay.php?fid=17&filter=digest
fetching 428 env.people.com.cn/BIG5/5041235.html
fetching 508 urbase.net/viewthread.php?action=printable&tid=5429
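
For readers who want to repeat the fetch, here is a minimal sketch of how such a run could look. It is not the big5urls.py used above, whose source is not part of this message. The function names, the <index>.bin file naming, the http:// prefix, and the "failed" line are all assumptions, and the log lines only loosely imitate the output shown. The point it illustrates is that the response body is written in binary mode and never decoded, so the original byte sequences survive.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Return the response body as raw, undecoded bytes."""
    # Assumption: the URL list carries no scheme, so http:// is prepended.
    with urllib.request.urlopen("http://" + url, timeout=timeout) as response:
        return response.read()


def archive_urls(urls, out_dir, fetch=fetch_bytes):
    """Fetch each URL and store its bytes as <index>.bin in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        print("fetching", index, url)
        try:
            body = fetch(url)
        except OSError as error:  # urllib.error.URLError subclasses OSError
            print("failed", index, url, error)
            continue
        path = os.path.join(out_dir, "%d.bin" % index)
        print("writing", index, url)
        with open(path, "wb") as handle:  # binary mode: bytes are not re-encoded
            handle.write(body)
        saved.append(path)
    return saved
```

A call might look like archive_urls(open("big5urls.txt").read().split(), "big5-archive"), where "big5-archive" is a hypothetical output directory. The fetch parameter is only there so the loop can be exercised without network access.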


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Tuesday, 3 April 2012 11:23:45 UTC