Tools to help extract info from the dataset

Hi,

I mentioned in webmob that I had developed a small Go tool to do a quick
initial review of the information in the webdevdata.org dataset.

Marcos suggested I share the program, so I've cleaned it up and here it is:

https://github.com/ernesto-jimenez/webdevdata-tools

Some highlights:
 - I've used Go's html parser to analyse actual tags and avoid the false
positives you get from just using grep (tags inside HTML comments,
attributes mentioned in the text of the page, etc.)
 - the code should be pretty self-explanatory. It's fewer than 100 lines of
code
 - I cross-compiled the tools for Linux, Windows and Mac so that you don't
need to prepare a Go environment
 - the README has an example of how I use the tools with find+parallel

Best,
Ernesto

Received on Wednesday, 13 November 2013 03:58:57 UTC