Re: Tools to help extract info from the dataset

On Wednesday, November 13, 2013 at 3:58 AM, Ernesto Jiménez wrote:

> Hi,
>  
> I mentioned in webmob that I had developed a small Go tool to do a quick initial review of the information in the webdevdata.org (http://webdevdata.org) dataset.
>  
> Marcos suggested I share the program, so I've cleaned it up and here it is:  
>  
> https://github.com/ernesto-jimenez/webdevdata-tools  
>  
> Some highlights:
> - I've used Go's HTML parser to analyse actual tags and avoid the false positives you get from just using grep (tags inside HTML comments, attributes mentioned in the website's text, etc.)
> - the code should be pretty self-explanatory; it's fewer than 100 lines
> - I cross-compiled the tools for Linux, Windows and Mac so that you don't need to prepare a Go environment
> - the README has an example of how I use the tools with find+parallel


This is awesome stuff. Ernesto also tracked down a jQuery-like tool for Go, which I think has now been integrated into his tool. That should make searching much easier.  

Oh, and even more awesome - it’s all been migrated to WebDevData:
https://github.com/Webdevdata/webdevdata-tools

Thanks Ernesto! This is going to be super useful!  
   
   
--  
Marcos Caceres

Received on Wednesday, 13 November 2013 19:34:23 UTC