Re: Nice Data Cleansing Tool Demo

Hi Aldo,

On Mar/30/10 1:46 am, Aldo Bucchi wrote:
> Hi David,
>
> I love it and I NEED it ;)
> Awesome work, really.
>
> I heard it will be opensource so I will probably be able to extend it
> myself,
Yup, it'll be open source. Clean data sets are all clean the same way,
but each dirty data set is dirty in its own way. That's why Gridworks
needs all the open source contributions it can get: to cover as many
different kinds of data dirtiness as possible. :-)

> but here are some ideas for (missing?) features:
> * Importing custom Lookups/Dictionaries ( to go from text to IDs or
> the other way around ). Maybe this is possible using a different hook
> for the reconciliation mechanism.
> * Related: Plug in other reconciliation services ( not sure how this
> stands up to freebase biz alignment )
>    
Definitely. Right now Gridworks is hooked up to two services: the Freebase
text search service (called "relevance") and the experimental
reconciliation service proper. It makes sense to be able to plug in other
services as well.

> * Command line engine. To add a GW project as a step in a traditional
> transformation job and execute steps sequentially.
>    
We've thought of that, too, but haven't implemented it yet. It shouldn't
be too hard.

> * Expose Gazetteers ( dictionaries ) generated within the tool ( when
> equating facets )
>    
That makes sense. I'll think more about how to support that.

David

Received on Monday, 29 March 2010 22:18:34 UTC