
Re: Nice Data Cleansing Tool Demo

From: David Huynh <dfhuynh@alum.mit.edu>
Date: Tue, 30 Mar 2010 07:18:04 +0900
Message-ID: <4BB1271C.8060903@alum.mit.edu>
To: Aldo Bucchi <aldo.bucchi@gmail.com>
CC: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>

Hi Aldo,

On Mar/30/10 1:46 am, Aldo Bucchi wrote:
> Hi David,
>
> I love it and I NEED it ;)
> Awesome work, really.
>
> I heard it will be opensource so I will probably be able to extend it
> myself,
Yup, it'll be open source. Clean data sets are all clean the same way,
but each dirty data set is dirty in its own way, which is why Gridworks
needs all the open source contributions it can get to cover as many
different kinds of data dirtiness as possible. :-)

> but here are some ideas for (missing?) features:
> * Importing custom Lookups/Dictionaries ( to go from text to IDs or
> the other way around ). Maybe this is possible using a different hook
> for the reconciliation mechanism.
> * Related: Plug in other reconciliation services ( not sure how this
> stands up to freebase biz alignment )
>    
Definitely. Right now Gridworks is hooked up to 2 services: the Freebase 
text search service (called "relevance") and the experimental proper 
reconciliation service. It makes sense to be able to plug in other 
services as well.
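
One way to picture pluggable services, purely as a hypothetical sketch
(none of these names come from Gridworks or Freebase): each service
exposes the same lookup call, and a custom dictionary mapping text to
IDs is just one more implementation of it.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class ReconciliationService(Protocol):
    """Hypothetical interface: map a text label to an identifier, or None."""

    def reconcile(self, text: str) -> Optional[str]: ...


class DictionaryService:
    """Hypothetical service backed by a user-supplied lookup table."""

    def __init__(self, lookup: Dict[str, str]) -> None:
        # Normalize keys so matching ignores case and surrounding whitespace.
        self._lookup = {k.strip().lower(): v for k, v in lookup.items()}

    def reconcile(self, text: str) -> Optional[str]:
        return self._lookup.get(text.strip().lower())


def reconcile_column(
    values: Iterable[str], services: List[ReconciliationService]
) -> List[Optional[str]]:
    """Try each service in order and keep the first identifier found."""
    results: List[Optional[str]] = []
    for value in values:
        match = None
        for service in services:
            match = service.reconcile(value)
            if match is not None:
                break
        results.append(match)
    return results
```

Under this assumption, a remote service would implement the same
`reconcile` method, so the order of the `services` list decides which
source wins when several of them know the same label.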

> * Command line engine. To add a GW project as a step in a traditional
> transformation job and execute steps sequentially.
>    
We've thought of that, too, but haven't implemented it. That shouldn't 
be too hard.

> * Expose Gazetteers ( dictionaries ) generated within the tool ( when
> equating facets )
>    
That makes sense. I'll think more about how to support that.

David
Received on Monday, 29 March 2010 22:18:34 UTC
