Re: Nice Data Cleansing Tool Demo

From: Aldo Bucchi <aldo.bucchi@gmail.com>
Date: Mon, 29 Mar 2010 12:46:16 -0400
Message-ID: <7a4ebe1d1003290946s5e57b362r3fb5a2086effe550@mail.gmail.com>
To: David Huynh <dfhuynh@alum.mit.edu>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>

Hi David,

I love it and I NEED it ;)
Awesome work, really.

I heard it will be open source, so I will probably be able to extend it
myself, but here are some ideas for (missing?) features:
* Importing custom lookups/dictionaries (to go from text to IDs or
the other way around). Maybe this is possible using a different hook
for the reconciliation mechanism.
* Related: plugging in other reconciliation services (not sure how this
stands up to Freebase biz alignment).
* A command-line engine, to add a Gridworks project as a step in a
traditional transformation job and execute steps sequentially.
* Exposing gazetteers (dictionaries) generated within the tool (when
equating facets).
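
To make the first idea a bit more concrete, here is a rough sketch of what
a custom lookup could look like: a dictionary that maps text to IDs, plus
the reverse mapping. Everything in it (function names, the sample labels
and IDs) is hypothetical and only for illustration; none of it is
Gridworks code or API.

```python
# Hypothetical sketch: a custom lookup that maps text to IDs and back.
# Nothing here is Gridworks API; the names and sample data are made up.

def normalize(text):
    """Lowercase and collapse whitespace so near-identical cells match."""
    return " ".join(text.lower().split())

def build_lookup(pairs):
    """Build forward (text -> ID) and reverse (ID -> text) dictionaries."""
    forward = {normalize(label): ident for label, ident in pairs}
    reverse = {ident: label for label, ident in pairs}
    return forward, reverse

def reconcile(cells, forward):
    """Return the ID for each cell, or None when the lookup has no entry."""
    return [forward.get(normalize(cell)) for cell in cells]

# Made-up sample dictionary and column values.
pairs = [("Santiago", "city:1"), ("Buenos Aires", "city:2")]
forward, reverse = build_lookup(pairs)
ids = reconcile(["santiago", "  Buenos   Aires ", "Lima"], forward)
```

Cells with no entry come back as None, so they could be handed on to
another reconciliation service instead of being silently dropped.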

I have other ideas, but I need to try it first. It looks like you've
covered a lot of ground here.

Amazing, Amazing. Thanks!

On Sun, Mar 28, 2010 at 8:06 PM, David Huynh <dfhuynh@alum.mit.edu> wrote:
> On Mar/29/10 12:31 am, Kingsley Idehen wrote:
> > All,
> > A very nice data cleansing tool from David and Co. at Freebase.
> > CSVs are clearly the dominant data format in the structured open data realm.
> > This tool deals with ETL very well. Of course, for those who appreciate OWL,
> > a lot of what's demonstrated in this demo is also achievable via "context
> > rules". Bottom line (imho), nice tool that will only aid improving Web of
> > Linked Data quality at the data set production stage.
> > Links:
> > 1. http://vimeo.com/10081183 -- Freebase Gridworks
>
> Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also
> demonstrates a few other interesting features:
>     http://www.vimeo.com/10287824
> David

Aldo Bucchi

Received on Monday, 29 March 2010 16:46:49 UTC