- From: Aldo Bucchi <aldo.bucchi@gmail.com>
- Date: Mon, 29 Mar 2010 12:46:16 -0400
- To: David Huynh <dfhuynh@alum.mit.edu>
- Cc: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>
Hi David,

I love it and I NEED it ;)
Awesome work, really.

I heard it will be open source, so I will probably be able to extend it myself, but here are some ideas for (missing?) features:

* Importing custom Lookups/Dictionaries (to go from text to IDs or the other way around). Maybe this is possible using a different hook for the reconciliation mechanism.
* Related: Plug in other reconciliation services (not sure how this stands up to Freebase biz alignment).
* Command line engine. To add a GW project as a step in a traditional transformation job and execute steps sequentially.
* Expose Gazetteers (dictionaries) generated within the tool (when equating facets).

I have other ideas, but I need to try it first; it looks like you've covered a lot of ground here.

Amazing, Amazing.

Thanks!
A

On Sun, Mar 28, 2010 at 8:06 PM, David Huynh <dfhuynh@alum.mit.edu> wrote:
> On Mar/29/10 12:31 am, Kingsley Idehen wrote:
>
> All,
>
> A very nice data cleansing tool from David and Co. at Freebase.
>
> CSVs are clearly the dominant data format in the structured open data realm.
> This tool deals with ETL very well. Of course, for those who appreciate OWL,
> a lot of what's demonstrated in this demo is also achievable via "context
> rules". Bottom line (imho), nice tool that will only aid improving Web of
> Linked Data quality at the data set production stage.
>
> Links:
>
> 1. http://vimeo.com/10081183 -- Freebase Gridworks
>
> Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also
> demonstrates a few other interesting features:
>
> http://www.vimeo.com/10287824
>
> David
>

--
Aldo Bucchi
skype:aldo.bucchi
http://www.univrz.com/
http://aldobucchi.com/

PRIVILEGED AND CONFIDENTIAL INFORMATION
This message is only for the use of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If you are not the intended recipient, please do not distribute or copy this communication, by e-mail or otherwise. Instead, please notify us immediately by return e-mail.
Received on Monday, 29 March 2010 16:46:49 UTC