Re: Nice Data Cleansing Tool Demo

David Huynh wrote:
> On Mar/29/10 10:01 am, Kingsley Idehen wrote:
>> David Huynh wrote:
>>> On Mar/29/10 12:31 am, Kingsley Idehen wrote:
>>>> All,
>>>>
>>>> A very nice data cleansing tool from David and Co. at Freebase.
>>>>
>>>> CSVs are clearly the dominant data format in the structured open 
>>>> data realm. This tool deals with ETL very well. Of course, for 
>>>> those who appreciate OWL, a lot of what's demonstrated in this demo 
>>>> is also achievable via "context rules". Bottom line (imho), nice 
>>>> tool that will only aid improving Web of Linked Data quality at the 
>>>> data set production stage.
>>>>
>>>> Links:
>>>>
>>>> 1. http://vimeo.com/10081183 -- Freebase Gridworks
>>>>
>>> Thanks, Kingsley. The second screencast, by Stefano Mazzocchi, also 
>>> demonstrates a few other interesting features:
>>>
>>>     http://www.vimeo.com/10287824
>>>
>>> David
>> David,
>>
>> Yes, very nice!
>>
>> Now here is the obvious question, re. broader realm of faceted data 
>> navigation, have you guys digested the underlying concepts 
>> demonstrated by Microsoft Pivot?
>>
>
> I've seen the TED talk on Pivot. It's a very well polished 
> implementation of faceted browsing. The Seadragon technology 
> integration and animations are well executed. As far as "underlying 
> concepts" in faceted browsing go, I haven't noticed anything novel there.
>
> One thing to note: in each Pivot demo example, there is data of 
> exactly one type only--say, type people. So it seems, using Microsoft 
> Pivot, you can't pivot from one type to another, say, from people to 
> their companies. You can't do that example I used for Parallax: US 
> presidents -> children -> schools. Or skyscrapers -> architects -> 
> other buildings. So from what I've seen, as it currently is, Microsoft 
> Pivot cannot be used for browsing graphs because it cannot pivot (over 
> graph links).
Yes, this is a limitation re. general faceted browsing concepts.
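
To make the "pivot over graph links" point concrete, here is a minimal 
sketch of moving from a set of entities of one type to a set of another 
type by following a named link, as in presidents -> children -> schools. 
The identifiers and the pivot() helper are made up for illustration; 
this is not how Parallax or Pivot is implemented.

```python
# Toy graph with made-up identifiers; each edge is (subject, link, object).
EDGES = [
    ("president:A", "child", "person:A1"),
    ("president:A", "child", "person:A2"),
    ("president:B", "child", "person:B1"),
    ("person:A1", "school", "school:X"),
    ("person:A2", "school", "school:Y"),
    ("person:B1", "school", "school:X"),
]


def pivot(entities, link, edges=EDGES):
    """Follow `link` from every entity in the current set; return the targets."""
    return {obj for subj, lnk, obj in edges if subj in entities and lnk == link}


# Start with one type (presidents) and pivot twice across graph links.
presidents = {"president:A", "president:B"}
children = pivot(presidents, "child")
schools = pivot(children, "school")
print(sorted(schools))  # ['school:X', 'school:Y']
```

A browser limited to one type can only filter the starting set; each 
pivot() call above swaps the current set for a set of a different type.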


The most interesting part to me is the use of an alternative symbol 
mechanism for the human interaction aspect, i.e., deep zoom images where 
you would typically see a long, human-unfriendly URI.
>
> Furthermore, I believe that to get Pivot to perform well, you need a 
> cleaned up, *homogeneous* data set, presumably of small size (see 
> their Wikipedia example in which they picked only the top 500 most 
> visited articles). SW/linked data in their natural habitat, however, 
> is rarely that cleaned up and homogeneous ... So by the time you can 
> use Pivot on SW/linked data, you will already have solved all the 
> interesting and challenging problems.
This part is what I call an innovation slot, since we have hooked it into 
our DBMS-hosted faceted engine and successfully used it over very large 
data sets. Of course, it means that we've implemented some internal tweaks 
re. the alternative identifier symbols, but once that was done, it was 
back to letting our engine do its thing re. huge data set navigation and 
the ability to expose Entity-Attribute-Value graph model based 
hypermedia resources in a variety of data representations (functionality 
that lies at the very core of Virtuoso), etc.
>
> I do applaud their recent offering of the Pivot widget for embedding 
> into any arbitrary site. That should make faceted browsing more 
> accessible to web authors, as Exhibit has done. Pivot is way more 
> polished and hopefully scales better than Exhibit, although Exhibit is 
> more malleable as a piece of software.
Nice assessment :-)

We will soon unveil versions of our live instances (LOD Cloud Cache, 
DBpedia, etc.) that work with Pivot as the client via dynamic 
collections. There is a fundamental feature in Virtuoso (what we call 
Anytime Query) that is essential to delivering this functionality. It is 
my hope that via Pivot (for which dynamic collections are extremely 
challenging) we can make this feature a little easier to comprehend. 
What I describe is a general DBMS engine tweak (it goes beyond RDF data 
management).

Links:

1. http://www.youtube.com/watch?v=G29DBIEcIuQ -- a quick and dirty 
screencast I published after confirming that our goals had been 
achieved re. navigation of huge RDF data sets via Pivot

2. http://bit.ly/9mj7Fw -- old presentation covering our DBMS-hosted 
faceted browser engine + Anytime Query feature for handling huge data 
sets at Web scale.


Kingsley

>
> David
>
>


-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 

Received on Monday, 29 March 2010 11:44:43 UTC