Re: Nice Data Cleansing Tool Demo from Kingsley Idehen on 2010-03-29 (public-lod@w3.org from March 2010)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 29 Mar 2010 14:11:51 -0400
To: Georgi Kobilarov <georgi.kobilarov@gmx.de>
CC: David Huynh <dfhuynh@alum.mit.edu>, public-lod@w3.org
Message-ID: <4BB0ED67.8040207@openlinksw.com>

Georgi Kobilarov wrote:
> Hello,
>
>   
>>>> Now here is the obvious question, re. broader realm of faceted data
>>>> navigation, have you guys digested the underlying concepts
>>>> demonstrated by Microsoft Pivot?
>>>>
>>>>         
>>> I've seen the TED talk on Pivot. It's a very well polished
>>> implementation of faceted browsing. The Seadragon technology
>>> integration and animations are well executed. As far as "underlying
>>> concepts" in faceted browsing go, I haven't noticed anything novel
>>> there.
>>>
>
> I agree with David here, nothing novel about the underlying concept. 
> One thing I found quite nice and haven't seen before is grouping results
> along one facet dimension (the bar-graph representation of results). I think
> that is a neat idea. 
>
> The integration of Seadragon and deep-zooming looks nice, but little more
> than that. 
> Not all objects render into nice pictures, and the interaction of zooming in
> and out isn't a helpful one in my opinion. The zooming gives the impression
> at first that the position of objects in that 2D space is meaningful, but it
> is not.  
> It's an eye-catcher, not more.
>
>
>   
>>> One thing to note: in each Pivot demo example, there is data of
>>> exactly one type only--say, type people. So it seems, using Microsoft
>>> Pivot, you can't pivot from one type to another, say, from people to
>>> their companies. You can't do that example I used for Parallax: US
>>> presidents -> children -> schools. Or skyscrapers -> architects ->
>>> other buildings. So from what I've seen, as it currently is, Microsoft
>>> Pivot cannot be used for browsing graphs because it cannot pivot (over
>>> graph links).
>>>       
>> Yes, this is a limitation re. general faceted browsing concepts.
>>     
>
> No, it's a limitation of the current implementations of faceted browsing.
> Not a general problem with faceted browsing.
>
>
>   
>> The most interesting part to me is the use of an alternative symbol
>> mechanism for the human interaction aspect i.e., deep zoom images where
>> you would typically see a long human unfriendly URI.
>>     
>
> "Where you would typically see URIs"? Really? 
>   
Where would you see URIs? What do you see when you use: 
http://lod.openlinksw.com ?

And when you don't see URIs (human or machine, the typical case re. 
Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? 
Zilch!
>
>   
>>> Furthermore, I believe that to get Pivot to perform well, you need a
>>> cleaned up, *homogeneous* data set, presumably of small size (see
>>> their Wikipedia example in which they picked only the top 500 most
>>> visited articles). SW/linked data in their natural habitat, however,
>>> is rarely that cleaned up and homogeneous ... 
>>>       
>
> Is that really a problem of the Linked Data Web as such? I don't think so.
> There is a lot of badly structured, not well cleaned up data on the current
> Linked Data Web, because there was so much excitement about publishing
> anything in the early days, and so little attention to the actual data that's
> getting published. That is going to change. 
>
>   
>>> So by the time you can
>>> use Pivot on SW/linked data, you will already have solved all the
>>> interesting and challenging problems.
>>>       
>> This part is what I call an innovation slot since we have hooked it into our
>> DBMS hosted faceted engine and successfully used it over very large data
>> sets.
>>
>
> Kingsley, I'm wondering: How did you do that? I tried it myself, and it
> doesn't work. 
Did I indicate that my demo instance was public? How did you come to 
overlook that?
> Pivot can't make use of server-side faceted browsing engines.
>   
Why do you speculate? You are incorrect, and Virtuoso doing what you claim 
is impossible will be emphatic proof, nice and simple.

Pivot consumes data from HTTP accessible collections (which may be 
static or dynamic [1]). A dynamic collection is composed of CXML 
resources (basically XML).
> You need to send *all* the data to the Pivot client, and it computes the
> facets and performs any filtering operation client-side. 

You make a collection from a huge corpus of data (what I demonstrate), 
then you "Save As" (which I demonstrate as the generation point re. the 
CXML resource), and then Pivot consumes it. All the data is Virtuoso hosted.

There are two things you are overlooking:

1. The dynamic collection is produced at the conclusion of Virtuoso 
based faceted navigation (the interactions basically describe the Facet 
membership to Virtuoso)
2. Pivot works with static and dynamic collections
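
To make the two points above concrete, here is a minimal sketch of the general idea: the facet filtering happens on the server, and only the matching items are written out as a CXML-style resource for Pivot to fetch over HTTP. This is not Virtuoso's implementation. The function name build_collection, the sample field names, and the in-memory item list are hypothetical, and the CXML element names and namespace are written from memory of the published format, so check them against the Pivot developer documentation [1] before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI as recalled from the CXML format; verify before use.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"


def build_collection(name, items, facet_filter):
    """Filter items server-side, then serialize the survivors as CXML-style XML.

    items: list of dicts with 'name', 'href' and 'facets' (dict of str -> str).
    facet_filter: dict of facet name -> required value (the "facet membership").
    """
    # Server-side filtering: only matching items ever reach the client.
    selected = [
        it for it in items
        if all(it["facets"].get(k) == v for k, v in facet_filter.items())
    ]

    root = ET.Element(
        "Collection",
        {"Name": name, "SchemaVersion": "1.0", "xmlns": CXML_NS},
    )

    # Declare every facet that occurs in the selected items.
    categories = ET.SubElement(root, "FacetCategories")
    for facet_name in sorted({k for it in selected for k in it["facets"]}):
        ET.SubElement(categories, "FacetCategory",
                      {"Name": facet_name, "Type": "String"})

    # One Item element per selected record, with its facet values.
    items_el = ET.SubElement(root, "Items")
    for i, it in enumerate(selected):
        item_el = ET.SubElement(
            items_el, "Item",
            {"Id": str(i), "Name": it["name"], "Href": it["href"]},
        )
        facets_el = ET.SubElement(item_el, "Facets")
        for k, v in sorted(it["facets"].items()):
            facet_el = ET.SubElement(facets_el, "Facet", {"Name": k})
            ET.SubElement(facet_el, "String", {"Value": v})

    return ET.tostring(root, encoding="unicode")
```

Calling build_collection("People", items, {"type": "Person"}) returns an XML string containing only the items whose "type" facet is "Person"; an HTTP server would return that string as the body of the dynamic collection resource. Image references, which a real Pivot collection also needs, are left out of the sketch.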

Virtuoso is an HTTP server; it can serve a myriad of representations of 
data to user agents (it has its own DBMS hosted XSLT Processor and XML 
Schema Validator with XQuery/XPath to boot, all very old stuff).


BTW -- how do you think Peter Haase got his variant working? I am sure 
he will shed identical light on the matter for you.

Links:

1. http://www.getpivot.com/developer-info/ --- Please note Unbounded 
Dynamic Collections
2. http://www.getpivot.com/developer-info/hosting.aspx#Dynamic -- Look 
at the diagram then revisit the architecture of Virtuoso (it's a Hybrid 
Data Server that offers a plethora of functions in a single product; 
that's how it was architected from day 1)

Kingsley
> Works well for up
> to around 1k objects, but that's it. Pivot's architecture is in that sense
> very much like Exhibit in Silverlight.
>
>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Uberblic Labs Berlin
> http://blog.georgikobilarov.com
>
>
>
>   


-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Monday, 29 March 2010 18:12:22 UTC