
Contd: Nice Data Cleansing Tool Demo

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 29 Mar 2010 14:26:44 -0400
Message-ID: <4BB0F0E4.7010601@openlinksw.com>
To: "public-lod@w3.org" <public-lod@w3.org>, Georgi Kobilarov <georgi.kobilarov@gmx.de>

Georgi Kobilarov wrote:
> Hello,
>>>> Now here is the obvious question, re. broader realm of faceted data
>>>> navigation, have you guys digested the underlying concepts
>>>> demonstrated by Microsoft Pivot?
>>> I've seen the TED talk on Pivot. It's a very well polished
>>> implementation of faceted browsing. The Seadragon technology
>>> integration and animations are well executed. As far as "underlying
>>> concepts" in faceted browsing go, I haven't noticed anything novel
>>> there.
> I agree with David here, nothing novel about the underlying concept.
> One thing I found quite nice and haven't seen before is grouping results
> along one facet dimension (the bar-graph representation of results). I
> think that is a neat idea.
> The integration of Seadragon and deep-zooming looks nice, but little more
> than that. Not all objects render into nice pictures, and the interaction
> of zooming in and out isn't a helpful one in my opinion. The zooming gives
> the impression at first that the position of objects in that 2D space is
> meaningful, but it is not. It's an eye-catcher, not more.
>>> One thing to note: in each Pivot demo example, there is data of
>>> exactly one type only--say, type people. So it seems, using Microsoft
>>> Pivot, you can't pivot from one type to another, say, from people to
>>> their companies. You can't do that example I used for Parallax: US
>>> presidents -> children -> schools. Or skyscrapers -> architects ->
>>> other buildings. So from what I've seen, as it currently is, Microsoft
>>> Pivot cannot be used for browsing graphs because it cannot pivot (over
>>> graph links).
>> Yes, this is a limitation re. general faceted browsing concepts.
> No, it's a limitation of the current implementations of faceted browsing.
> Not a general problem with faceted browsing.
>> The most interesting part to me is the use of an alternative symbol
>> mechanism for the human interaction aspect i.e., deep zoom images where
>> you would typically see a long human unfriendly URI.
> "Where you would typically see URIs"? Really? 

**clean up post re. some critical typos **

Where would you see URIs? What do you see when you use: 
http://lod.openlinksw.com ?

And when you don't see URIs (human or machine, the typical case re. 
Faceted Browsing over RDF) what do you have re. HTTP based Linked Data? 
>>> Furthermore, I believe that to get Pivot to perform well, you need a
>>> cleaned up, *homogeneous* data set, presumably of small size (see
>>> their Wikipedia example in which they picked only the top 500 most
>>> visited articles). SW/linked data in their natural habitat, however,
>>> is rarely that cleaned up and homogeneous ...
> Is that really a problem of the Linked Data Web as such? I don't think so.
> There is a lot of badly structured, not well cleaned up data on the
> current Linked Data Web. Because there was so much excitement about
> publishing anything in the early days, and so little attention to the
> actual data that's getting published. That is going to change.
>>> So by the time you can
>>> use Pivot on SW/linked data, you will already have solved all the
>>> interesting and challenging problems.
>> This part is what I call an innovation slot since we have hooked it into
>> our DBMS hosted faceted engine and successfully used it over very large
>> data sets.
> Kingsley, I'm wondering: How did you do that? I tried it myself, and it
> doesn't work.

Did I indicate that my demo instance was public? How did you come to 
overlook that?

> Pivot can't make use of server-side faceted browsing engines.

Why do you speculate? You are incorrect, and Virtuoso *doing* what you 
claim is impossible is emphatic proof, nice and simple.

Pivot consumes data from HTTP-accessible collections (which may be 
static or dynamic [1]). A dynamic collection consists of CXML 
resources (basically XML).
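
To make the "CXML is basically XML" point concrete, here is a minimal sketch of how a server could serialize a handful of records as a CXML resource. It is an illustration only, not Virtuoso's implementation: the function name `build_cxml`, the record layout, and the example URLs are hypothetical, and the element names and namespace are given as I recall them from the Pivot developer documentation, so check them against the published schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace for CXML collection documents, as recalled from the Pivot
# developer documentation (assumption: verify against the schema).
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"


def _q(tag):
    """Qualify a tag name with the CXML namespace."""
    return "{%s}%s" % (CXML_NS, tag)


def build_cxml(name, items, image_base):
    """Serialize items (dicts with 'id', 'name', 'href', 'facets') as CXML text."""
    ET.register_namespace("", CXML_NS)
    root = ET.Element(_q("Collection"), {"Name": name, "SchemaVersion": "1.0"})

    # Declare each facet category used by any item (all typed as String here).
    categories = ET.SubElement(root, _q("FacetCategories"))
    facet_names = sorted({f for item in items for f in item["facets"]})
    for facet_name in facet_names:
        ET.SubElement(categories, _q("FacetCategory"),
                      {"Name": facet_name, "Type": "String"})

    # One Item element per record, with its facet values nested inside.
    items_el = ET.SubElement(root, _q("Items"), {"ImgBase": image_base})
    for item in items:
        item_el = ET.SubElement(items_el, _q("Item"), {
            "Id": str(item["id"]),
            "Img": "#%d" % item["id"],
            "Name": item["name"],
            "Href": item["href"],
        })
        facets_el = ET.SubElement(item_el, _q("Facets"))
        for facet_name, value in sorted(item["facets"].items()):
            facet_el = ET.SubElement(facets_el, _q("Facet"), {"Name": facet_name})
            ET.SubElement(facet_el, _q("String"), {"Value": value})

    return ET.tostring(root, encoding="unicode")


# Hypothetical records; in a dynamic collection these would be query results.
records = [
    {"id": 0, "name": "Alice", "href": "http://example.org/alice",
     "facets": {"Type": "Person", "Country": "DE"}},
    {"id": 1, "name": "Bob", "href": "http://example.org/bob",
     "facets": {"Type": "Person", "Country": "US"}},
]
print(build_cxml("Demo collection", records, "demo.dzc"))
```

A dynamic collection would return this text as the body of an HTTP response, built per request, instead of reading it from a static file.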

> You need to send *all* the data to the Pivot client, and it computes the
> facets and performs any filtering operation client-side. 

You make a collection from a huge corpus of data (what I demonstrate), 
then you "Save As" (which I demonstrate as the generation point for the 
CXML resource), and then Pivot consumes it. All the data is Virtuoso hosted.

There are two things you are overlooking:

1. The dynamic collection is produced at the conclusion of Virtuoso 
based faceted navigation (the interactions basically describe the Facet 
membership to Virtuoso)
2. Pivot works with static and dynamic collections.
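
The division of labor in those two points can be sketched roughly as follows. This is a toy, in-memory stand-in, not how Virtuoso does it (there the narrowing is a query against the DBMS); the function names, the record layout, and the `limit` value are all hypothetical and chosen only for illustration.

```python
def narrow_by_facets(corpus, selections):
    """Keep only the items whose facet values match every selection."""
    return [
        item for item in corpus
        if all(item["facets"].get(name) == value
               for name, value in selections.items())
    ]


def dynamic_collection_members(corpus, selections, limit=500):
    """Server-side step: narrow the corpus, then cap what goes to the client.

    The limit is an arbitrary illustrative cap, not a documented Pivot figure.
    """
    return narrow_by_facets(corpus, selections)[:limit]


# Toy corpus standing in for a very large server-hosted data set.
corpus = [
    {"id": 1, "name": "Alice", "facets": {"type": "Person", "country": "DE"}},
    {"id": 2, "name": "Acme", "facets": {"type": "Company", "country": "DE"}},
    {"id": 3, "name": "Bert", "facets": {"type": "Person", "country": "DE"}},
    {"id": 4, "name": "Carol", "facets": {"type": "Person", "country": "US"}},
]

# The user's facet choices describe the membership; only the matches
# would be serialized and handed to the client.
members = dynamic_collection_members(corpus, {"type": "Person", "country": "DE"})
print([m["name"] for m in members])
```

The client never sees the full corpus here, only the members selected by the facet choices.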

*As I specifically stated, this is about using both products together to 
solve a major problem: #1 Faceted Browsing UX, #2 Faceting over a huge 
data corpus.*

Virtuoso is an HTTP server; it can serve a myriad of representations of 
data to user agents (it has its own DBMS hosted XSLT Processor and XML 
Schema Validator with XQuery/XPath to boot, all very old stuff).

BTW -- how do you think Peter Haase got his variant working? I am sure 
he will shed identical light on the matter for you.


1. http://www.getpivot.com/developer-info/ --- Please note Unbounded 
Dynamic Collections
2. http://www.getpivot.com/developer-info/hosting.aspx#Dynamic -- Look 
at the diagram, then revisit the architecture of Virtuoso (it's a Hybrid 
Data Server that offers a plethora of functions in a single product; 
that's how it was architected from day 1)

> Works well for up to around 1k objects, but that's it. Pivot's
> architecture is in that sense very much like Exhibit in Silverlight.
> Best,
> Georgi
> -- 
> Georgi Kobilarov
> Uberblic Labs Berlin
> http://blog.georgikobilarov.com



Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 
Received on Monday, 29 March 2010 18:27:12 UTC
