Re: Contd: Nice Data Cleansing Tool Demo from Peter Haase on 2010-03-29 (public-lod@w3.org from March 2010)

From: Peter Haase <peter.haase@fluidops.com>
Date: Mon, 29 Mar 2010 23:11:27 +0200
To: "'Kingsley Idehen'" <kidehen@openlinksw.com>, <public-lod@w3.org>, "'Georgi Kobilarov'" <georgi.kobilarov@gmx.de>
Message-ID: <4bb11781.0305560a.20ab.292a@mx.google.com>
Hi,

> -----Original Message-----
> From: public-lod-request@w3.org [mailto:public-lod-request@w3.org] On
> Behalf Of Kingsley Idehen
> Sent: Monday, March 29, 2010 8:27 PM
> To: public-lod@w3.org; Georgi Kobilarov
> Subject: Contd: Nice Data Cleansing Tool Demo
> 
> Georgi Kobilarov wrote:
> > Hello,
> >
> >
> >>>> Now here is the obvious question, re. the broader realm of faceted
> >>>> data navigation: have you guys digested the underlying concepts
> >>>> demonstrated by Microsoft Pivot?
> >>>>
> >>>>
> >>> I've seen the TED talk on Pivot. It's a very well-polished
> >>> implementation of faceted browsing. The Seadragon technology
> >>> integration and animations are well executed. As far as "underlying
> >>> concepts" in faceted browsing go, I haven't noticed anything novel
> >>> there.
> >
> > I agree with David here: nothing novel about the underlying concept.
> > One thing I found quite nice and haven't seen before is grouping results
> > along one facet dimension (the bar-graph representation of results). I
> > think that is a neat idea.
> > The integration of Seadragon and deep zooming looks nice, but little more
> > than that. Not all objects render into nice pictures, and the interaction
> > of zooming in and out isn't a helpful one in my opinion. The zooming gives
> > the impression at first that the position of objects in that 2D space is
> > meaningful, but it is not. It's an eye-catcher, nothing more.
> >
> >
> >
> >>> One thing to note: in each Pivot demo example, there is data of
> >>> exactly one type only (say, type people). So it seems, using Microsoft
> >>> Pivot, you can't pivot from one type to another, say, from people to
> >>> their companies. You can't do that example I used for Parallax: US
> >>> presidents -> children -> schools. Or skyscrapers -> architects ->
> >>> other buildings. So from what I've seen, as it currently is, Microsoft
> >>> Pivot cannot be used for browsing graphs because it cannot pivot (over
> >>> graph links).
> >>>
> >> Yes, this is a limitation re. general faceted browsing concepts.
> >>
> >
> > No, it's a limitation of the current implementations of faceted browsing,
> > not a general problem with faceted browsing.
> >

Using dynamic collections, you can implement essentially any pivot, query
refinement, or filter operator you like, including the ones mentioned above.
It is true that Microsoft's demo collections do not show this (yet), but
some of these operators are available in our system at
http://iwb.fluidops.com/pivot
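To illustrate what such a pivot operator does (e.g. the presidents -> children -> schools example above), here is a minimal Python sketch. It assumes an in-memory list of triples with made-up `ex:` identifiers; `build_index` and `pivot` are hypothetical helper names, not part of the Pivot API or of any particular product. In a real setup each step would presumably be answered by a server-side query whose result becomes a new dynamic collection.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); the "ex:" names are
# illustrative placeholders, not real DBpedia or Parallax identifiers.
TRIPLES = [
    ("ex:presidentA", "ex:child", "ex:childA1"),
    ("ex:presidentA", "ex:child", "ex:childA2"),
    ("ex:presidentB", "ex:child", "ex:childB1"),
    ("ex:childA1", "ex:school", "ex:school1"),
    ("ex:childA2", "ex:school", "ex:school2"),
    ("ex:childB1", "ex:school", "ex:school1"),
]


def build_index(triples):
    """Map (subject, predicate) to the set of linked objects."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def pivot(items, predicate, index):
    """Follow `predicate` from every item in the current collection and
    return the linked resources, i.e. the members of the next collection."""
    linked = set()
    for item in items:
        # .get() avoids creating empty entries in the defaultdict.
        linked |= index.get((item, predicate), set())
    return linked


if __name__ == "__main__":
    index = build_index(TRIPLES)
    presidents = {"ex:presidentA", "ex:presidentB"}
    children = pivot(presidents, "ex:child", index)
    schools = pivot(children, "ex:school", index)
    print(sorted(schools))  # ['ex:school1', 'ex:school2']
```

Each call to `pivot` changes the type of the items in the collection, which is exactly the step the single-type demo collections do not show.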


> >
> >> The most interesting part to me is the use of an alternative symbol
> >> mechanism for the human interaction aspect, i.e., deep zoom images where
> >> you would typically see a long, human-unfriendly URI.
> >>
> >
> > "Where you would typically see URIs"? Really?
> 
> **Cleaned-up post re. some critical typos**
> 
> Where would you see URIs? What do you see when you use
> http://lod.openlinksw.com ?
>
> And when you don't see URIs (human or machine, the typical case re.
> faceted browsing over RDF), what do you have re. HTTP-based Linked Data?
> Zilch!
> >
> >
> >>> Furthermore, I believe that to get Pivot to perform well, you need a
> >>> cleaned-up, *homogeneous* data set, presumably of small size (see
> >>> their Wikipedia example in which they picked only the top 500 most
> >>> visited articles). SW/linked data in their natural habitat, however,
> >>> is rarely that cleaned up and homogeneous ...

Yes, ideally you have clean, homogeneous data. However, our demonstrator does
operate on a larger, uncleaned LOD data set, including DBpedia (>3 million
entities) and several others (around 200 million triples in total). Clearly,
you see the problems in the data (missing images, wrong images, duplicate
values, ...). Still, I see it from a positive side: I believe that for many
information needs, visual exploration is a very effective paradigm, and with
a tool as good as Pivot one can achieve a phenomenal user experience. And it
is possible to show that with real LOD data already today.
As Georgi said, data quality will improve over time. Visual exploration
tools like Pivot, where you actually *see* the problems, might help on
this front.
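Two of the problem classes mentioned above (missing images and duplicate values) can also be flagged automatically before a collection is published. A minimal Python sketch, assuming each item is a plain dict with hypothetical `name`, `img`, and `facets` fields; the sample records are invented for illustration. Wrong images cannot be detected this way and still need a human looking at them.

```python
# Hypothetical item records, roughly one per entity shown in a collection;
# the field names ("name", "img", "facets") are assumptions for this sketch.
ITEMS = [
    {"name": "Berlin", "img": "berlin.jpg", "facets": {"country": ["Germany"]}},
    {"name": "Paris", "img": "", "facets": {"country": ["France"]}},
    {"name": "Hamburg", "img": "hamburg.jpg",
     "facets": {"country": ["Germany", "Germany"]}},
]


def find_problems(items):
    """Flag items with no image and facets that repeat the same value.

    Wrong (but present) images are not detected by these checks.
    """
    problems = []
    for item in items:
        if not item.get("img"):
            problems.append((item["name"], "missing image"))
        for facet, values in item.get("facets", {}).items():
            # A facet list longer than its set of distinct values has duplicates.
            if len(values) != len(set(values)):
                problems.append((item["name"], f"duplicate values in {facet}"))
    return problems


if __name__ == "__main__":
    for name, issue in find_problems(ITEMS):
        print(f"{name}: {issue}")
```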



> > Is that really a problem of the Linked Data Web as such? I don't think
> > so. There is a lot of badly structured, poorly cleaned-up data on the
> > current Linked Data Web, because there was so much excitement about
> > publishing anything in the early days, and so little attention to the
> > actual data that's getting published. That is going to change.
> >
> >>> So by the time you can
> >>> use Pivot on SW/linked data, you will already have solved all the
> >>> interesting and challenging problems.
> >>>
> >> This part is what I call an innovation slot, since we have hooked it into
> >> our DBMS-hosted faceted engine and successfully used it over very large
> >> data sets.
> >
> > Kingsley, I'm wondering: how did you do that? I tried it myself, and it
> > doesn't work.
> 
> Did I indicate that my demo instance was public? How did you come to
> overlook that?
> 
> > Pivot can't make use of server-side faceted browsing engines.
> >
> 
> Why do you speculate? You are incorrect, and Virtuoso *doing* what you
> claim is impossible will be emphatic proof, nice and simple.
> 
> Pivot consumes data from HTTP-accessible collections (which may be
> static or dynamic [1]). A dynamic collection consists of CXML
> resources (basically XML).
> 
> > You need to send *all* the data to the Pivot client, and it computes the
> > facets and performs any filtering operation client-side.
> 
> You make a collection from a huge corpus of data (what I demonstrate),
> then you "Save As" (which I demonstrate as the generation point re. the
> CXML resource), and then Pivot consumes it. All the data is Virtuoso-hosted.
> 
> There are two things you are overlooking:
> 
> 1. The dynamic collection is produced at the conclusion of Virtuoso-based
> faceted navigation (the interactions basically describe the facet
> membership to Virtuoso).
> 2. Pivot works with static and dynamic collections.
> 
> *I specifically state that this is about using both products together to
> solve a major problem: #1 faceted browsing UX, #2 faceting over a huge
> data corpus.*
> 
> Virtuoso is an HTTP server; it can serve a myriad of representations of
> data to user agents (it has its own DBMS-hosted XSLT processor and XML
> Schema validator, with XQuery/XPath to boot, all very old stuff).
> 
> 
> BTW -- how do you think Peter Haase got his variant working? I am sure
> he will shed identical light on the matter for you.


Of course you are both right ;-)
With dynamic collections, you can use server-side facet engines when
computing the collection (that's what we do), but for a given collection you
have to send all the data to the client at once, no matter whether it is
dynamically computed or not. If the result set is large (>>1k items), you
have a problem. What is needed is top-k retrieval plus the right
pivot/refinement operators (which link to new dynamic collections).

Regards,
Peter



> 
> Links:
> 
> 1. http://www.getpivot.com/developer-info/ - Please note Unbounded
> Dynamic Collections.
> 2. http://www.getpivot.com/developer-info/hosting.aspx#Dynamic - Look
> at the diagram, then revisit the architecture of Virtuoso (it's a hybrid
> data server that offers a plethora of functions in a single product;
> that's how it was architected from day 1).
> 
> > Works well for up to around 1k objects, but that's it. Pivot's
> > architecture is in that sense very much like Exhibit in Silverlight.
> >
> >
> > Best,
> > Georgi
> >
> > --
> > Georgi Kobilarov
> > Uberblic Labs Berlin
> > http://blog.georgikobilarov.com
> 
> --
> 
> Regards,
> 
> Kingsley Idehen
> President & CEO
> OpenLink Software
> Web: http://www.openlinksw.com
> Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca: kidehen
>
Received on Monday, 29 March 2010 21:12:00 UTC