Re: Making human-friendly linked data pages more human-friendly from Kingsley Idehen on 2009-09-17 (public-lod@w3.org from September 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 17 Sep 2009 07:23:02 -0400
To: Paul A Houle <devonianfarm@gmail.com>
CC: public-lod@w3.org
Message-ID: <4AB21C16.6000800@openlinksw.com>
Paul A Houle wrote:
>    I think there are a few scenarios here.
>
>    In my mind,  dbpedia.org is a site for 
> tripleheads.  I use it all the time when I'm trying to understand how 
> my systems interact with data from dbpedia -- for that purpose,  it's 
> useful to see a reasonably formatted list of triples associated with 
> an item.  A view that's isomorphic to the triples is useful for me there.
>
>    Yes, better interfaces for browsing dbpedia/wikipedia ought to be 
> built -- navigation along axes of type,  time,  and space would be 
> obviously interesting,  but making a usable interface for this 
> involves some challenges which are outside the scope of dbpedia.org.  
> The point of linked data is that anybody who wants to can make a 
> better browsing interface for dbpedia.
>
>    Another scenario is a site that's ~primarily~ a site for humans and 
> secondly a site for tripleheads and machines,  for instance,
>
> http://carpictures.cc/
>
>    That particular site is built on an object-relational system which 
> has some (internal) RDF features.  The site was created by merging 
> dbpedia,  freebase and other information sources,  so it exports 
> linked data that links dbpedia concepts to images with very high 
> precision.  The primary vocabulary is SIOC,  and the RDF content for a 
> page is ~nearly~ isomorphic to the content of the main part of the 
> page (excluding the sidebar.)
>
>    However,  there is content that's currently exclusive to the human 
> interface:  for instance,  the UI is highly visual:  for every 
> automobile make and model,  there are heuristics that try to pick an 
> image that is "better than average" at being both striking and 
> representative of the brand.  This selection is materialized in the 
> database.  
> There's information designed to give humans an "information scent" to 
> help them navigate,  a concept which isn't so well-defined for 
> webcrawlers.  Then there's the sidebar,  which has several purposes,  
> one of them being a navigational system for humans,  that just isn't 
> so relevant for machines.
>
>    There really are two scenarios I see for linked data users relative 
> to this system at the moment:  (i) a webcrawler crawls the whole 
> site,  or (ii) I provide a service that,  given a linked data URL,  
> returns information about what ontology2 knows about the URL.  For 
> instance,  this could be used by a system that's looking for 
> multimedia connected with anything in dbpedia or freebase.  Perhaps I 
> should be offering an NT dump of the whole site,  but I've got no 
> interest in offering a SPARQL endpoint.
>
>    As for friendly interfaces,  I'd say take a look analytically at a 
> page like
>
> http://carpictures.cc/cars/photo/car_make/21/Chevrolet
>
>    What's going on here?  This is being done on a SQL-derivative 
> system that has a query builder,  but you could do the same thing w/ 
> SPARQL.  We'd imagine that there are some predicates like
>
> hasCarModel
> hasPhotograph
> hasPreferredThumb
>
>    starting with a URL that represents a make of car (a nameplate,  
> like Chevrolet) we'd then traverse the hasCarModel relationship to 
> enumerate the models,  and then do a COUNT(*) of hasPhotograph 
> relationships for the cars to create a count of pictures for each 
> model.  Generically,  the construction of a page like this involves 
> doing "joins" and traversing the graph to show,  not just the triples 
> that are linked to a named entity,  but information that can be found 
> by traversing a graph.
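
The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples, the "make:" / "model:" / "photo:" identifiers and the helper names are hypothetical, and an in-memory list stands in for whatever store (SQL or SPARQL) actually holds the data. Only the predicate names hasCarModel and hasPhotograph come from the message.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
# The identifiers are invented for illustration only.
TRIPLES = [
    ("make:Chevrolet", "hasCarModel", "model:Corvette"),
    ("make:Chevrolet", "hasCarModel", "model:Camaro"),
    ("model:Corvette", "hasPhotograph", "photo:1"),
    ("model:Corvette", "hasPhotograph", "photo:2"),
    ("model:Camaro", "hasPhotograph", "photo:3"),
    ("make:Ford", "hasCarModel", "model:Mustang"),
    ("model:Mustang", "hasPhotograph", "photo:4"),
]


def objects(triples, subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def photo_counts_for_make(triples, make):
    """Follow hasCarModel from a make, then count hasPhotograph per model.

    This is the "join": the result is not a list of triples about the make
    itself, but information reached by traversing two hops of the graph.
    """
    return {
        model: len(objects(triples, model, "hasPhotograph"))
        for model in objects(triples, make, "hasCarModel")
    }


if __name__ == "__main__":
    for model, count in photo_counts_for_make(TRIPLES, "make:Chevrolet").items():
        print(model, count)
```

A page for a make would then render one row per model with its picture count; models belonging to other makes never enter the result because the traversal starts from the make's URL.
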
> People shouldn't be shy about introducing their own predicates;  the 
> very nature of inference in RDF points to "creating a new predicate" 
> as the basic solution to most problems.  In this case,  
> hasPreferredThumb is a perfectly good way to materialize the result of 
> a complex heuristic.
>
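As a rough sketch of what "materializing the result of a complex heuristic" as a predicate could look like: the scoring function, the "striking" and "representative" fields and the sample photos below are all assumptions made up for illustration, not the heuristic the site actually uses. Only the predicate name hasPreferredThumb comes from the message.

```python
def score(photo):
    """Hypothetical stand-in for the real heuristic: favour images that are
    both striking and representative by multiplying the two ratings."""
    return photo["striking"] * photo["representative"]


def materialize_preferred_thumbs(photos_by_model):
    """Run the heuristic once per model and record each result as a
    (model, "hasPreferredThumb", photo) triple that can be stored and
    exported like any other statement."""
    triples = []
    for model, photos in photos_by_model.items():
        if not photos:
            continue  # no candidate images, so no triple is asserted
        best = max(photos, key=score)
        triples.append((model, "hasPreferredThumb", best["uri"]))
    return triples


# Hypothetical input data, invented for this example.
PHOTOS = {
    "model:Corvette": [
        {"uri": "photo:1", "striking": 0.9, "representative": 0.5},
        {"uri": "photo:2", "striking": 0.7, "representative": 0.8},
    ],
    "model:Camaro": [],
}

if __name__ == "__main__":
    for triple in materialize_preferred_thumbs(PHOTOS):
        print(triple)
```

Once the triple is stored, consumers only need to follow hasPreferredThumb; they do not need to know or rerun the heuristic that produced it.
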
> (One reason I'm sour about public SPARQL endpoints is that I don't 
> want to damage my brand by encouraging amnesic mashups of my content;  
> a quality site really needs a copy of its own data so it can make 
> additions,  corrections,  etc;  one major shortcoming of Web 2.0 has 
> been self-serving API TOS that forbid systems from keeping a memory -- 
> for instance,  Ebay doesn't let you make a price tracker or a system 
> that keeps dossiers on sellers.  Del.icio.us 
> makes it easy to put data in,  but you can't get anything interesting 
> out.  Web 3.0 has to make a clean break from this.)
>
> Database-backed sites traditionally do this with a mixture of 
> declarative SQL code and procedural code to create a view...  It would 
> be interesting to see RDF systems where the graph traversal is 
> specified and transformed into a website declaratively.
>
Paul,

A summary for the ages.

This is basically an aspect of the whole Linked Data meme that is lost 
on too many.

Thank you very much!!

-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Thursday, 17 September 2009 11:23:52 UTC