Re: Making human-friendly linked data pages more human-friendly from Paul A Houle on 2009-09-16 (public-lod@w3.org from September 2009)

From: Paul A Houle <devonianfarm@gmail.com>
Date: Wed, 16 Sep 2009 15:20:42 -0400
To: public-lod@w3.org
Message-ID: <ad20490909161220s6e5d720bnbc891e1bec09171d@mail.gmail.com>
   I think there are a few scenarios here.

   In my mind,  dbpedia.org is a site for tripleheads.  I use it all the
time when I'm trying to understand how my systems interact with data from
dbpedia -- for that purpose,  it's useful to see a reasonably formatted list
of triples associated with an item.  A view that's isomorphic to the triples
is useful for me there.

   Yes, better interfaces for browsing dbpedia/wikipedia ought to be built
-- navigation along axes of type,  time,  and space would be obviously
interesting,  but making a usable interface for this involves some
challenges which are outside the scope of dbpedia.org.  The point of linked
data is that anybody who wants to make a better browsing interface for
dbpedia can do so.

   Another scenario is a site that's ~primarily~ a site for humans and
secondly a site for tripleheads and machines,  for instance,

http://carpictures.cc/

   That particular site is built on an object-relational system which has
some (internal) RDF features.  The site was created by merging dbpedia,
freebase and other information sources,  so it exports linked data that
links dbpedia concepts to images with very high precision.  The primary
vocabulary is SIOC,  and the RDF content for a page is ~nearly~ isomorphic
to the content of the main part of the page (excluding the sidebar.)

   However,  there is content that's currently exclusive to the human
interface:  for instance,  the UI is highly visual:  for every automobile
make and model,  there are heuristics that try to pick an image that is
"better than average" at being both striking and representative of the brand.  This
selection is materialized in the database.  There's information designed to
give humans an "information scent" to help them navigate,  a concept which
isn't so well-defined for webcrawlers.  Then there's the sidebar,  which has
several purposes,  one of them being a navigational system for humans,  that
just isn't so relevant for machines.

   There really are two scenarios I see for linked data users relative to
this system at the moment:  (i) a webcrawler crawls the whole site,  or (ii)
I provide a service that,  given a linked data URL,  returns information
about what ontology2 knows about the URL.  For instance,  this could be used
by a system that's looking for multimedia connected with anything in dbpedia
or freebase.  Perhaps I should be offering an NT dump of the whole site,
but I've got no interest in offering a SPARQL endpoint.
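
   As a rough sketch of scenario (ii),  assuming the site's data can be viewed
as (subject, predicate, object) triples of URLs:  the function name `describe`,
the example.org URLs and the use of sioc:topic as the linking predicate are
illustrative assumptions,  not what carpictures.cc actually runs.  Literals
are not handled.

```python
# Hypothetical sample data: every URL and the choice of predicate are made up
# for illustration and are not taken from the real site.
SIOC_TOPIC = "http://rdfs.org/sioc/ns#topic"
CAMARO = "http://dbpedia.org/resource/Chevrolet_Camaro"

TRIPLES = [
    ("http://example.org/photo/1", SIOC_TOPIC, CAMARO),
    ("http://example.org/photo/2", SIOC_TOPIC, CAMARO),
    ("http://example.org/photo/3", SIOC_TOPIC,
     "http://dbpedia.org/resource/Chevrolet_Corvette"),
]


def describe(url, triples):
    """Return N-Triples lines for every triple that mentions `url`
    as its subject or its object (URL-valued objects only)."""
    return [
        f"<{s}> <{p}> <{o}> ."
        for s, p, o in triples
        if url in (s, o)
    ]


if __name__ == "__main__":
    for line in describe(CAMARO, TRIPLES):
        print(line)
```

   Running `describe` over every URL the site knows about would also give the
NT dump,  so the lookup service and the dump could share the same code.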

   As for friendly interfaces,  I'd say take a look analytically at a page
like

http://carpictures.cc/cars/photo/car_make/21/Chevrolet

   What's going on here?  This is being done on a SQL-derivative system that
has a query builder,  but you could do the same thing w/ SPARQL.  We'd imagine
that there are some predicates like

hasCarModel
hasPhotograph
hasPreferredThumb

   Starting with a URL that represents a make of car (a nameplate,  like
Chevrolet) we'd then traverse the hasCarModel relationship to enumerate the
models,  and then do a COUNT(*) of hasPhotograph relationships for the cars
to create a count of pictures for each model.  Generically,  the
construction of a page like this involves doing "joins" and traversing the
graph to show,  not just the triples that are linked to a named entity,  but
information that can be found by traversing a graph.

People shouldn't be shy about introducing their own predicates;  the very
nature of inference in RDF points to "creating a new predicate" as the basic
solution to most problems.  In this case,  hasPreferredThumb is a perfectly
good way to materialize the result of a complex heuristic.
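
The traversal described above could be sketched roughly like this.  It is a
toy in-memory version,  not the SQL-derivative system behind the site:  the
example.org URLs,  the sample data and the function names are all made up,
and only the three predicate names come from the discussion.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Hypothetical sample graph as (subject, predicate, object) triples.
TRIPLES = [
    (EX + "make/Chevrolet", "hasCarModel", EX + "model/Camaro"),
    (EX + "make/Chevrolet", "hasCarModel", EX + "model/Corvette"),
    (EX + "model/Camaro", "hasPhotograph", EX + "photo/1"),
    (EX + "model/Camaro", "hasPhotograph", EX + "photo/2"),
    (EX + "model/Corvette", "hasPhotograph", EX + "photo/3"),
    # Materialized result of the image-picking heuristic.
    (EX + "model/Camaro", "hasPreferredThumb", EX + "photo/2"),
]


def build_index(triples):
    """Index objects by (subject, predicate) so traversal is a dict lookup."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def make_page(index, make_url):
    """For one make: enumerate its models, count photographs per model,
    and pick up the preferred thumbnail if one has been materialized."""
    rows = []
    for model in index.get((make_url, "hasCarModel"), []):
        photos = index.get((model, "hasPhotograph"), [])
        thumbs = index.get((model, "hasPreferredThumb"), [])
        rows.append({
            "model": model,
            "photo_count": len(photos),  # the COUNT(*) step
            "thumb": thumbs[0] if thumbs else None,
        })
    return rows


if __name__ == "__main__":
    for row in make_page(build_index(TRIPLES), EX + "make/Chevrolet"):
        print(row)
```

The page is a two-hop "join" (make to model,  model to photograph),  and
hasPreferredThumb is read back as an ordinary triple,  so the heuristic does
not have to run at page-build time.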

(One reason I'm sour about public SPARQL endpoints is that I don't want to
damage my brand by encouraging amnesic mashups of my content;  a quality
site really needs a copy of its own data so it can make additions,
corrections,  etc;  one major shortcoming of Web 2.0 has been self-serving
API TOS that forbid systems from keeping a memory -- for instance,  eBay
doesn't let you make a price tracker or a system that keeps dossiers on
sellers.  Del.icio.us makes it easy to put data in,  but you can't get
anything interesting out.  Web 3.0 has to make a clean break from this.)

Database-backed sites traditionally do this with a mixture of declarative
SQL code and procedural code to create a view...  It would be interesting to
see RDF systems where the graph traversal is specified and transformed into
a website declaratively.
Received on Thursday, 17 September 2009 07:40:42 UTC