Re: Making human-friendly linked data pages more human-friendly from Kingsley Idehen on 2009-09-17 (public-lod@w3.org from September 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 17 Sep 2009 12:19:28 -0400
To: Paul A Houle <devonianfarm@gmail.com>
CC: public-lod@w3.org
Message-ID: <4AB26190.8090400@openlinksw.com>
Paul A Houle wrote:
>
>
> On Thu, Sep 17, 2009 at 7:23 AM, Kingsley Idehen 
> <kidehen@openlinksw.com> wrote:
>
>
>
>     This is basically an aspect of the whole Linked Data meme that is
>     lost on too many.
>
>
> I've got to thank the book by Allemang and Hendler
>
> http://www.amazon.com/Semantic-Web-Working-Ontologist-Effective/dp/0123735564 
>
>
> for setting me straight about data modeling in RDF.  RDFS and OWL are 
> based on a system of duck typing that turns conventional object or 
> object-relational thinking inside out.  It's not necessarily good or 
> bad,  but it's really different.  Even though types matter,  
> predicates come before types because using predicate A can make object 
> B become a member of type C,  even if A is never explicitly put in 
> class C.
Schema Last vs. Schema First :-) This is an RDF virtue that, once broadly 
understood across the more traditional DBMS realms, will work wonders 
for the appreciation of RDF-based Linked Data.
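
To make the "predicates come before types" point concrete, here is a minimal sketch of rdfs:domain and rdfs:range inference over plain Python tuples. It is not how any particular RDF store does it, and every ex: name is made up for illustration. It also assumes at most one domain and one range per predicate, which real RDFS does not require.

```python
# Minimal sketch of RDFS domain/range inference over plain tuples.
# All ex: names are hypothetical, made up for illustration.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

def infer_types(triples):
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range."""
    # Simplification: keep a single domain and range per predicate.
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))  # subject joins the domain class
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))   # object joins the range class
    return inferred - set(triples)

triples = [
    ("ex:governs", RDFS_DOMAIN, "ex:Governor"),
    ("ex:governs", RDFS_RANGE, "ex:State"),
    ("ex:alice", "ex:governs", "ex:Alabama"),
]
new_triples = infer_types(triples)
```

Nobody ever states that ex:Alabama is an ex:State; merely using ex:governs puts it in that class.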
>
> Looking at the predicates in RDFS or OWL and not understanding the 
> whole,  it's pretty easy to be like "oh,  this isn't too different 
> from a relational database" and miss the point that RDFS&OWL is much 
> more about inference (creating new triples) than it is about 
> constraints or the physical layout of the data.
It's about a concrete conceptual layer that isn't oblivious to context. In 
some quarters this is actually called a Context Model Database [1].
>
> One consequence of this is that using an existing predicate can drag 
> in a lot more baggage than you might want;  it's pretty easy to get 
> the inference engine to infer too much,  and false inferences can 
> snowball like a katamari.
Yes, but the katamari can be confined to a specific data space that is 
owned and controlled by a particular person, who has a specific world 
view. As long as the axioms are partitioned across data spaces, and the 
RDF store is capable of processing within said confines, everyone is 
happy. Trouble starts when the claims become global facts imposed on 
everyone else that has access to the data space.
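
A rough sketch of that confinement, assuming each data space is just a named set of axioms applied to shared data. The space names, the ex: terms, and the closure helper are all hypothetical, and the only rule applied is rdfs:subPropertyOf.

```python
# Sketch: each data space keeps its own axioms, and inference runs
# inside one space at a time. All names here are hypothetical.
SUBPROP = "rdfs:subPropertyOf"

def closure(triples):
    """Apply 'x A y, A subPropertyOf B => x B y' until nothing new appears."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBPROP}
        new = {(s, b, o) for s, p, o in facts for a, b in sub if p == a} - facts
        if not new:
            return facts
        facts |= new

shared_data = {("ex:bob", "ex:admires", "ex:carol")}

data_spaces = {
    # This owner asserts that admiring someone implies knowing them.
    "space:one": {("ex:admires", SUBPROP, "foaf:knows")},
    # This owner makes no such claim.
    "space:two": set(),
}

# Each owner sees the shared data through their own axioms only.
views = {name: closure(shared_data | axioms)
         for name, axioms in data_spaces.items()}
```

The inferred foaf:knows triple exists only in the view of the space whose owner asserted the axiom; the other space is untouched.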
>
> A lot of people are in the habit of reusing vocabularies and seem to 
> forget that the natural answer to most RDF modeling problems is to 
> create a new predicate.  OWL has a rich set of mechanisms that can 
> tell systems that
>
> x A y -> x B y
> where A is your new predicate and B is a well-known predicate.  Once 
> you merge two "almost-but-not-the-same" things by actually using the 
> same predicate,  it's very hard to fix the damage.  If you use 
> inference,  it's easy to change your mind.
Yep! The trouble is that appreciation of OWL is low, but ultimately this 
is where the magic really lies. This is how URIs (Data Source Names) 
will be distinguished: by the data highway smarts they expose. 
Basically: I am traveling from Boston to Detroit, so which route (amongst 
many) gets me there quickest, given my specific preferences?
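
The "x A y -> x B y" pattern quoted above can be sketched in a few lines. This is an illustration only: ex:hasPenPal is a hypothetical new predicate, foaf:knows stands in for the well-known one, and with_mapping is a made-up helper rather than any store's API.

```python
# Sketch of "x A y -> x B y" with a new predicate A and a well-known B.
# ex:hasPenPal is hypothetical; foaf:knows stands in for B.
def with_mapping(data, mapping):
    """Return data plus the triples implied by the predicate mapping."""
    implied = {(s, mapping[p], o) for s, p, o in data if p in mapping}
    return set(data) | implied

data = {("ex:paul", "ex:hasPenPal", "ex:kingsley")}

# Today we believe every pen pal relation is also a foaf:knows relation.
mapped = with_mapping(data, {"ex:hasPenPal": "foaf:knows"})

# Changing our mind later is just dropping the mapping; the stored data
# never used foaf:knows directly, so nothing has to be untangled.
unmapped = with_mapping(data, {})
```

Had the data been written with foaf:knows from the start, there would be no way to tell the pen pals apart from everyone else afterwards.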
>
> --------------
>
> It may be different with other data sets,  but data cleaning is 
> absolutely essential working with dbpedia if you want to make 
> production-quality systems.
Data cleansing is required because there are no absolute truths and we 
all see the same thing differently. What RDF does, above all else, is 
protect our natural tendencies (seeing the same things differently) by 
inverting the traditional model, where different views or perspectives 
introduce inertia.

Heterogeneity is the spice of life for a reason. Even our DNA rewards us 
when we fuse afar (rather than inbreed) etc. :-)
>
> For instance,  all of the time people build bizapps and they need a 
> list of US states...  Usually we go and cut and paste one from 
> somewhere...  But now I've got dbpedia and I should be able to do this 
> systematically.  There's a category in wikipedia for that...
>
> http://en.wikipedia.org/wiki/Category:States_of_the_United_States
>
> if you ignore the subcategories and just take the actual pages,  it's 
> (almost) what you need,  except for some weirdos like
>
> User:Beebarose/Alabama 
> <http://en.wikipedia.org/wiki/User:Beebarose/Alabama>
>
> and one state that's got a disambiguator in the name:
>
> Georgia (U.S. state) 
> <http://en.wikipedia.org/wiki/Georgia_%28U.S._state%29>
>
> It's not hard to clean up this list,  but it takes some effort,  and 
> ultimately you're probably going to materialize something new.
Yes, something new, in a new data space that is still plugged into the Web.
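
For what it's worth, the cleanup described above might look roughly like this. The title list is a small hypothetical sample, and treating any title with a colon as a non-article page is an assumption that happens to hold for US states but not for Wikipedia titles in general.

```python
import re

# Hypothetical sample of page titles from the category, including the
# two problem cases mentioned above.
titles = ["Alabama", "User:Beebarose/Alabama", "Georgia (U.S. state)", "Vermont"]

def clean_titles(titles):
    """Drop namespaced pages and strip trailing parenthetical disambiguators."""
    cleaned = []
    for title in titles:
        if ":" in title:  # assumption: "User:..." style pages are not articles
            continue
        cleaned.append(re.sub(r"\s*\([^)]*\)$", "", title))
    return sorted(set(cleaned))

states = clean_titles(titles)
```

The result is the "something new": a materialized list that can still point back at the source pages it came from.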
>
> These sorts of issues even turn up in highly clean data sets.  Once I 
> built a webapp that had a list of countries in it,  this was used to 
> draw a dropdown list,  but the dropdown list was excessively wide,  
> busting the layout of the site.  Now,  the list was really long 
> because there were a few authoritarian countries with long and flowery 
> names.  The transformation from
>
> Democratic People's Republic of Korea -> North Korea
>
> improved the usability of the site while eliminating Orwellian 
> language.  This kind of "fit and finish" is needed to make quality 
> sites,  and semweb systems are going to need automated and manual ways 
> of fixing this so that "Web 3.0" looks like a step forward,  not a 
> step back.
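
One possible shape for the "automated and manual" fixing mentioned above: a hand-curated override table plus an automated check for labels that are still too wide. The names DISPLAY_OVERRIDES, display_label and too_wide, and the width limit of 20 characters, are all assumptions made for this sketch.

```python
# Hypothetical override table: formal names mapped to short display labels.
DISPLAY_OVERRIDES = {
    "Democratic People's Republic of Korea": "North Korea",
}

def display_label(official_name):
    """Manual fix: prefer a curated short label when one exists."""
    return DISPLAY_OVERRIDES.get(official_name, official_name)

def too_wide(names, max_width=20):
    """Automated check: names whose display label still exceeds max_width."""
    return [n for n in names if len(display_label(n)) > max_width]

names = [
    "Democratic People's Republic of Korea",
    "France",
    "Lao People's Democratic Republic",
]
flagged = too_wide(names)
```

Anything left in flagged is a candidate for a new entry in the override table.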

Web 3.0 is a step forward, but we need to know where the step is :-) As 
you know, it ain't about code; it's about data structures combined with 
ubiquitous access and reference.


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Thursday, 17 September 2009 16:20:05 UTC