Re: Linked Data Dogfood circa. 2013

On 1/4/13 4:02 PM, Giovanni Tummarello wrote:
> One might just simply stay silent and move along, but i take a few
> seconds to restate the obvious.
>
> It is a fact that Linked data as " publish some stuff and they will
> come, both new publishers and consumers" has failed.

Of course it hasn't. How else would we (this community) have arrived at a 
LOD Cloud well in excess of 50 billion useful triples? I just can't accept 
this kind of dismissal; it devalues the hard work of the many who continue 
to contribute to the LOD Cloud effort.

>
> The idea of putting some "extra energy" would simply be useless per se
> BUT it becomes  wrong when one tries to involve others e.g. gullible
> newcomers,  fresh ph.d students who trust that "hey if my ph.d advisor
> made a career out of it, and EU gave him so much money it must be real
> right?"

Teach people how to make little bits of Linked Data in Turtle. The RDBMS 
world successfully taught people how to make tables and execute simple 
queries using SQL, against the ultimate data silos, i.e., RDBMS engines. 
The same rules apply here, with the advantage of a much more powerful, 
open, and ultimately useful query language in SPARQL. On top of that, you 
get a superior data source name (DSN) mechanism in HTTP URIs, with data 
access baked directly into HTTP itself.
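
For instance, here is a rough sketch of how the SQL SELECT habit carries 
over to SPARQL (illustrative only, using the FOAF vocabulary; point it at 
any endpoint or Turtle file you like):

## SPARQL Start ##

## Roughly the SPARQL analogue of "SELECT name, nick FROM people":
## find every entity that has a foaf:name, plus its nickname if present.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?name ?nick
WHERE {
  ?person foaf:name ?name .
  OPTIONAL { ?person foaf:nick ?nick }
}
LIMIT 10

## SPARQL End ##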

Last year I ensured every employee at OpenLink could write Turtle by 
hand. They all performed a basic exercise [1][2]: describe yourself 
and/or the stuff you like. The process started slowly and ended with 
everyone having a lot of fun.

Simple message to get folks to engage: if you know that illiteracy leads 
to competitive disadvantage in the physical (real) world, why accept 
illiteracy in the ultra-competitive digital realm of the Web? Basically, 
if you can write simple sentences in natural language, why not learn to 
do the same with the Web in mind? Why take the distracting journey of 
producing an HTML file when you can simply dump content such as what 
follows into a file?

## Turtle Start ##

## Prefix declarations so the owl: and foaf: names below resolve.
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<> a <#Document> .
<> <#topic> <#i> .
<#i> <#name> "Kingsley Idehen" .
<#i> <#nickname> "@kidehen" .

## Bonus bits: cross references to properties defined by existing
## vocabularies. In more serious exercises this section is where DBpedia
## and other LOD cloud URIs kick in.

<#name> owl:equivalentProperty foaf:name .
<#topic> owl:equivalentProperty foaf:topic .
<#nickname> owl:equivalentProperty foaf:nick .
<#Document> owl:equivalentClass foaf:Document .
<#i> owl:sameAs <http://kingsley.idehen.net/dataspace/person/kidehen#this> .

## Turtle End ##

Don't underestimate the power of human intelligence, once awakened 
:-) The above is trivial for any literate person to comprehend. 
Remember, they already understand natural language sentence structure 
expressed in subject->predicate->object or subject->verb->object form.
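
For example, the sentence "Kingsley knows Giovanni" is one such 
subject->verb->object statement, and it maps directly onto a single 
triple (a minimal sketch; the FOAF terms are real, the <#giovanni> URI is 
purely illustrative):

## Turtle Start ##

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

## "Kingsley knows Giovanni" -- subject, verb (predicate), object.
<#i> foaf:knows <#giovanni> .
<#giovanni> foaf:name "Giovanni" .

## Turtle End ##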

>
> As a community of people who claim to have something to do with
> research (and not a cult), what we should do every once in a while is
> learn from the above lesson and devise NEW methods and strategies.

Yes, and the lesson we've learned over the years is that premature 
optimization is suboptimal when dealing with Linked Data. Basically, you 
have to teach Linked Data using manual document production steps, i.e., 
remind people of the familiar document create-and-share pattern. Once 
that clicks, they immediately realize how much fun it is to represent 
structured data with ease -- and how quickly it eats into limited free 
time, which is the very point at which productivity-oriented tools and 
services come into play.

> In other words,
> move ahead in a smart way.

Yes, but there isn't one smart way. For us humans the quest is always 
rife with context fluidity, so the "horses for courses" rule always 
applies. No silver bullets.

>
> I am by no means throwing it all away.

Good!

>
> * publishing structured data on the web is already a *huge thing* with
> schema.org and the rest.

Yes, but that's a useful piece of the picture. Not the picture. 
HTML+Microdata and (X)HTML+RDFa are not for end-users. Turtle is for 
end-users, so it too has to be part of the mix when the target audience 
is end-users.

> Why? because of the clear incentive SEO.

SEO is only a piece of the picture. Yes, everyone wants to be discovered 
by Google, for now, but that isn't the Web's ultimate destiny. What 
people really want is serendipitous discovery of relevant information as 
an intrinsic component of the virtuous cycle associated with content 
sharing via the Web.

> * RDF is a great model for heterogeneous data integration and i think
> it will explode in (certain) enterprises (knowledge intensive)

RDF provides specific benefits that get lost in a warped narrative. Its 
USP boils down to endowing entity-relationship-model-based structured 
data with *explicit* and *fine-grained* entity relationship semantics 
that both machines and humans can comprehend. It improves upon the basic 
entity relationship model, where entity relationship semantics are 
*implicit* and *coarse-grained*.
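
To make that concrete, here is a hedged sketch of the difference (the 
<#worksFor> term and related URIs are illustrative, not from any 
published vocabulary): in an RDBMS the "works for" relationship typically 
hides inside a foreign-key column whose meaning lives in application code 
or a schema diagram, whereas in RDF the relationship is itself a 
first-class, describable, dereferenceable term:

## Turtle Start ##

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

## The relationship itself is explicit and fine-grained: it has a URI,
## a human-readable definition, and machine-readable domain/range hints.
<#worksFor> a rdf:Property ;
    rdfs:label   "works for" ;
    rdfs:comment "Relates a person to the organization employing them." ;
    rdfs:domain  foaf:Person ;
    rdfs:range   foaf:Organization .

## Using the relationship is then just another sentence:
<#i> <#worksFor> <#OpenLinkSoftware> .

## Turtle End ##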

The admittedly awkward paragraph above has long been understood by a 
majority of DBMS developers and end-users outside the Semantic Web and 
Linked Data communities. It just gets very confusing once the letters 
R-D-F enter the mix, due to the provincial nature of many of RDF's older 
narratives.

Note: the new work by the RDF working group has solved the issue above 
i.e., they've done an *amazing* job fixing many of the issues that have 
dogged RDF narratives of yore.

>
> What we're seeking here is more advanced, flexible uses of structured
> data published, e.g. by smart clients, that do useful things for
> people.

Yes, and one simple way to get users engaged is to show them that they 
can put one or two sentences in a document, publish the document, and 
start follow-your-nose exploration -- all at a fraction of the time it 
takes to achieve the same thing using HTML.
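
A hedged sketch of what those "one or two sentences" could look like (the 
DBpedia URI is a real Linked Data URI; everything else is illustrative). 
Once the document is published, a consumer simply dereferences the 
DBpedia URI to pick up more data about the topic, and keeps following 
links from there:

## Turtle Start ##

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

## Two sentences; the object of the second is an LOD cloud URI that a
## consumer can dereference (follow-your-nose) for more related data.
<#i> foaf:name "Kingsley Idehen" ;
     foaf:topic_interest <http://dbpedia.org/resource/Linked_data> .

## Turtle End ##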

> The key is to show these clients, these useful things. What other
> (realistic) incentive can we create that make people publish data?

You show them how powerful entity-oriented analytics [3] can be when 
performed against this data. Basically, Business Intelligence++.
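
As a hedged illustration of the kind of entity-oriented analytics I mean 
(not tied to any specific endpoint; run it against whatever LOD sources 
you have loaded, assuming SPARQL 1.1 aggregate support):

## SPARQL Start ##

## A basic BI-style report: how many entities of each type are in the
## data? The SPARQL analogue of a SQL GROUP BY / COUNT roll-up.
SELECT ?type (COUNT(?entity) AS ?howMany)
WHERE {
  ?entity a ?type .
}
GROUP BY ?type
ORDER BY DESC(?howMany)
LIMIT 20

## SPARQL End ##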

>   how
> would a real "linked data client" work and provide benefit to a real
> world, non academic example class of users (if not all?) .

See my comment above, and digest the links I reference. Note, I am not 
speculating; I have customers exploiting these patterns right now. Teach 
folks Turtle and you make the value proposition much easier to articulate 
and appreciate.

>
> my wish for 2013 about linked data is that the discussion focuses on
> this.

Yes, productive use of Linked Data. That doesn't mean we don't dogfood. 
Every demo I make is a dog-fooding exercise.

> With people concentrated on the "full circle, round trip"
> experience, with incentives for all (and how to start the virtuous
> circle).

We just need to teach people how to publish documents with a high 
Serendipitous Discovery Quotient (SDQ) [4]; SEO's days are numbered :-)

Links:

1. http://bit.ly/QlQJLP -- Describing Stuff I Like using a Turtle Document.
2. http://bit.ly/RJzd9S -- Why Turtle Matters.
3. http://bit.ly/VAgjlx -- LOD Cloud Analytics based on Job Postings 
(snapshots) from LinkedIn.
4. https://plus.google.com/s/%23SDQ%20idehen -- SDQ-related posts on G+.
5. http://bit.ly/UqyqZa -- LOD Cloud Analytics based on Entities of type 
Book (basically an analysis of WorldCat and related data about Books).
6. http://bit.ly/RCKbts -- LOD Cloud exploitation via ODBC-compliant 
applications (this is the one enterprises understand most easily; they 
all use ODBC or JDBC for RDBMS data access).
7. http://bit.ly/QhGBXY -- LOD Cloud exploitation via Google Spreadsheet.
8. http://bit.ly/NP8uWv -- LOD Cloud exploitation via Microsoft Excel 
Spreadsheet.


Kingsley
>
> Gio
>
>
> On Fri, Jan 4, 2013 at 2:03 PM, William Waites <ww@styx.org> wrote:
>> hmmm.... not so tasty:
>>
>>      warning: array_keys() [function.array-keys]: The first argument should
>>      be an array in
>>      /var/www/drupal-6.22/sites/all/modules/dogfood/dogfood.module on
>>      line 1807.
>>
>> digging deeper:
>>
>>      The proxy server received an invalid response from an upstream server.
>>      The proxy server could not handle the request POST /sparql.
>>
>>      Reason: DNS lookup failure for: data.semanticweb.org
>>
>>      Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 PHP/5.2.0-8+etch16 mod_ssl/2.2.3
>>      OpenSSL/0.9.8c Server at data.semanticweb.org Port 80
>>
>> (appears to be a reverse proxy at data.semanticweb.org)
>>
>> I think I prefer people food...
>>
>> Cheers,
>> -w
>>
>>
>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
