Re: Think before you write Semantic Web crawlers from Kingsley Idehen on 2011-06-22 (public-lod@w3.org from June 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 22 Jun 2011 20:10:39 +0100
To: public-lod@w3.org
Message-ID: <4E023E2F.9010800@openlinksw.com>

On 6/22/11 8:05 PM, Martin Hepp wrote:
> Glenn:
>
>> If there isn't, why not? We're the Semantic Web, dammit. If we aren't the masters of data interoperability, what are we?
> The main question is: Is the Semantic Web an evolutionary improvement of the Web, the Web understood as an ecosystem comprising protocols, data models, people, and economics - or is it a tiny special-interest branch?
>
> As said: I bet a bottle of champagne that the academic Semantic Web community's technical proposals will never gain more than 10 % market share among "real" site-owners, because of
> - unnecessary complexity (think of the simplicity of publishing an HTML page vs. following LOD publishing principles),
> - bad design decisions (e.g., explicit datatyping of data instances in RDFa),
> - poor documentation for non-geeks, and
> - a lack of understanding of the economics of technology diffusion.

Hoping you don't place WebID in the academic adventure bucket, right?

WebID, like URI abstraction, is well-thought-out critical infrastructure
tech.

Kingsley
> Never ever.
>
> Best
>
> Martin
>
> On Jun 22, 2011, at 3:18 PM, glenn mcdonald wrote:
>
>> From my perspective as the designer of a system that both consumes and publishes data, the load/burden issue here is not at all particular to the semantic web. Needle obeys robots.txt rules, but that's a small deal compared to the difficulty of extracting whole data from sites set up to deliver it only in tiny pieces. I'd say about 98% of the time I can describe the data I want from a site with a single conceptual query. Indeed, once I've got the data into Needle I can almost always actually produce that query. But on the source site, I usually can't, and thus we are forced to waste everybody's time navigating the machines through superfluous presentation rendering designed for people. 10-at-a-time results lists, interminable AJAX refreshes, animated DIV reveals, grafting back together the splintered bits of tree-traversals, etc. This is all absurdly unnecessary. Why is anybody having to "crawl" an open semantic-web dataset? Isn't there a "download" link, and/or a SPARQL endpoint? If there isn't, why not? We're the Semantic Web, dammit. If we aren't the masters of data interoperability, what are we?
>>
>> glenn
>> (www.needlebase.com)
>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Wednesday, 22 June 2011 19:11:04 UTC