Re: Think before you write Semantic Web crawlers

On 6/23/11 9:09 AM, Martin Hepp wrote:
> Yes, WebID is without question a good thing. I am not entirely sure, though, that you can make it a mandatory requirement for access to your site, because if a few major consumers do not use WebID for their crawlers, site-owners cannot block anonymous crawlers.

You can make it part of your QoS and ACL-based framework, basically 
saying: if you want more (total records, deeper navigation, etc.), please 
identify yourself using a verifiable ID :-)

Eventually, an agent will be challenged, go off and get a WebID, return, 
get better QoS, even ask for more, get presented with a quote, pay, and 
continue. Remember that old Information Superhighway vision? Well, it's 
now finally taking shape, with the intelligent-agents inflection just 
round the corner. The decoupling of information and data, as exemplified 
by Linked Data, makes this possible in a big way.

Ironically, the use of WebID might actually become a Linked Data vector 
since the biggest InterWeb headache remains verifiable identity. Thus 
far, the following have been rendered borderline useless courtesy of 
unverifiable identity:

1. Email
2. Pingbacks
3. Comments

WebID provides a viable ingredient for fixing all of the above and more. 
Fix those and Linked Data generation becomes viral since most existing 
apps and services will evolve naturally into Linked Data generators and 
consumers.

Web 2.0 stalled (a long time ago) as a result of not having an 
AWWW-based solution for verifiable identity.



Kingsley
> On Jun 22, 2011, at 9:10 PM, Kingsley Idehen wrote:
>
>> On 6/22/11 8:05 PM, Martin Hepp wrote:
>>> Glenn:
>>>
>>>> If there isn't, why not? We're the Semantic Web, dammit. If we aren't the masters of data interoperability, what are we?
>>> The main question is: Is the Semantic Web an evolutionary improvement of the Web, the Web understood as an ecosystem comprising protocols, data models, people, and economics - or is it a tiny special-interest branch?
>>>
>>> As said: I bet a bottle of champagne that the academic Semantic Web community's technical proposals will never gain more than 10 % market share among "real" site-owners, because of
>>> - unnecessary complexity (think of the simplicity of publishing an HTML page vs. following LOD publishing principles),
>>> - bad design decisions (e.g., explicit datatyping of data instances in RDFa),
>>> - poor documentation for non-geeks, and
>>> - a lack of understanding of the economics of technology diffusion.
>> Hoping you don't place WebID in the academic adventure bucket, right?
>>
>> WebID, like URI abstraction, is well thought out critical infrastructure tech.
>>
>> Kingsley
>>> Never ever.
>>>
>>> Best
>>>
>>> Martin
>>>
>>> On Jun 22, 2011, at 3:18 PM, glenn mcdonald wrote:
>>>
>>>> From my perspective as the designer of a system that both consumes and publishes data, the load/burden issue here is not at all particular to the semantic web. Needle obeys robots.txt rules, but that's a small deal compared to the difficulty of extracting whole data from sites set up to deliver it only in tiny pieces. I'd say about 98% of the time I can describe the data I want from a site with a single conceptual query. Indeed, once I've got the data into Needle I can almost always actually produce that query. But on the source site, I usually can't, and thus we are forced to waste everybody's time navigating the machines through superfluous presentation rendering designed for people. 10-at-a-time results lists, interminable AJAX refreshes, animated DIV reveals, grafting back together the splintered bits of tree-traversals, etc. This is all absurdly unnecessary. Why is anybody having to "crawl" an open semantic-web dataset? Isn't there a "download" link, and/or a SPARQL endpoint? If there isn't, why not? We're the Semantic Web, dammit. If we aren't the masters of data interoperability, what are we?
>>>> glenn
>>>> (www.needlebase.com)
>>>
>>
>> -- 
>>
>> Regards,
>>
>> Kingsley Idehen
>> President & CEO
>> OpenLink Software
>> Web: http://www.openlinksw.com
>> Weblog: http://www.openlinksw.com/blog/~kidehen
>> Twitter/Identi.ca: kidehen
>>
>


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Thursday, 23 June 2011 08:34:10 UTC