Re: Updated LOD Cloud Diagram - Missed data sources.

Hello Kingsley,

On Fri, Jul 25, 2014 at 05:47:58PM -0400, Kingsley Idehen wrote:
> When you have a sense of the identity of an Agent and on behalf of whom it
> is operating, you can use RDF based Linked Data to construct and enforce
> usage policies.

<sarcasm>
Yes. Every "Agent" that does not use WebID-TLS, support every possible
RDF serialization, and handle every access ontology that comes to mind
does not deserve that name.
</sarcasm>

Seriously: it's a funny bit of history that Charlie Stross - one of my
favorite science fiction authors - had a hand in the creation of the robots
exclusion standard: his early crawler reportedly prompted Martijn Koster
to write it.

But the "standard" is really a proprietary mess. Even basic things like
"Crawl-Delay" are extensions introduced and supported by some vendors.
Many current robots.txt libraries only check for allowed/forbidden and do not
support parsing/returning such options.
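
To make the gap concrete, here is a minimal sketch (in Python) of pulling
Crawl-Delay out of a robots.txt by hand; the agent name and the simplified
"last matching record wins" rule are assumptions for illustration:

def crawl_delay(robots_txt, agent="examplebot"):
    """Return the Crawl-Delay (seconds) applying to `agent`, or None."""
    delay = None
    applies = False        # does the current record match our agent?
    in_agent_lines = True  # consecutive User-agent lines open a record
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agent_lines:            # a new record starts here
                applies, in_agent_lines = False, True
            # the 1994 spec recommends case-insensitive substring matching
            if value == "*" or value.lower() in agent.lower():
                applies = True
        else:
            in_agent_lines = False
            if applies and field == "crawl-delay":
                try:
                    delay = float(value)      # some sites use fractions
                except ValueError:
                    pass                      # malformed value: ignore
    return delay

print(crawl_delay("User-agent: *\nCrawl-delay: 10\n"))  # -> 10.0

(For what it's worth, newer Python versions added a crawl_delay() accessor
to urllib.robotparser, but many third-party libraries still skip such
extensions.)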

For starters, we need:

- Making the current extensions official

- A means to exclude fragments of an HTML document from indexing

- A Noindex HTTP header to selectively exclude content from indexing
  without bloating robots.txt (there is an unofficial X-Robots-Tag
  header supported by Google and Bing; see the sketch below)

The latter two would alleviate problems with the "right to be forgotten".

And we may need something to distinguish occasional agents from
recursively crawling bots.
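
For the header-based exclusion, a hedged sketch of how a crawler could
honor the de-facto X-Robots-Tag today (the helper name and User-Agent
string are made up for the example):

import urllib.request

def noindex_via_header(url, agent="examplebot"):
    """HEAD the URL and report whether an X-Robots-Tag header asks
    crawlers not to index it."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": agent})
    with urllib.request.urlopen(req) as resp:
        # The header may appear several times and may be scoped to a
        # specific bot ("examplebot: noindex"), so collect all of them.
        tags = [v for k, v in resp.getheaders()
                if k.lower() == "x-robots-tag"]
    return any("noindex" in t.lower() for t in tags)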

My current interpretation of robots.txt is that it forbids any access
that is not directly caused or mediated by a human.

Regards,

Michael Brunnbauer

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
