Re: How to add license information to a CreativeWork? / How to mark obscured informations? from Martin Hepp on 2012-02-27 (public-vocabs@w3.org from February 2012)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Tue, 28 Feb 2012 00:47:37 +0100
To: Chuck <chuck42@gmx.de>
Cc: public-vocabs@w3.org
Message-Id: <AE0C6D36-E75F-4146-B467-3A1BF4262F01@ebusiness-unibw.org>
Hi Chuck:

On Feb 27, 2012, at 11:32 PM, Chuck wrote:

> Hello guys,
> 
> I want to use schema.org for my new Website and I have few questions:
> 
> ...

> (3) On my contact page I have obscured my email address with some
> JavaScript code to protect it from being crawled by spam bots. How can I
> use schema.org to mark this code as "hello web crawler, here is my email
> address, but you may not read it"? Is something like the following code
> correct?
> 
> <div itemscope itemtype="http://schema.org/Person">
>   <p>Hello, I'm <span itemprop="name">Chuck</span>.</p>
>   <p>
>      My email is
>      <span itemprop="email">
>         <script type="text/javascript"> document.write( ... ) </script>
>         <noscript><img src="my_email.png" /></noscript>
>      </span>
>   </p>
> </div>
> 
> [1] http://labs.creativecommons.org/2011/ccrel-guide/
> 
> Greetings,
> 
> Chuck


The quick answer to your point #3: simply do not mark up content that you do not want search engines, non-standard crawlers, browser extensions, and mobile applications to extract. Omit the itemprop for "email" and mark up only those parts of your content whose metadata you want to expose for extraction and reuse.
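
As a minimal sketch of that advice, your snippet could look like the following, with the itemprop removed from the span around the email. Everything else is copied from your example, and the elided document.write( ... ) argument is left exactly as you wrote it.

```html
<div itemscope itemtype="http://schema.org/Person">
  <p>Hello, I'm <span itemprop="name">Chuck</span>.</p>
  <p>
     My email is
     <!-- No itemprop here: the obscured address is not part of the Person item -->
     <span>
        <script type="text/javascript"> document.write( ... ) </script>
        <noscript><img src="my_email.png" /></noscript>
     </span>
  </p>
</div>
```

A microdata consumer would then see a Person item with only a "name" property, while human visitors still get the address through the script or the fallback image.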

See also

http://wiki.goodrelations-vocabulary.org/Frequently_Asked_Questions#How_can_I_prevent_my_competitors_from_abusing_my_rich_markup.3F

for my argument on whether to mark up or not to mark up information:

Quote:

Q: How can I prevent my competitors from abusing my rich markup?

A: Some people are concerned that adding rich data markup makes their content more accessible to malicious crawlers and other forms of data abuse. In most cases, this fear is unfounded: stealing content from a few known target sites via screen-scraping is pretty simple for a serious party, so your markup does not really help them a lot.
For your potential customers, however, rich data markup makes your information much more accessible, because they have neither the skills nor the time to write a scraper script to extract your data.
In any case, your malicious competitor can cheaply hire freelancers to extract your content manually, e.g. via the Amazon Mechanical Turk platform. Your potential customers can't.
So by not providing data markup, you impede your potential customers much more than you impede abusers of your content.
On top of that, you can still use legal and technical means to stop others from crawling and using your data. The simplest way is to exclude them via the robots.txt standard, combined with technical measures for blocking bots and crawlers.

Note 1: There can be sophisticated business cases in which you may want to decide strategically which data to make accessible to crawlers and intelligent clients, and which not. In those cases, you will need to develop a data marketing strategy. But this is really for the big fish, not for the small shops.

Note 2: Some business models rely on the time spent on the target page, i.e., they see it as a threat to their revenues when a larger part of the user interaction takes place on Google or Bing rather than on-site. This position cannot be defended in the long run. You should challenge your business model if user interaction time is a more important source of revenue for you than more, and better qualified, traffic on your Web pages.

Note 3: Don't block unknown crawlers from accessing your site by default. There are many promising startups that will try to access your data to get you new customers, even if you have never heard of them before. So be gracious, as long as the crawlers observe robots.txt and crawl politely.

Best

Martin

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/
Received on Monday, 27 February 2012 23:48:02 UTC