Re: Ease of adoption from Martin Hepp on 2013-07-29 (public-vocabs@w3.org from July 2013)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Mon, 29 Jul 2013 19:26:06 +0200
To: "Dawson, Laura" <Laura.Dawson@bowker.com>
Cc: Wes Turner <wes.turner@gmail.com>, Dave Pawson <dave.pawson@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>, Dan Brickley <danbri@google.com>
Message-Id: <EA81FAA5-6E37-46A2-B477-0855DCA4356A@ebusiness-unibw.org>
Thanks, you are very welcome - yes, I understand, books are different - but the basic pattern is the same: You can never win by making access to (selected parts of) your content less machine-friendly. That is like not putting price-tags on products to escape from price-comparison. It may work for a short while in selected segments, but it won't allow survival of an otherwise inferior business model or deficient operations. (Note: I am not saying that producing books is per se a bad business model ;-))

Martin

PS: Side-story: I once read in the disclaimers of a small German winery's Web site that "external linking to this site without written consent was forbidden" ;-)
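
To make "machine-friendly" concrete for the book case discussed in this thread, here is a minimal sketch in Python of a schema.org Book description serialized as JSON-LD (one of the formats mentioned further down the thread). The helper names book_jsonld and as_script_tag are hypothetical, and the title, author, and ISBN are placeholders. Whether a given search engine or ebook reader actually consumes such a block is an assumption, not something this thread establishes.

```python
import json


def book_jsonld(title, author, isbn):
    """Return a schema.org Book description as a JSON-LD string.

    Hypothetical helper for illustration; the property names (name,
    author, isbn) come from the schema.org Book type.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Placeholder values, not a real catalogue record.
snippet = as_script_tag(
    book_jsonld("An Example Title", "A. N. Author", "9780000000002")
)
print(snippet)
```

Pasting the printed block into a page's HTML leaves the visible page unchanged; only machine readers see the extra description.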


On Jul 29, 2013, at 7:14 PM, Dawson, Laura wrote:

> This is excellent - of course, book publishers just don't think this way.
> Thank you for this!!!
> 
> On 7/29/13 1:12 PM, "Martin Hepp" <martin.hepp@ebusiness-unibw.org> wrote:
> 
>> Hi Dawson:
>> 
>> I also have a common reply to the concern raised by site-owners that rich
>> data markup makes it easier for your competitors to abuse your content.
>> 
>> "But schema.org will make it easy for my competitors to harvest my
>> prices, product descriptions, or dealer network information!"
>> 
>> First, most site-owners do not realize how easy it is as of today for
>> anybody to extract content from others' Web sites via crowdsourcing:
>> Suppose your competitor wants an Excel table with all your dealers, their
>> addresses, and opening hours. With services like Amazon Mechanical Turk
>> or CrowdFlower, it will be a job of 15 minutes and 50 USD or less to hire
>> human labor to extract that information for you, including reformatting,
>> spell-check, etc. Even a small competitor of yours can take that effort
>> if seriously interested. And it is actually more expensive to operate a
>> decent Web crawler for structured data than that. However, your
>> prospective clients will likely neither spend the time nor money to
>> access your data that way.
>> 
>> It is true that structured data simplifies the access to and use of the
>> information on your Web site, but it does so for anybody. If you decide
>> against structured data on your site, you put a much greater barrier on
>> your potential target audience than on your competitors. The latter can
>> extract and analyze all your public Web data via crowdsourcing services
>> anyway.
>> 
>> Second, you also have legal means to protect your content against reuse.
>> If you have unique product description texts of a sufficient creative
>> value, you can sue anybody who extracts and republishes that content.
>> 
>> Martin
>> On Jul 29, 2013, at 6:39 PM, Dawson, Laura wrote:
>> 
>>> What I've been looking for is an interface that allows a "web monkey"
>>> or home user to do this... in book files. To mark up ebooks semantically,
>>> and have search engines ingest the files in their indexes, would be a
>>> huge leap forward. It would help search, it would help books, it would
>>> help society as a whole.
>>> 
>>> But we are missing three things in that: the WordPress-like interface
>>> that would allow this; the ability for an epub or mobi file to handle
>>> this markup without breaking; and the willingness of the book market to
>>> experiment. (To wit: Authors Guild lawsuit against Google Books
>>> regarding indexing and abstracting. Walled garden ebook environments.
>>> Etc.)
>>> 
>>> From: Wes Turner <wes.turner@gmail.com>
>>> Date: Monday, July 29, 2013 12:33 PM
>>> To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
>>> Cc: Dave Pawson <dave.pawson@gmail.com>, "public-vocabs@w3.org"
>>> <public-vocabs@w3.org>, Dan Brickley <danbri@google.com>
>>> Subject: Re: Ease of adoption
>>> Resent-From: <public-vocabs@w3.org>
>>> Resent-Date: Monday, July 29, 2013 12:34 PM
>>> 
>>> +1. http://en.m.wikipedia.org/wiki/Schema.org
>>> On Jul 29, 2013 10:46 AM, "Martin Hepp"
>>> <martin.hepp@ebusiness-unibw.org> wrote:
>>>> Here is my suggestion for a new intro:
>>>> 
>>>> "Many individuals and organizations use the Web to articulate their
>>>> messages: companies offer products, newspapers present news, bloggers
>>>> share opinions, etc.
>>>> Historically, the most relevant audience for a Web site was humans -
>>>> they found your Web site via a search engine and then consumed the
>>>> information from your site directly in their Web browsers.
>>>> 
>>>> Now, there are more and more digital devices between a Web site and
>>>> its target audience, and they cover a bigger share of the process of
>>>> using information from the Web. For instance, nowadays, the most
>>>> relevant results in a search engine are often not "main" pages, but
>>>> deep, detailed links into a Web site.
>>>> 
>>>> As a consequence, the decision for or against a product, restaurant,
>>>> newspaper, etc., -- in other words: your offer --, is made already in
>>>> the search results returned by the Web search engine. The better the
>>>> search engine understands the information inside your pages, the better
>>>> it can select, summarize, and present it to the target audiences.
>>>> 
>>>> Schema.org is a standard for marking up the information in your Web
>>>> content in a way that search engines and other computer-based services
>>>> can understand. In database terminology, the structures used to
>>>> represent information as data are called a "schema". Schema.org defines
>>>> a common schema for the interface between your Web content and search
>>>> engines. It allows search engines and other services to better extract
>>>> and understand your site.
>>>> 
>>>> Why bother? Site owners spend a lot of effort on optimizing the user
>>>> experience of their site for human visitors, with stylesheets, icons,
>>>> font choices, etc. Schema.org is the next step: Optimizing the user
>>>> experience for your site when it is presented to your target audience
>>>> by a search engine, a mobile application, a browser extension, or any
>>>> new digital intermediary that may be in between."
>>>> 
>>>> Best
>>>> 
>>>> Martin Hepp
>>>> 
>>>> PS: I offer this text under Creative Commons CC BY 3.0 ;-)
>>>> 
>>>> On Jul 29, 2013, at 5:17 PM, Dave Pawson wrote:
>>>> 
>>>>> On 29 July 2013 15:23, Wes Turner <wes.turner@gmail.com> wrote:
>>>>>> 
>>>>>> On Jul 29, 2013 3:53 AM, "Dave Pawson" <dave.pawson@gmail.com> wrote:
>>>>>>> 
>>>>>>> Reading http://schema.org/docs/gs.html (IMHO) I don't see the
>>>>>>> salesman's version, a trainer's view of the ideas behind schema.org.
>>>>>>> 
>>>>>>> Has anyone started to think of how a web monkey or home user might
>>>>>>> be persuaded to adopt microdata for their own usage? E.g. taking
>>>>>>> the user perspective? Dan and others may well find their way round
>>>>>>> schema.org, but it isn't so easy to get started when a new user
>>>>>>> comes across it.
>>>>>> 
>>>>>> When you say "taking the user perspective", what exactly do you
>>>>>> mean by that? How are you suggesting the pitch should be modified
>>>>>> in order to reach the target audience?
>>>>> 
>>>>> IMHO that says it, succinctly and for a knowledgeable audience.
>>>>> If you look at intro type books (Dummies ... etc), there is much more
>>>>> of a sell there. Persuasion as to why this tech is useful for them,
>>>>> meets an objective the reader may have?
>>>>> 
>>>>> E.g. "A collection of schemas"... WTF is a schema...?
>>>>> 
>>>>> " html tags, that webmasters can use to markup their pages in ways
>>>>> recognized by major search providers."
>>>>> Oh - that's not me then, I'm not a webmaster...
>>>>> 
>>>>> I.e. just the slant?
>>>>> 
>>>>> Does that make sense?
>>>>> 
>>>>> regards DaveP
>>>>> 
>>>>> 
>>>>>> 
>>>>>> schema.org has a fairly great description:
>>>>>> 
>>>>>> """
>>>>>> What is Schema.org?
>>>>>> This site provides a collection of schemas, i.e., html tags, that
>>>>>> webmasters can use to markup their pages in ways recognized by major
>>>>>> search providers. Search engines including Bing, Google, Yahoo! and
>>>>>> Yandex rely on this markup to improve the display of search results,
>>>>>> making it easier for people to find the right web pages.
>>>>>> Many sites are generated from structured data, which is often stored
>>>>>> in databases. When this data is formatted into HTML, it becomes very
>>>>>> difficult to recover the original structured data. Many
>>>>>> applications, especially search engines, can benefit greatly from
>>>>>> direct access to this structured data. On-page markup enables search
>>>>>> engines to understand the information on web pages and provide
>>>>>> richer search results in order to make it easier for users to find
>>>>>> relevant information on the web. Markup can also enable new tools
>>>>>> and applications that make use of the structure.
>>>>>> A shared markup vocabulary makes it easier for webmasters to decide
>>>>>> on a markup schema and get the maximum benefit for their efforts.
>>>>>> So, in the spirit of sitemaps.org, search engines have come together
>>>>>> to provide a shared collection of schemas that webmasters can use.
>>>>>> """
>>>>>> 
>>>>>> schema.org/docs/gs.html has the following heading structure:
>>>>>> 
>>>>>> Getting started with schema.org
>>>>>> * How to mark up your content using Microdata
>>>>>>  * Why use Microdata? [what about RDFa, these days]
>>>>>> * Using the schema.org vocabulary
>>>>>> * Advanced-topic: machine-understandable versions of information
>>>>>> 
>>>>>>> The other side of this is the breadth of options? How might the
>>>>>>> increasingly large number of terms be 'filtered' for use by the
>>>>>>> man in the street to optimise his/her chances of a search engine
>>>>>>> result?
>>>>>>> 
>>>>>>> I think this aspect could and should be given consideration as the
>>>>>>> size of the main term set increases.
>>>>>>> 
>>>>>>> Just a thought. Is there work being done in this area?
>>>>>> 
>>>>>> There is a fair amount of research on meta tag stuffing with
>>>>>> regard to SEO.
>>>>>> 
>>>>>>> 
>>>>>>> regards
>>>>>>> 
>>>>>>> --
>>>>>>> Dave Pawson
>>>>>>> XSLT XSL-FO FAQ.
>>>>>>> Docbook FAQ.
>>>>>>> http://www.dpawson.co.uk
>>>>>>> 
>>>>>> 
>>>>>> IMHO, from an en-US perspective, the copy text for the schema.org
>>>>>> Ontology:
>>>>>> 
>>>>>> * is fairly verbose
>>>>>> * could have a few more bullet points
>>>>>> * could be updated to reference the supported formats
>>>>>>   (RDF/XML, Turtle, JSON-LD, N3, NTriples, HTML5 Microdata, and *RDFa*)
>>>>>> * could more directly allude to schema.rdfs.org and
>>>>>> http://schema.rdfs.org/tools.html
>>>>>> * could link to topical Wikipedia pages
>>>>>> 
>>>>>> Wikipedia pages
>>>>>> 
>>>>>> * /Linked_data
>>>>>> * /Semantic_web
>>>>>> * /Microdata_(HTML)
>>>>>> 
>>>>>> I collected a number of Wikipedia links that may be useful for, as
>>>>>> you put it, the "web monkey and home user" here:
>>>>>> 
>>>>>> http://www.reddit.com/r/semanticweb/comments/1dvakc/schemaorgdataset_standard_schema_for_linked_data/
>>>>>> 
>>>>>> Please feel free to share and incorporate this research.
>>>>> 
>>>>> 
>>>> 
>>>> --------------------------------------------------------
>>>> martin hepp
>>>> e-business & web science research group
>>>> universitaet der bundeswehr muenchen
>>>> 
>>>> e-mail:  hepp@ebusiness-unibw.org
>>>> phone:   +49-(0)89-6004-4217
>>>> fax:     +49-(0)89-6004-4620
>>>> www:     http://www.unibw.de/ebusiness/ (group)
>>>>         http://www.heppnetz.de/ (personal)
>>>> skype:   mfhepp
>>>> twitter: mfhepp
>>>> 
>>>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>>> =================================================================
>>>> * Project Main Page: http://purl.org/goodrelations/
>>>> 
>>>> 
>>>> 
>> 
> 
> 

Received on Monday, 29 July 2013 17:26:32 UTC