- From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
- Date: Mon, 29 Jul 2013 19:26:06 +0200
- To: "Dawson, Laura" <Laura.Dawson@bowker.com>
- Cc: Wes Turner <wes.turner@gmail.com>, Dave Pawson <dave.pawson@gmail.com>, "public-vocabs@w3.org" <public-vocabs@w3.org>, Dan Brickley <danbri@google.com>
Thanks, you are very welcome - yes, I understand, books are different - but the basic pattern is the same: You can never win by making access to (selected parts of) your content less machine-friendly. That is like not putting price-tags on products to escape from price-comparison. It may work for a short while in selected segments, but it won't allow survival of an otherwise inferior business model or deficient operations. (Note: I am not saying that producing books is per se a bad business model ;-))
Martin
PS: Side-story: I once read in the disclaimers of a small German winery's Web site that "external linking to this site without written consent was forbidden" ;-)
On Jul 29, 2013, at 7:14 PM, Dawson, Laura wrote:
> This is excellent - of course, book publishers just don't think this way.
> Thank you for this!!!
>
> On 7/29/13 1:12 PM, "Martin Hepp" <martin.hepp@ebusiness-unibw.org> wrote:
>
>> Hi Dawson:
>>
>> I also have a common reply to the concern raised by site-owners that rich
>> data markup makes it easier for your competitors to abuse your content.
>>
>> "But schema.org will make it easy for my competitors to harvest my
>> prices, product descriptions, or dealer network information!"
>>
>> First, most site-owners do not realize how easy it is as of today for
>> anybody to extract content from others' Web sites via crowdsourcing:
>> Assume your competitor wants an Excel table with all your dealers, their
>> addresses, and opening hours. With services like Amazon Mechanical Turk
>> or CrowdFlower, it will be a job of 15 minutes and 50 USD or less to hire
>> human labor to extract that information for you, including reformatting,
>> spell-check, etc. Even a small competitor of yours can take that effort
>> if seriously interested. And it is actually more expensive to operate a
>> decent Web crawler for structured data than that. However, your
>> prospective clients will likely neither spend the time nor money to
>> access your data that way.
>>
>> It is true that structured data simplifies the access to and use of the
>> information on your Web site, but it does so for anybody. If you decide
>> against structured data on your site, you put a much greater barrier on
>> your potential target audience than on your competitors. The latter can
>> extract and analyze all your public Web data via crowdsourcing services
>> anyway.
>>
>> Second, you also have legal means to protect your content against reuse.
>> If you have unique product description texts of a sufficient creative
>> value, you can sue anybody who extracts and republishes that content.
>>
>> Martin
>> On Jul 29, 2013, at 6:39 PM, Dawson, Laura wrote:
>>
>>> What I've been looking for is an interface that allows a "web monkey"
>>> or home user to do this... in book files. To mark up ebooks semantically,
>>> and have search engines ingest the files in their indexes, would be a
>>> huge leap forward. It would help search, it would help books, it would
>>> help society as a whole.
>>>
>>> But we are missing three things in that: the WordPress-like interface
>>> that would allow this; the ability for an epub or mobi file to handle
>>> this markup without breaking; and the willingness of the book market to
>>> experiment. (To wit: Authors Guild lawsuit against Google Books
>>> regarding indexing and abstracting. Walled garden ebook environments.
>>> Etc.)
>>>
>>> From: Wes Turner <wes.turner@gmail.com>
>>> Date: Monday, July 29, 2013 12:33 PM
>>> To: Martin Hepp <martin.hepp@ebusiness-unibw.org>
>>> Cc: Dave Pawson <dave.pawson@gmail.com>, "public-vocabs@w3.org"
>>> <public-vocabs@w3.org>, Dan Brickley <danbri@google.com>
>>> Subject: Re: Ease of adoption
>>> Resent-From: <public-vocabs@w3.org>
>>> Resent-Date: Monday, July 29, 2013 12:34 PM
>>>
>>> +1. http://en.m.wikipedia.org/wiki/Schema.org
>>> On Jul 29, 2013 10:46 AM, "Martin Hepp"
>>> <martin.hepp@ebusiness-unibw.org> wrote:
>>>> Here is my suggestion for a new intro:
>>>>
>>>> "Many individuals and organizations use the Web to articulate their
>>>> messages: companies offer products, newspapers present news, bloggers
>>>> share opinions, etc.
>>>> Historically, the most relevant audience for a Web site were humans -
>>>> they found your Web site via a search engine and then consumed the
>>>> information from your site directly in their Web browsers.
>>>>
>>>> Now, there are more and more digital devices between a Web site and
>>>> its target audience, and they cover a bigger share of the process of
>>>> using information from the Web. For instance, nowadays, the most
>>>> relevant results in a search engine are often not "main" pages, but
>>>> deep, detailed links into a Web site.
>>>>
>>>> As a consequence, the decision for or against a product, restaurant,
>>>> newspaper, etc., -- in other words: your offer --, is made already in
>>>> the search results returned by the Web search engine. The better the
>>>> search engine understands the information inside your pages, the better
>>>> it can select, summarize, and present it to the target audiences.
>>>>
>>>> Schema.org is a standard for marking-up the information in your Web
>>>> content in a way that search engines and other computer-based services
>>>> can understand. In database terminology, the structures used to
>>>> represent information as data are called a "schema". Schema.org defines
>>>> a common schema for the interface between your Web content and search
>>>> engines. It allows search engines and other services to better extract
>>>> and understand your site.
>>>>
>>>> Why bother? Site owners spend a lot of effort for optimizing the user
>>>> experience of their site for human visitors, with stylesheets, icons,
>>>> font choices, etc. Schema.org is the next step: Optimizing the user
>>>> experience for your site when it is presented to your target audience
>>>> by a search engine, a mobile application, a browser extension, or any
>>>> new digital intermediary that may be in between."
>>>>
>>>> Best
>>>>
>>>> Martin Hepp
>>>>
>>>> PS: I offer this text under Creative Commons CC BY 3.0 ;-)
>>>>
>>>> On Jul 29, 2013, at 5:17 PM, Dave Pawson wrote:
>>>>
>>>>> On 29 July 2013 15:23, Wes Turner <wes.turner@gmail.com> wrote:
>>>>>>
>>>>>> On Jul 29, 2013 3:53 AM, "Dave Pawson" <dave.pawson@gmail.com> wrote:
>>>>>>>
>>>>>>> Reading http://schema.org/docs/gs.html (IMHO) I don't see the salesman's
>>>>>>> version, a trainer's view of the ideas behind schema.org.
>>>>>>>
>>>>>>> Has anyone started to think of how a web monkey or home user might be
>>>>>>> persuaded to adopt microdata for their own usage? E.g. taking the user
>>>>>>> perspective?
>>>>>>> Dan and others may well find their way round schema.org, but it isn't so
>>>>>>> easy to get started when a new user comes across it?
>>>>>>
>>>>>> When you say "taking the user perspective", what exactly do you mean by
>>>>>> that? How are you suggesting the pitch should be modified in order to reach
>>>>>> the target audience?
>>>>>
>>>>> IMHO that says it, succinctly and for a knowledgeable audience.
>>>>> If you look at intro type books (dummys ... etc), there is much more
>>>>> of a sell there. Persuasion as to why this tech is useful for them,
>>>>> meets an objective the reader may have?
>>>>>
>>>>> E.g. "A collection of schemas"... WTF is a schema...?
>>>>>
>>>>> " html tags, that webmasters can use to markup their pages in ways
>>>>> recognized by major search providers."
>>>>> Oh - that's not me then, I'm not a webmaster...
>>>>>
>>>>> I.e just the slant?
>>>>>
>>>>> Does that make sense?
>>>>>
>>>>> regards DaveP
>>>>>
>>>>>
>>>>>>
>>>>>> schema.org has a fairly great description:
>>>>>>
>>>>>> """
>>>>>> What is Schema.org?
>>>>>> This site provides a collection of schemas, i.e., html tags, that webmasters
>>>>>> can use to markup their pages in ways recognized by major search providers.
>>>>>> Search engines including Bing, Google, Yahoo! and Yandex rely on this markup
>>>>>> to improve the display of search results, making it easier for people to
>>>>>> find the right web pages.
>>>>>> Many sites are generated from structured data, which is often stored in
>>>>>> databases. When this data is formatted into HTML, it becomes very difficult
>>>>>> to recover the original structured data. Many applications, especially
>>>>>> search engines, can benefit greatly from direct access to this structured
>>>>>> data. On-page markup enables search engines to understand the information on
>>>>>> web pages and provide richer search results in order to make it easier for
>>>>>> users to find relevant information on the web. Markup can also enable new
>>>>>> tools and applications that make use of the structure.
>>>>>> A shared markup vocabulary makes it easier for webmasters to decide on a
>>>>>> markup schema and get the maximum benefit for their efforts. So, in the
>>>>>> spirit of sitemaps.org, search engines have come together to provide a
>>>>>> shared collection of schemas that webmasters can use.
>>>>>> """
>>>>>>
>>>>>> schema.org/docs/gs.html has the following heading structure:
>>>>>>
>>>>>> Getting started with schema.org
>>>>>> * How to mark up your content using Microdata
>>>>>> * Why use Microdata? [what about RDFa, these days]
>>>>>> * Using the schema.org vocabulary
>>>>>> * Advanced-topic: machine-understandable versions of information
>>>>>>
>>>>>>> The other side of this is the breadth of options? How might the
>>>>>>> increasingly large
>>>>>>> number of terms be 'filtered' for use by the man in the street to
>>>>>>> optimise his/her
>>>>>>> chances of a search engine result?
>>>>>>>
>>>>>>> I think this aspect could and should be given consideration as the
>>>> size of
>>>>>>> the main term set increases.
>>>>>>>
>>>>>>> Just a thought. Is there work being done in this area?
>>>>>>
>>>>>> There is a fair amount of research regarding meta tag stuffing in regards to
>>>>>> SEO.
>>>>>>
>>>>>>>
>>>>>>> regards
>>>>>>>
>>>>>>> --
>>>>>>> Dave Pawson
>>>>>>> XSLT XSL-FO FAQ.
>>>>>>> Docbook FAQ.
>>>>>>> http://www.dpawson.co.uk
>>>>>>>
>>>>>>
>>>>>> IMHO, from an en-US perspective, the copy text for the schema.org Ontology:
>>>>>>
>>>>>> * is fairly verbose
>>>>>> * could have a few more bullet points
>>>>>> * could be updated to reference the supported formats
>>>>>> (RDF/XML, Turtle, JSON-LD, N3, NTriples, HTML5 Microdata, and *RDFa*)
>>>>>> * could more directly allude to schema.rdfs.org and
>>>>>> http://schema.rdfs.org/tools.html
>>>>>> * could link to topical Wikipedia pages
>>>>>>
>>>>>> Wikipedia pages
>>>>>>
>>>>>> * /Linked_data
>>>>>> * /Semantic_web
>>>>>> * /Microdata_(HTML)
>>>>>>
>>>>>> I collected a number of Wikipedia links that may be useful for, as you put
>>>>>> it, the "web monkey and home user" here:
>>>>>>
>>>>>> http://www.reddit.com/r/semanticweb/comments/1dvakc/schemaorgdataset_standard_schema_for_linked_data/
>>>>>>
>>>>>> Please feel free to share and incorporate this research.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --------------------------------------------------------
>>>> martin hepp
>>>> e-business & web science research group
>>>> universitaet der bundeswehr muenchen
>>>>
>>>> e-mail: hepp@ebusiness-unibw.org
>>>> phone: +49-(0)89-6004-4217
>>>> fax: +49-(0)89-6004-4620
>>>> www: http://www.unibw.de/ebusiness/ (group)
>>>> http://www.heppnetz.de/ (personal)
>>>> skype: mfhepp
>>>> twitter: mfhepp
>>>>
>>>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>>> =================================================================
>>>> * Project Main Page: http://purl.org/goodrelations/
>>>>
>>>>
>>>>
>>
>
>
Received on Monday, 29 July 2013 17:26:32 UTC