Re: proposal by Encyclopaedia Britannica from Martin Hepp on 2013-11-18 (public-vocabs@w3.org from November 2013)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Mon, 18 Nov 2013 18:53:42 +0100
To: "Cranmer, Paul" <PCranmer@eb.com>
Cc: Guha <guha@google.com>, "Hetrea, Carmen" <CHetrea@eb.com>, list <public-vocabs@w3.org>
Message-Id: <7C105C98-B484-485F-9BD2-813C52714ADE@ebusiness-unibw.org>
Paul:
Thanks for your reply. However, I think that there are a few misunderstandings.

First, schema.org is a Web vocabulary, not an ontology in the academic sense of the term. The nature of shared data structures at Web scale is only partly understood as of today, but we already know that the challenge is not as simple as building a "reliable machine-readable semantic representation of knowledge for the Semantic Web", as you state.

The fact that you need to use the term "metaphysical" to explain your proposal already indicates that there is a misfit between the audience you have in mind for your proposal and the audience who will actually use schema.org.

By the way, I disagree with the notion that entities are necessarily physical. Entities in the context of database systems can also be abstract things, at least since the initial work by Peter Chen on Entity-Relationship Modeling in 1976 [1].

> The more precision you are able to achieve in your system, the more flexibility your system will acquire in using the data it manages, since you can count on getting accurate results.

As we all know, the Web is a complex socio-technical ecosystem. Precision of the vocabulary alone does not necessarily have any positive impact on the quality of the data. If people cannot understand the specification easily and reliably, they may avoid certain conceptual elements (-> less data) or use the elements less reliably (-> lower data quality). This issue has been discussed in this forum at length recently. For more information on my take on this, see http://vimeo.com/51152934.


> [The proposal was initially conceived as an expansion of schema.org and its application by Google, Bing, and Yahoo.  Since it became clear that the search engines would only accept one class for a given resource, we felt it is important to bring up the fact that this does not meet the need since many things by their very nature belong to more than one class.  We believe that these multiple class designations should be recognized by the engines and incorporated into their semantic representation of the Webpage content.]
Without wanting to offend you: When you suggest more subtle conceptual distinctions, it would be nice if you separated the two proposals you have at the conceptual level into

1. A proposal to extend the conceptual model of schema.org and
2. A proposal to search engines to extend the consumption of Web data based on schema.org.

Since the schema.org sponsors generally do not discuss the usage of schema.org data in this forum (but rather in their individual, product-related forums), mainly the first issue is relevant here.

Best

Martin


[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.1085

On Nov 18, 2013, at 6:32 PM, Cranmer, Paul wrote:

> Martin,
> 
> Thanks for your feedback.  Please see my comments below in brackets.
> 
> -----Original Message-----
> From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org] 
> Sent: Monday, November 18, 2013 10:38 AM
> To: Guha
> Cc: Hetrea, Carmen; list; Cranmer, Paul
> Subject: Re: proposal by Encyclopaedia Britannica
> 
> Carmen:
> 
> See below for a very quick feedback on part of your proposal:
> 
> On Nov 18, 2013, at 3:50 PM, Guha wrote:
> 
>> On Tue, Nov 12, 2013 at 9:45 AM, Hetrea, Carmen <CHetrea@eb.com> wrote:
> ...
>> SCHEMA.ORG ONTOLOGY EXPANSION - a PROPOSAL by Encyclopaedia Britannica
>> 
>> 
> ...
>> 
>> Proposal:
>> 
>> 1.    We propose top Class changes
>> Top Class: SchemaOrgClass
>> Two major Subclass divisions of information: Concept and Entity.
> 
> 
> I agree that this makes sense from a knowledge representation perspective, but I have some doubts that this distinction actually improves the vocabulary for typical Webmasters.
> In general, philosophically-grounded top-level distinctions can be difficult to apply by practitioners, which means that in the end, the quality of the data deteriorates.
> For a *Web* vocabulary, ambiguous classes need not be a disadvantage since they can be reliably applied by publishers of data who are unable to apply the conceptual distinction reliably.
> 
> [First of all, the schema.org ontology is conceived to organize entities and has no place for organizing concepts. Since our primary concern is a consistent and reliable machine-readable semantic representation of knowledge for the Semantic Web, we arrived at our proposal primarily because we did not see any semantic ontologies that were concerned with representing general-knowledge content. 
> 
> The two top classes are essential for this purpose, because of the fundamental differences between concepts, which are metaphysical, and entities, which are physical.  Concepts can be related in semantic hierarchies that provide a useful skeleton for organizing information in the traditional vertically related broader concept, narrower concept, and the horizontally related concepts as has been addressed in SKOS.  We have added to this scenario also component relationships, i.e. concepts that effectively complete a picture though not semantically narrower concepts.
> 
> Entities, however, though definable by concepts, relate to each other multi-dimensionally both semantically and syntactically.  This distinction is simple and basic and can enable practitioners to manage their information more effectively.  While I understand your point that ambiguity appears to allow for a more flexible vocabulary, it risks undermining the very premise of semantics.  We have learned that ambiguous vocabulary ultimately limits the usefulness of the data it defines because it is inaccurate.  The more precision you are able to achieve in your system, the more flexibility your system will acquire in using the data it manages, since you can count on getting accurate results.]
> 
> 
> 
>> 2.    We propose to allow multiple class designation for a given Webpage
>> This will allow content providers to classify both the subject matter and the delivery format individually as well as accommodate the fact that many general knowledge subjects can belong to more than one class. 
>> 
>>        For example, if a content provider offers an Article on an Event, both classes should be admissible and recognized by the semantic engine that reads the markup.  In the case of a Video showing Angkor Wat, a temple complex in Angkor, Cambodia, the subject is both a Place and a ManMadeObject (in this case a Temple) and the delivery format is Video.
>> 
> At the level of the vocabulary, it is already possible to expose information about multiple entities, so I may not understand your proposal correctly.
> If you are referring to the problem that getting Rich Snippets is difficult if you mark-up multiple types of entities, then this is a different issue.
> While this forum is not about the actual usage of schema.org by Google or any other search engine, and while I clearly do not claim to speak on behalf of any single search engine, the problem is that Rich Snippets or any other technique for summarizing page content for previews in the organic search results have to condense a whole page *to a single snippet*.
> The currently dominating approach is to implement snippet types organized around a single, dominating type of object - e.g. products or events.
> You may have seen that e.g. Google is sometimes already showing snippet types for pages that contain multiple objects, like the one attached (this one is not based on schema.org markup but other data sources, though), 
> 
> [The proposal was initially conceived as an expansion of schema.org and its application by Google, Bing, and Yahoo.  Since it became clear that the search engines would only accept one class for a given resource, we felt it is important to bring up the fact that this does not meet the need since many things by their very nature belong to more than one class.  We believe that these multiple class designations should be recognized by the engines and incorporated into their semantic representation of the Webpage content.]
> 
> Sincerely,
> 
> Paul
> 

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/
Received on Monday, 18 November 2013 17:54:08 UTC