RE: proposal by Encyclopaedia Britannica

Martin,

Thanks for your feedback.  Please see my comments below in brackets.

-----Original Message-----
From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org] 
Sent: Monday, November 18, 2013 10:38 AM
To: Guha
Cc: Hetrea, Carmen; list; Cranmer, Paul
Subject: Re: proposal by Encyclopaedia Britannica

Carmen:

See below for some very quick feedback on part of your proposal:

On Nov 18, 2013, at 3:50 PM, Guha wrote:

> On Tue, Nov 12, 2013 at 9:45 AM, Hetrea, Carmen <CHetrea@eb.com> wrote:
....
> SCHEMA.ORG ONTOLOGY EXPANSION - a PROPOSAL by Encyclopaedia Britannica
> 
>  
....
> 
> Proposal:
> 
> 1.    We propose changes to the top-level classes
> Top Class: SchemaOrgClass
> Two major Subclass divisions of information: Concept and Entity.


I agree that this makes sense from a knowledge representation perspective, but I have some doubts that this distinction actually improves the vocabulary for typical Webmasters.
In general, philosophically grounded top-level distinctions can be difficult for practitioners to apply, which means that, in the end, the quality of the data deteriorates.
For a *Web* vocabulary, ambiguous classes need not be a disadvantage, since they can be applied reliably even by data publishers who are unable to make the conceptual distinction reliably.

[First of all, the schema.org ontology was designed to organize entities and has no place for organizing concepts. Since our primary concern is a consistent and reliable machine-readable semantic representation of knowledge for the Semantic Web, we arrived at our proposal primarily because we did not find any semantic ontologies concerned with representing general-knowledge content.

The two top classes are essential for this purpose because of the fundamental differences between concepts, which are metaphysical, and entities, which are physical.  Concepts can be related in semantic hierarchies that provide a useful skeleton for organizing information through the traditional vertical relations (broader concept, narrower concept) and horizontal relations (related concepts), as addressed in SKOS.  We have also added component relationships to this scenario, i.e., concepts that effectively complete a picture even though they are not semantically narrower concepts.

Entities, however, though definable by concepts, relate to each other multi-dimensionally both semantically and syntactically.  This distinction is simple and basic and can enable practitioners to manage their information more effectively.  While I understand your point that ambiguity appears to allow for a more flexible vocabulary, it risks undermining the very premise of semantics.  We have learned that ambiguous vocabulary ultimately limits the usefulness of the data it defines because it is inaccurate.  The more precision you are able to achieve in your system, the more flexibility your system will acquire in using the data it manages, since you can count on getting accurate results.]



> 2.    We propose to allow multiple class designations for a given Webpage
> This will allow content providers to classify both the subject matter and the delivery format individually as well as accommodate the fact that many general knowledge subjects can belong to more than one class. 
>  
>         For example, if a content provider offers an Article on an Event, both classes should be admissible and recognized by the semantic engine that reads the markup.  In the case of a Video showing Angkor Wat, a temple complex in Angkor, Cambodia, the subject is both a Place and a ManMadeObject (in this case a Temple) and the delivery format is Video.
> 
At the level of the vocabulary, it is already possible to expose information about multiple entities, so I may be misunderstanding your proposal.
If you are referring to the problem that getting Rich Snippets is difficult if you mark up multiple types of entities, then this is a different issue.
While this forum is not about the actual usage of schema.org by Google or any other search engine, and while I clearly do not claim to speak on behalf of any single search engine, the problem is that Rich Snippets or any other technique for summarizing page content for previews in the organic search results have to condense a whole page *to a single snippet*.
The currently dominant approach is to implement snippet types organized around a single dominant type of object, e.g., products or events.
You may have seen that Google, for example, sometimes already shows snippet types for pages that contain multiple objects, like the one attached (though this one is based not on schema.org markup but on other data sources).

[The proposal was initially conceived as an expansion of schema.org and its application by Google, Bing, and Yahoo.  Since it became clear that the search engines would accept only one class for a given resource, we felt it was important to point out that this does not meet the need, since many things by their very nature belong to more than one class.  We believe that these multiple class designations should be recognized by the engines and incorporated into their semantic representation of the Webpage content.]

Sincerely,

Paul

Received on Tuesday, 19 November 2013 08:00:51 UTC