Re: proposal by Encyclopaedia Britannica from Martin Hepp on 2013-11-18 (public-vocabs@w3.org from November 2013)

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Mon, 18 Nov 2013 17:37:58 +0100
To: Guha <guha@google.com>
Cc: "Hetrea, Carmen" <CHetrea@eb.com>, list <public-vocabs@w3.org>, "Cranmer, Paul" <PCranmer@eb.com>
Message-Id: <AC4D7FEB-B2F2-4E6F-9AB8-DAF071537FC8@ebusiness-unibw.org>

Carmen:

See below for some very quick feedback on part of your proposal:

On Nov 18, 2013, at 3:50 PM, Guha wrote:

> On Tue, Nov 12, 2013 at 9:45 AM, Hetrea, Carmen <CHetrea@eb.com> wrote:
...
> SCHEMA.ORG ONTOLOGY EXPANSION - a PROPOSAL by Encyclopaedia Britannica
> 
>  
...
> 
> Proposal:
> 
> 1.    We propose top Class changes
> Top Class: SchemaOrgClass
> Two major Subclass divisions of information: Concept and Entity.


I agree that this makes sense from a knowledge representation perspective, but I doubt that this distinction actually improves the vocabulary for typical Webmasters.
In general, philosophically grounded top-level distinctions can be difficult for practitioners to apply, which means that in the end, the quality of the data deteriorates.
For a *Web* vocabulary, ambiguous classes need not be a disadvantage, since publishers of data who cannot make the conceptual distinction can still apply them reliably.


> 2.    We propose to allow multiple class designation for a given Webpage
> This will allow content providers to classify both the subject matter and the delivery format individually as well as accommodate the fact that many general knowledge subjects can belong to more than one class. 
>  
>         For example, if a content provider offers an Article on an Event, both classes should be admissible and recognized by the semantic engine that reads the markup.  In the case of a Video showing Angkor Wat, a temple complex in Angkor, Cambodia, the subject is both a Place and a ManMadeObject (in this case a Temple) and the delivery format is Video.
> 
At the level of the vocabulary, it is already possible to expose information about multiple entities, so I may not understand your proposal correctly.
If you are referring to the problem that getting Rich Snippets is difficult if you mark up multiple types of entities, then this is a different issue.
This forum is not about the actual usage of schema.org by Google or any other search engine, and I clearly do not claim to speak on behalf of any single search engine. The problem, however, is that Rich Snippets, like any other technique for summarizing page content for previews in the organic search results, have to condense a whole page *to a single snippet*.
The currently dominant approach is to implement snippet types organized around a single, dominant type of object, e.g. products or events.
You may have seen that Google, for example, already sometimes shows snippets for pages that contain multiple objects, like the one attached (which is based not on schema.org markup but on other data sources), but this seems to be experimental at this point.

In general, I assume that search engines base such implementation details on usability studies and that type-centric snippets seem to work better at this point.

This is why you currently have to help the search engine understand which one is the most important entity in the mark-up of a page, and again, this is not a conceptual issue at the level of schema.org.

>         We propose that the Properties associated with all Classes applied to the Webpage be valid for markup.
I suggest using multiple types in such cases instead. There is no need to broaden the domain of all specific properties to http://schema.org/Thing, IMO.
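
To illustrate what declaring multiple types on one item could look like, here is a minimal sketch in Python that builds JSON-LD for the Angkor Wat example. The choice of JSON-LD as the syntax, the helper name typed_node, and the particular pair of types are assumptions made for illustration only; none of them come from the proposal. The same idea can be written in Microdata by listing several type URLs, separated by spaces, in one itemtype attribute.

```python
import json


def typed_node(types, **properties):
    """Build a JSON-LD node that carries one or more types (hypothetical helper)."""
    # JSON-LD allows "@type" to hold a list, so one item can have several types.
    node = {"@type": list(types)}
    node.update(properties)
    return node


# The subject of the video: a single item with two types (illustrative choice).
subject = typed_node(
    ["LandmarksOrHistoricalBuildings", "BuddhistTemple"],
    name="Angkor Wat",
)

# The delivery format is a separate item that points at its subject via "about".
video = typed_node(
    ["VideoObject"],
    name="A video showing Angkor Wat",
    about=subject,
)
video["@context"] = "http://schema.org"

markup = json.dumps(video, indent=2)
print(markup)
```

With this shape, the properties of each declared type can be used on the subject item, and the delivery format stays a separate item, so no property domain has to be widened.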
> 
> 3.   We propose Definitional Class Extensions rather than continually articulating more specific Classes.
> 
>         a.    This is to avoid the need to continually update the Ontology as content providers offer information on an increasing variety of subjects.
> 
>         b.    The Definitional Class Extension is a word or phrase added to the Class that the content provider chooses to introduce as the specific language that further defines the entity in question.  For example, the Definitional Class Extension for Angkor Wat could be Complex, so the Class would become Temple Complex (the technical means of adding the extension without affecting the readability of the Ontology class must be decided).  The extension could even be augmented to include Bhuddism.  Wherever possible the extension language could refer to URIs from a source such as the DBpedia or even the Encyclopaedia Britannica Semantic SPARQL Endpoint to enable further machine disambiguation.
> 
>         c.    Definitional Class Extensions are to be considered part of the Class for semantic disambiguation, both visual as in a Rich Snippet and for search.
> 
>         d.    The word or phrase introduced as Definitional Class Extension does not depend upon visible text for markup, but is introduced as an addition to the Class.
> 
>         e.    The available Properties remain those of the Ontology Class irrespective of the Definitional Class Extension.
> 

It may be useful to provide a better, generic extension mechanism like this in addition to the pre-defined types in use.
But delegating the definition and articulation of all type information to the site owners makes processing the data much more difficult for data consumers, since it puts us back into the NLP-centric state of information extraction. So I do not support this approach as a general modeling principle for schema.org.

Please also note that we had a related discussion on introducing a conceptual element for "Topic", "Concept", or "Category".

I cannot comment on the rest of the proposal in detail for now, but I hope you find this feedback useful nonetheless.

Best wishes
Martin


--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Attachments

image/png attachment: PastedGraphic-1.png

Received on Monday, 18 November 2013 16:38:25 UTC