RE: proposal by Encyclopaedia Britannica

Hi Martin,

I understand your points, and I accept that the audience you describe for schema.org is not quite the same as the one I am addressing.  I am involved in information management from an information science viewpoint, and the things that I have learned over the years are reflected in the proposal that we have made.

I used the term "metaphysical" in contrast to "physical" for convenience, to emphasize the distinction between what is a concept and what is an entity.  It is, of course, true that entities are not always physical in the sense you note.  So let me rephrase it to define an entity as something that exists beyond the merely conceptual.  In essence, entities are the manifestations of concepts, though the concept may not have preceded the entity.  While I think my point was clear, I recognize the inadequacy of my phrasing.

I fully recognize that building a reliable machine-readable semantic representation of knowledge is not a simple challenge.  What is important to me is that we address it as the common denominator that makes the parts fit together whenever we develop semantic strategies. 

It appears that most of the interest has not been focused on general knowledge, as its application does not address directly or immediately the needs of Web sites to promote their content.  This is understandable.  However, the discussion should not be so focused on the parts that it does not also address them in the context of the whole.  Otherwise choices are made that will eventually need to be undone because they do not work when they must become part of the larger picture.  If we always work from the perspective of the whole, the choices we make concerning the parts become coherent and reusable as the picture broadens.

So, though I recognize the validity of what you say as it applies to immediate needs, I cannot refrain from hoping and proposing that we first create our overview of knowledge before we start creating structures to manage some of its parts.  I am not saying that no one is doing this, just that projects like schema.org reveal that not everyone in significant positions appears to be doing it.

I have no objection to splitting the proposal in two: one that addresses extensions to schema.org, and a separate one for how search engines can better use that information.  As you understood, the proposal was initially one to extend schema.org by creating a coexistent structure for general knowledge as opposed to focusing only on business and current happenings.  However, we came to propose our thoughts to others as we looked around and came to believe that what we have learned may be of value to the community at large.  We do not pretend to have the technical experience that many of those we are addressing have.  But we do have decades of experience with what it takes to manage information from a semantic perspective, and we sincerely wish to share that expertise with the Web community that we are now part of.  So please take what we have to say as something that may contain useful observations rather than a finished work that should be accepted.

Thanks again for taking the time to reply to me.

Paul


-----Original Message-----
From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org] 
Sent: Monday, November 18, 2013 11:54 AM
To: Cranmer, Paul
Cc: Guha; Hetrea, Carmen; list
Subject: Re: proposal by Encyclopaedia Britannica

Paul:
Thanks for your reply. However, I think that there are a few misunderstandings.

First, schema.org is a Web vocabulary, not an ontology in the academic sense of the term. The nature of shared data structures at Web scale is only partly understood as of today, but we already know that the challenge is not as simple as building a "reliable machine-readable semantic representation of knowledge for the Semantic Web", as you state.

The fact that you need to use the term "metaphysical" to explain your proposal already indicates that there is a misfit between the audience you have in mind for your proposal and the audience who will actually use schema.org.

By the way, I disagree with the notion that entities are necessarily physical. Entities in the context of database systems can also be abstract things, at least since the initial work by Peter Chen on Entity-Relationship Modeling in 1976 [1].

> The more precision you are able to achieve in your system, the more 
> flexibility your system will acquire in using the data it manages, 
> since you can count on getting accurate results.]

As we all know, the Web is a complex socio-technical ecosystem. Precision of the vocabulary alone does not necessarily have any positive impact on the quality of the data. If people cannot understand the specification easily and reliably, they may avoid certain conceptual elements (-> less data) or use the elements less reliably (-> lower data quality). This issue has been discussed in this forum at length recently. For more information on my take on this, see http://vimeo.com/51152934.


> [The proposal was initially conceived as an expansion of schema.org 
> and its application by Google, Bing, and Yahoo.  Since it became clear 
> that the search engines would only accept one class for a given 
> resource, we felt it is important to bring up the fact that this does 
> not meet the need since many things by their very nature belong to 
> more than one class.  We believe that these multiple class 
> designations should be recognized by the engines and incorporated into 
> their semantic representation of the Webpage content.]

Without wanting to offend you: since you are suggesting more subtle conceptual distinctions, it would be nice if you separated the two proposals you have at the conceptual level into

1. A proposal to extend the conceptual model of schema.org and
2. A proposal to search engines to extend the consumption of Web data based on schema.org.

Since the schema.org sponsors generally do not discuss the usage of schema.org data in this forum (but rather in their individual, product-related forums), mainly the first issue is relevant here.

Best

Martin


[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.1085

On Nov 18, 2013, at 6:32 PM, Cranmer, Paul wrote:

> Martin,
> 
> Thanks for your feedback.  Please see my comments below in brackets.
> 
> -----Original Message-----
> From: Martin Hepp [mailto:martin.hepp@ebusiness-unibw.org]
> Sent: Monday, November 18, 2013 10:38 AM
> To: Guha
> Cc: Hetrea, Carmen; list; Cranmer, Paul
> Subject: Re: proposal by Encyclopaedia Britannica
> 
> Carmen:
> 
> See below for a very quick feedback on part of your proposal:
> 
> On Nov 18, 2013, at 3:50 PM, Guha wrote:
> 
>> On Tue, Nov 12, 2013 at 9:45 AM, Hetrea, Carmen <CHetrea@eb.com> wrote:
> ...
>> SCHEMA.ORG ONTOLOGY EXPANSION - a PROPOSAL by Encyclopaedia 
>> Britannica
>> 
>> 
> ...
>> 
>> Proposal:
>> 
>> 1.    We propose top Class changes
>> Top Class: SchemaOrgClass
>> Two major Subclass divisions of information: Concept and Entity.
> 
> 
> I agree that this makes sense from a knowledge representation perspective, but I have some doubts whether this distinction actually improves the vocabulary for typical Webmasters.
> In general, philosophically-grounded top-level distinctions can be difficult to apply by practitioners, which means that in the end, the quality of the data deteriorates.
> For a *Web* vocabulary, ambiguous classes need not be a disadvantage since they can be reliably applied by publishers of data who are unable to apply the conceptual distinction reliably.
> 
> [First of all, the schema.org ontology is conceived to organize entities and has no place for organizing concepts. Since our primary concern is a consistent and reliable machine-readable semantic representation of knowledge for the Semantic Web, we arrived at our proposal primarily because we did not see any semantic ontologies that were concerned with representing general-knowledge content. 
> 
> The two top classes are essential for this purpose, because of the fundamental differences between concepts, which are metaphysical, and entities, which are physical.  Concepts can be related in semantic hierarchies that provide a useful skeleton for organizing information in the traditional vertically related broader concept, narrower concept, and the horizontally related concepts as has been addressed in SKOS.  We have added to this scenario also component relationships, i.e. concepts that effectively complete a picture though not semantically narrower concepts.
> 
> Entities, however, though definable by concepts, relate to each other 
> multi-dimensionally both semantically and syntactically.  This 
> distinction is simple and basic and can enable practitioners to manage 
> their information more effectively.  While I understand your point 
> that ambiguity appears to allow for a more flexible vocabulary, it 
> risks undermining the very premise of semantics.  We have learned that 
> ambiguous vocabulary ultimately limits the usefulness of the data it 
> defines because it is inaccurate.  The more precision you are able to 
> achieve in your system, the more flexibility your system will acquire 
> in using the data it manages, since you can count on getting accurate 
> results.]
> 
> 
> 
>> 2.    We propose to allow multiple class designation for a given Webpage
>> This will allow content providers to classify both the subject matter and the delivery format individually as well as accommodate the fact that many general knowledge subjects can belong to more than one class. 
>> 
>>        For example, if a content provider offers an Article on an Event, both classes should be admissible and recognized by the semantic engine that reads the markup.  In the case of a Video showing Angkor Wat, a temple complex in Angkor, Cambodia, the subject is both a Place and a ManMadeObject (in this case a Temple) and the delivery format is Video.
>> 
> At the level of the vocabulary, it is already possible to expose information about multiple entities, so I may not understand your proposal correctly.
> If you are referring to the problem that getting Rich Snippets is difficult if you mark-up multiple types of entities, then this is a different issue.
> While this forum is not about the actual usage of schema.org by Google or any other search engine, and while I clearly do not claim to speak on behalf of any single search engine, the problem is that Rich Snippets or any other technique for summarizing page content for previews in the organic search results have to condense a whole page *to a single snippet*.
> The currently dominating approach is to implement snippet types organized around a single, dominating type of object - e.g. products or events.
> You may have seen that e.g. Google is sometimes already showing 
> snippet types for pages that contain multiple objects, like the one 
> attached (this one is not based on schema.org markup but other data 
> sources, though),
> 
> [The proposal was initially conceived as an expansion of schema.org 
> and its application by Google, Bing, and Yahoo.  Since it became clear 
> that the search engines would only accept one class for a given 
> resource, we felt it is important to bring up the fact that this does 
> not meet the need since many things by their very nature belong to 
> more than one class.  We believe that these multiple class 
> designations should be recognized by the engines and incorporated into 
> their semantic representation of the Webpage content.]
> 
> Sincerely,
> 
> Paul
> 

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp


Received on Tuesday, 19 November 2013 08:00:51 UTC