schema.org and proto-data, was Re: schema.org as reconstructed from the human-readable information at schema.org

Peter:
Note that schema.org sits between millions of owners of data (Web masters) and large, centralized consumers of Big Data, who apply hundreds of heuristics before using the data.
Schema.org is an interface between Webmaster minds, data structures in back-end RDBMS driving Web sites, and search engines (and maybe other types of consumers).

The whole environment relies heavily on
1. probabilistic processing
2. the quality of the understanding between the many minds (developers) in this ecosystem.

Traditional measures of conceptual clarity and guarantees / deterministic data processing are of very limited relevance in that setting.

For instance, if you introduce a conceptual distinction that is very valuable and justified from an expert's perspective, this often will not lead to more reliable data processing, since Web developers may use the element more inconsistently (or the distinction may not be reliably represented in the databases driving the sites).

Looking at schema.org from the perspective of knowledge representation in the traditional sense is insufficient, IMO. You have to look at the data ecosystem as a whole.

Information exposed using schema.org meta-data is what I would call proto-data, not ready for direct consumption by deterministic computational operations. 
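
To make the proto-data point concrete, here is a minimal sketch of the kind of heuristic a consumer might apply before trusting a published value. It is only an illustration: the function name parse_price, the sample strings, and the confidence numbers are assumptions made up for this example, not anything a search engine or schema.org documents.

```python
import re


def parse_price(raw):
    """Heuristically turn a published price string into (value, confidence).

    Hypothetical helper: the confidence values are arbitrary illustrations.
    """
    if raw is None:
        return None, 0.0
    text = str(raw).strip()
    # Clean case: a plain decimal number such as "19.99".
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return float(text), 1.0
    # Common deviation: a currency code or symbol and a decimal comma.
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match:
        return float(match.group().replace(",", ".")), 0.6
    # Nothing numeric could be recovered.
    return None, 0.0


for sample in ["19.99", "EUR 19,99", "call us"]:
    print(sample, "->", parse_price(sample))
```

The consumer ends up with a value plus a degree of trust rather than a guaranteed fact, which is why deterministic operations cannot be applied to the raw markup directly.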

Martin



On Oct 25, 2013, at 6:37 AM, Peter F. Patel-Schneider wrote:

> Strangenesses in schema.org, an incomplete list:
> 
> Types as URLs.  Properties as strings.  Prescriptive property introductions.  Closed set of types, particularly with open set of properties.  Union ranges, particularly with sub and super properties.  Single typing with a multiple-parent type hierarchy.  URLs as a subset of text. URl vs sameAs property.  additionalTypes property.
> 
> peter
> 
> 
> On Oct 24, 2013, at 8:54 PM, Guha <guha@google.com> wrote:
> 
>> Well, we certainly have a fair number of websites that don't consider it too strange to use. And they seem to think they are using it for real. 
>> 
>> I am now curious about what you find strange about it.
>> 
>> guha
>> 
>> 
>> On Thu, Oct 24, 2013 at 8:26 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>> The purpose of this exercise was mostly curiosity in the end, at least after I discovered all the strangenesses, but certainly started with desire to use schema.org information for real.
>> 
>> peter
>> 
>> On Oct 24, 2013, at 6:31 PM, Guha <guha@google.com> wrote:
>> 
>> > Mostly right. See below for corrections. What is the purpose of this 'reconstruction', if I may ask?
>> >
>> > guha
>> 
>> 
> 

--------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Received on Monday, 28 October 2013 09:47:32 UTC