Re: Generic Property-Value Proposal for Schema.org

Hi Holger,

On 03 May 2014, at 01:01, Holger Knublauch <holger@topquadrant.com> wrote:

> On 5/3/2014 4:10, Niklas Lindström wrote:
>> My opinion, based on experience in both consuming data and working to unify disparate descriptions, is that, in the general case of needing specific properties beyond the core or schema.org, it would be quite valuable to apply the existing mechanism of mixing vocabularies, native to RDF and the enabler of decentralized vocabulary growth. It has been there from the start and proven extremely valuable in specific data integration scenarios.
> 
> +1
> 
> I believe Martin has raised a very important topic - on how to make it easier for site owners to publish data for which no fixed schema.org extension already exists. Another important topic in his proposal is the handling of units and ranges. But I believe neither of these necessarily require yet another extension mechanism, especially if this extension mechanism breaks the integrity and simplicity of the original data model and its mapping to triples.

My proposal is neither an extension mechanism nor does it break the integrity of any data. It is a mechanism for exposing product feature data without curating the data prior to publication, while preserving as much data structure and as many core semantic aspects as the site can easily provide.

I am not trying to sidestep RDF, OWL, or any other technologies that require a slightly higher level of homogeneity to operate.

And the mapping to triples is just one of many angles of looking at schema.org. Most people who add schema.org to sites have no clue what triples are.

> URI-based property definitions similar to RDF/OWL can already be used, and anyone can very easily create their own property my:property. If microdata has a problem with that because it is optimized for a single namespace then the extension should be defined for microdata *syntax* only, but not for the general schema.org *data model*.
> 
> On the topic of units and ranges, I see no reason why these should only apply to "additional" properties. They make a lot of sense for existing schema.org types too, such as schema:height should also be able to have values such as "10 - 500 cm". So this problem should be addressed separately.

The problem of units and ranges IS properly addressed in schema.org by virtue of http://schema.org/QuantitativeValue, which is derived from http://purl.org/goodrelations/v1#QuantitativeValue and allows all the nice things that are shown by Dean Allemang and James Hendler in the second edition of http://workingontologist.org/. This is not yet sufficiently documented, but it is there.
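
To illustrate, here is a minimal sketch of how a range like "10 - 500 cm" could be expressed with http://schema.org/QuantitativeValue and serialized as JSON-LD. The product and its name are hypothetical and only there to give the value a home; "CMT" is the UN/CEFACT common code for centimetre.

```python
import json

# Sketch only: the range "10 - 500 cm" as a schema.org QuantitativeValue.
# "CMT" is the UN/CEFACT common code for centimetre.
height = {
    "@type": "QuantitativeValue",
    "minValue": 10,
    "maxValue": 500,
    "unitCode": "CMT",
}

# Hypothetical product, used only for illustration.
product = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Example shelf",
    "height": height,
}

print(json.dumps(product, indent=2))
```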
> 
> I believe the original issue of making it easier for site owners to publish data could be addressed by preparing suitable documentation and quick start guides with the recommended design patterns.

The design pattern will be essentially 

     https://code.google.com/p/templates4goodrelations/source/browse/template-shop_vocabulary.html

with a few changes in naming. This is a huge burden for site owners and adds almost nothing for consumers of data.


> If additionalProperty is still needed, then please only for microdata syntax.
> 

Sorry for saying this, but I think you are looking at this too much from a data consumer's perspective. additionalProperty is NOT a Microdata-specific proposal. It is a mechanism for exposing semi-structured property-value pairs for product and place features in a way that 

1. allows (but does not enforce) preserving core conceptual distinctions, like value vs. unit, unit code vs. unit as a text, point values vs. ranges, and open intervals
2. allows (but does not enforce) preserving cues to the semantics of properties in various, frequently available forms, like vendor-specific codes, entries in glossary pages, codes from established standards.
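
A rough sketch of these two points, under stated assumptions: the property names (name, value, minValue, maxValue, unitCode, unitText, propertyID) and the PropertyValue type follow the vocabulary as later published on schema.org and may differ from the draft discussed here. The helper property_value(), the product, and the identifier "vendor:CHK-01" are hypothetical. Each entry carries only the level of detail the site happens to have.

```python
import json

def property_value(name, value=None, min_value=None, max_value=None,
                   unit_code=None, unit_text=None, property_id=None):
    """Build one property-value pair, keeping only the details that are given."""
    pv = {"@type": "PropertyValue", "name": name}
    optional = {
        "value": value,
        "minValue": min_value,
        "maxValue": max_value,
        "unitCode": unit_code,
        "unitText": unit_text,
        "propertyID": property_id,
    }
    # Omit everything the site does not know instead of inventing it.
    pv.update({k: v for k, v in optional.items() if v is not None})
    return pv

# Hypothetical product data, for illustration only.
product = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Example drill",
    "additionalProperty": [
        # Plain text value, nothing else known.
        property_value("Colour", value="blue"),
        # Point value with the unit available only as text.
        property_value("Weight", value=1.8, unit_text="kg"),
        # Closed range with a unit code (MMT = millimetre).
        property_value("Drilling depth", min_value=5, max_value=40, unit_code="MMT"),
        # Open interval: only a lower bound (CEL = degree Celsius).
        property_value("Operating temperature", min_value=-10, unit_code="CEL"),
        # Point value plus a vendor-specific code as a cue to the semantics.
        property_value("Chuck size", value=13, unit_code="MMT",
                       property_id="vendor:CHK-01"),
    ],
}

print(json.dumps(product, indent=2))
```

A consumer can still use the least precise entries as name-value pairs and exploit unit codes, ranges, and identifiers wherever they are present.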

Even in RDFa and JSON-LD, there is no standard way of preserving these varying degrees of precision in the data. And moreover, you are imposing a HUGE burden on site owners.

Of course, conceptually, you can ask them to define properties locally and use them by reference, effectively making every site a provider of an ontology specification. But this adds little for data processing while shifting the effort onto the publisher.

In essence you say that site owners should either adhere to Semantic Web patterns, or shut up / omit meta-data for such patterns.

By the way, the proposal is not the invention of a minute. I have tried it the other way round for almost a decade, and likely, with all due modesty, with more effort than anybody else in the SemWeb community: building and deploying rich OWL vocabularies for product types and features, like all these:

   http://wiki.goodrelations-vocabulary.org/Vocabularies

and encouraging companies to use them in their markup.

And for proprietary product types and product properties, we tried to foster the local creation of type and property definitions on a per-shop basis, as in these templates

    https://code.google.com/p/templates4goodrelations/

namely

    https://code.google.com/p/templates4goodrelations/source/browse/template-shop_vocabulary.html


This takes dictionaries of shop property and class information and generates a page with all the local type and property definitions, which can then be referenced locally in product instance data.

The problem is that this is awfully complicated for the average Web developer. It's a lot of markup that serves no obvious purpose for the publisher and can be reconstructed by any reasonable consumer anyway. It just did not gain any traction.

Mike Bergman has called this "supreme arrogance of position", but I stick to the prediction that if you do not drastically simplify the way e-commerce players can publish structured data for product features, we will still be at the level of simple price and product name information, motivated by rich snippets, five years from now.

To take up the "arrogance of position" claim: I think it is a popular arrogance of position of the Semantic Web community (explicitly: not yours!!!) to look at the schema.org ecosystem as if it were solely an RDF landscape. For sure, it is nice and valuable that one can consume schema.org in the RDF world, but this is just one of many perspectives. The essence is a nice data model that is pretty independent of syntaxes, works well at Web scale, and can be implemented with moderate effort by standard Web developers.

I find it surprising (or enlightening) that the heftiest opposition against this proposal comes from Semantic Web advocates, while the early adopters of schema.org in this round almost entirely support it.

Just for the record: I created the boilerplate page for the proposal in January but did not add any significant content until April 28. Until then, some 2000 casual visitors had looked at the page.

Now, after less than a week, the proposal page has been accessed almost 12,000 times. There is interest in this by people who may not be raising their voice.

Martin

Best wishes / Mit freundlichen Grüßen

Martin Hepp

-------------------------------------------------------
martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  martin.hepp@unibw.de
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
=================================================================
* Project Main Page: http://purl.org/goodrelations/

Received on Friday, 2 May 2014 23:40:42 UTC