Re: Generic Property-Value Proposal for Schema.org

Dear Eric:

Thanks!

On 01 May 2014, at 22:21, Eric Kauz <eric.kauz@gs1.org> wrote:

> Hi Martin,
> 
> With regard to your comment on standards bodies not publishing web ontology versions of their classes and properties, for several years, the GS1 community [1] (representing over 1 million companies worldwide) has been working together in a consensus-based standardisation process to define product attributes, resulting in the GS1 Global Product Classification (GPC) system [2] and the data model for Global Data Synchronization (GDSN) [3]. GDSN provides business-to-business synchronisation of master data about products, between brand owners / manufacturers and retailers across the supply chain networks. The GPC dataset and GDSN data model are already freely available open standards, with downloadable XML artefacts [4] [5] - although we have not yet transformed them into a Linked Data representation.

Note that my team member Alex Stolz has developed a free tool that generates a fully GoodRelations-compliant, Linked-Open-Data version of GPC, ready for deployment with one line of Python:

    http://wiki.goodrelations-vocabulary.org/Tools/PCS2OWL

> 
> As part of our current GS1 project called GTIN+ on the Web [6] we are developing a Linked Data ontology around the product characteristics that we have already defined in our Global Data Dictionary. We are committed [7] to providing a free online Linked Data representation of GPC as well as a GS1 Linked Data vocabulary that can be used for describing product characteristics in detail (both qualitatively and quantitatively) and aligned with the precise definitions that the GS1 community has developed over a number of years (in GDSN and other work groups), based on the expertise of our member companies who have significant domain knowledge about product specifications, characteristics and classifications. We have been getting further involved in W3C Semantic Web efforts to avoid duplication of efforts.

That is good news! When publishing the GPC, it would be very good to make sure it remains a valid GoodRelations extension, adhering to the core conceptual distinctions that are also part of schema.org now. The recipe for that is here:

    http://wiki.goodrelations-vocabulary.org/Documentation/Extensions

But as I said, the PCS2OWL Python script does that for GPC in less than a minute, ready for publication. You just have to define your namespace and upload the result of the conversion to your server via FTP.

Please contact me bilaterally if you need any support.

> We see some value in the generic property-value proposal for providing a convenient extension mechanism to support additional product characteristics that are not yet formally represented by predicates or properties in established Linked Data vocabularies such as schema.org - in particular to identify priority areas for formal standardisation of product properties in Linked Data vocabularies.
> 
> However, we also consider that this approach may be sub-optimal relative to defining one or more controlled vocabularies with HTTP URIs for important product characteristics, since an HTTP URI for a property has the advantage of providing a single unambiguous way of referring to the property, whereas free-form text strings will probably require some fuzzy matching techniques if different organisations use different text strings (possibly in different human languages) to express such property names; in contrast an HTTP URI is a globally unambiguous identifier that can support and link online to multi-lingual labels, descriptions and definitions of the property, whereas a user-defined bare text string is probably even less useful than a URN.

You may know that I have worked in the field of product classification standards for more than a decade, see e.g. [1] and [2], with [1] being the first comprehensive quantitative evaluation of the coverage of such standards to my knowledge.

We all agree here that standards for product types, properties, and values are easier to process and in general better. However, the development and use of such standards is orthogonal to the proposed extension. The extension allows sites to expose product feature data if any of the following conditions is met:

1. There is no standardized property for the product characteristic (e.g. number of birds per cuckoo clock).

2. There is no URI / Web vocabulary for the standardized property (as is the case for eClass properties and UNSPSC classes, and, to my knowledge, was the case for GPC).

3. The site is not able to add the URI of the property to the data (e.g. because the back-end systems just provide property-value pairs to the Web application).

4. The data available to the site is incomplete or not sufficiently granular, and thus insufficient for populating the target data structure as specified in the product ontology (e.g. the ontology defines a product weight including batteries and the site data is weight excluding batteries).

With the proposed extension, sites can publish a wealth of product data even in such cases, which are very, very common.
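
To make case 3 concrete, here is a minimal Python sketch of how a site whose back-end only delivers plain property-value pairs could still expose them as structured data. It is an illustration under assumptions, not part of the proposal text: the property names "name", "value", and "additionalProperty", the JSON-LD serialization, and the helper functions are my choices for this example.

```python
import json


def to_property_values(pairs):
    """Turn plain (name, value) pairs into PropertyValue-style dicts.

    The keys "name" and "value" are assumed for illustration.
    """
    return [
        {"@type": "PropertyValue", "name": name, "value": value}
        for name, value in pairs
    ]


def product_jsonld(product_name, pairs):
    """Build a JSON-LD style product description from back-end pairs.

    "additionalProperty" is an assumed name for the property that
    attaches the generic property-value pairs to the product.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": product_name,
        "additionalProperty": to_property_values(pairs),
    }


# Hypothetical back-end output: free-text property names, no URIs.
backend_pairs = [
    ("Number of birds", 2),
    ("Weight excluding batteries", "1.4 kg"),
]

doc = product_jsonld("Cuckoo clock", backend_pairs)
print(json.dumps(doc, indent=2))
```

Note that the site needs no knowledge of any product ontology to do this; it only passes through what the back-end already has.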

> 
> Of course there is a challenge to identify the most important characteristics to support in each product category and how to prioritise this work in a phased approach.  However, we need not start from a blank sheet of paper and can build on existing classification systems (such as GPC, UNSPSC, eClass etc., together with experience from mapping initiatives such as cMap [8]) and the GDSN data model, as well as the combined experience and domain knowledge of a large community of manufacturers, brand owners and retailers, who are now expressing significant interest in expressing rich structured master data about products as Linked Open Data via web pages.  Some of our members are already proceeding to piloting and are keen to deploy this and reap the benefits in terms of enhanced search listings, increased visibility on the web of products and product offerings and enabling a new ecosystem of consumer-facing product-centric information and services, accessible via smartphones and other mobile devices.

I am happy to see the standardization bodies now trying to lift their standards to the Web of Data. Ever since deploying the first OWL DL version of eClass (http://www.heppnetz.de/projects/eclassowl/), I have been trying to develop the technology and legal framework to do so.

However, while this is valuable, I think it is not sufficient to address the needs of sites that want to expose structured product feature information. As we have shown in [1], the coverage of the standards is very limited compared to the breadth of products and services that are offered on the Web. See also [3] and [4].

I may be misinterpreting the intent of your message, but if you meant that we should not implement the proposed extension for schema.org because the GPC will tackle the same problem in the future, I have to disagree: of the four barriers listed above, you address only #2 and, to a certain extent, #1 (and that only for the types of products standardized in the GPC).

Personally I think that the proposed mechanism and richer product ontologies, be it a Linked Open Data version of the GPC or any of the GoodRelations-based ontologies from [5], will work hand in hand:

1. Sites will mostly publish lightweight feature data based on http://schema.org/PropertyValue.
2. Consumers of the data will cleanse and lift it depending on their ability and locally represent it using the richer product ontologies.
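
The second step could look roughly like the following Python sketch on the consumer side. Everything specific in it is hypothetical: the example.org property URIs, the mapping table, and the simple label normalization stand in for whatever cleansing a real consumer would apply. Pairs that cannot be matched are kept as they are rather than dropped.

```python
def normalize(label):
    """Lower-case a free-text property name and collapse whitespace."""
    return " ".join(label.lower().split())


# Hypothetical mapping from normalized labels to property URIs of a
# richer product ontology; the example.org URIs are placeholders.
MAPPING = {
    "number of birds": "http://example.org/vocab#numberOfBirds",
    "weight excluding batteries": "http://example.org/vocab#weightExclBatteries",
}


def lift(property_values, mapping=MAPPING):
    """Split generic property-value pairs into lifted and unmatched ones.

    Returns a dict keyed by property URI for pairs found in the mapping,
    and a list of the original pairs that could not be matched.
    """
    lifted, unmatched = {}, []
    for pv in property_values:
        uri = mapping.get(normalize(pv["name"]))
        if uri is None:
            unmatched.append(pv)
        else:
            lifted[uri] = pv["value"]
    return lifted, unmatched


# Example input as a consumer might extract it from a product page.
published = [
    {"name": "Number of  Birds", "value": 2},
    {"name": "Colour", "value": "brown"},
]

lifted, unmatched = lift(published)
print(lifted)
print(unmatched)
```

A real consumer would likely need fuzzy or multilingual matching instead of an exact lookup, but the division of labour is the same: the site publishes what it has, and the consumer decides how far to lift it.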


Best wishes

Martin

[1] Hepp, Martin; Leukel, Joerg; Schmitz, Volker: A Quantitative Analysis of Product Categorization Standards: eCl@ss, UNSPSC, eOTD, and RNTD, in: Knowledge and Information Systems (KAIS), Springer, Vol. 13, No. 1 (September 2007), pp. 77-114.

[2] Hepp, Martin: E-Business Vocabularies as a Moving Target: Quantifying the Conceptual Dynamics in Domains, Proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW2008), September 29 - October 3, 2008, Acitrezza, Italy, Springer LNCS, Vol. 5268, pp. 388-403.

[3] Hepp, Martin; Siorpaes, Katharina; Bachlechner, Daniel: Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management, IEEE Internet Computing, Vol. 11, No. 5, pp. 54-65, Sept-Oct 2007.

All PDFs are available from http://www.heppnetz.de/publications/

[4] http://www.productontology.org


[5] http://wiki.goodrelations-vocabulary.org/Vocabularies

Received on Friday, 2 May 2014 21:54:13 UTC