
Re: Generic Property-Value Proposal for Schema.org

From: <martin.hepp@ebusiness-unibw.org>
Date: Wed, 30 Apr 2014 22:21:31 +0200
Cc: W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <8743D67B-7E66-4FCE-BD4F-DE20697E7A58@ebusiness-unibw.org>
To: Jarno van Driel <jarno@quantumspork.nl>
Hi Jarno,

On 29 Apr 2014, at 13:12, Jarno van Driel <jarno@quantumspork.nl> wrote:

> Let me first say I like the generic extension mechanism as a whole as well as the property name 'additionalProperty'. It seems obvious in regards to how to use. 

Thanks! That is good to hear.

> Having quite some experience with eCommerce sites myself though, I also see some disadvantages for website owners. Especially for those who have a site which sells many different types of products and get their product info from supplier feeds or datasheets that generally don't contain this type of data. Meaning that if a site-owner would want to publish this type of data, quite a lot of code development (and thus resources) would be needed. And I suspect quite a lot of daily man hours as well, since it's the type of data that's difficult to mark up automatically within a template system.

Please note that the proposal is just about providing the conceptual elements for marking up such data. It does not imply that one has to do so. The purpose is to provide a mechanism that allows sites to expose granular product feature data and similar things if they want to do so and if they have the resources.

Also, the typical effort will just be to add a little bit of markup to one or two page type templates, typically the product detail pages. If the tables or div structures are generated using a loop in a template language like Jinja, Smarty, or plain PHP/JSP, ..., then we are maybe talking about adding three or four keywords to the template.
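To illustrate the size of the change, here is a minimal sketch in Python, with plain string formatting standing in for the template loop. The type name PropertyValue and the "name" property are assumptions on my side for illustration; the sample rows and the function render_feature_table are hypothetical. The table is assumed to sit inside an element that already carries the Product markup.

```python
from html import escape

# Hypothetical feature rows, as a product detail template would receive them.
features = [("Weight", "450 g"), ("Interface", "USB")]

def render_feature_table(rows):
    """Render a spec table; the itemprop/itemscope/itemtype attributes
    are the only additions compared to a plain HTML table."""
    out = ["<table>"]
    for name, value in rows:
        out.append(
            # Assumed type name: PropertyValue (not confirmed by this email).
            '<tr itemprop="additionalProperty" itemscope '
            'itemtype="http://schema.org/PropertyValue">'
            f'<th itemprop="name">{escape(name)}</th>'
            f'<td itemprop="value">{escape(value)}</td></tr>'
        )
    out.append("</table>")
    return "\n".join(out)

markup = render_feature_table(features)
print(markup)
```

The loop itself does not change; only the attributes on the row and cell elements are new.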

Further note that an inherent feature of the proposal is that you can always publish the level of granularity that you have. If you just have a string for the unit of measurement, okay, use unitText. If you have UN/CEFACT codes for the unit, which are less ambiguous and popular in professional e-commerce applications, then better use unitCode - it will be easier for a client to understand what your data means.

If you cannot expose the unit separately from the value, fill the value property with both.

It is all about being able to expose as much data granularity and data semantics as possible, quickly and easily.
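The three levels of granularity could be sketched as follows in Python. This is only an illustration under assumptions: the helper property_value and the sample property "Screen size" are hypothetical, the type name PropertyValue is assumed, and INH is the UN/CEFACT common code for inch.

```python
def property_value(name, value, unit_code=None, unit_text=None):
    """Build a property-value dict with the most precise unit data available."""
    # Assumed type name: PropertyValue.
    pv = {"@type": "PropertyValue", "name": name, "value": value}
    if unit_code:
        # Preferred: an unambiguous UN/CEFACT code.
        pv["unitCode"] = unit_code
    elif unit_text:
        # Fallback: a plain string for the unit of measurement.
        pv["unitText"] = unit_text
    return pv

# Best case: value plus UN/CEFACT unit code.
full = property_value("Screen size", 3.5, unit_code="INH")
# Only a unit string is known.
text_only = property_value("Screen size", 3.5, unit_text="inch")
# Unit cannot be separated from the value: put both into value.
combined = property_value("Screen size", "3.5 inch")
```

A client can then check for unitCode first, fall back to unitText, and only as a last resort try to parse the value string.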

> If a corporation with tens of thousands of products on their site(s) would contemplate doing this, what would there to gain for them?

The initial motivation for the proposal comes from businesses that have an interest in articulating the exact value proposition of their products. A typical case is the official datasheet pages of manufacturers of commodity products. If all manufacturers of digital cameras, cars, cell phones etc. mark up their product feature information AND provide some reliable identifier for the product model, like the GTIN-13/UPC/EAN code or the MPN with a brand name, then search engines and other consumers can easily combine the product information from the manufacturer pages with the offer information from a retailer, as long as the retailers also use GTIN-13 or MPN+brand. So a manufacturer can effectively enhance the understanding that Google and other consumers have of the product detail pages of 30,000 dealers selling their product. This is a potentially huge lever.

Also note that this is in the perfect interest of the manufacturer, since it helps their sales channels articulate the product features and thus the value proposition more clearly.

Example: If Nikon says a certain camera model has a USB interface, and a dealer page offers this camera, then a search engine will be able to know that this dealer page is a good hit for the query "Nikon camera USB deals".
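The combination step can be sketched as a simple join on the identifier. The following Python fragment is a minimal illustration, not how any search engine actually does it; the GTIN-13 values are made-up placeholders, and the data structures and function names are hypothetical.

```python
# Hypothetical manufacturer datasheet data, keyed by GTIN-13 (placeholder values).
manufacturer_models = {
    "0000000000001": {"brand": "Nikon", "features": {"Interface": "USB"}},
}

# Hypothetical dealer offers that carry only offer data plus the identifier.
dealer_offers = [
    {"seller": "Dealer A", "gtin13": "0000000000001", "price": 499.0},
    {"seller": "Dealer B", "gtin13": "0000000000002", "price": 89.0},
]

def enrich_offers(offers, models):
    """Attach manufacturer feature data to each offer sharing a GTIN-13."""
    enriched = []
    for offer in offers:
        model = models.get(offer["gtin13"], {})
        enriched.append({**offer, "features": model.get("features", {})})
    return enriched

def matching_sellers(offers, feature, value):
    """Return sellers whose enriched offer has the given feature value."""
    return [o["seller"] for o in offers if o["features"].get(feature) == value]

enriched = enrich_offers(dealer_offers, manufacturer_models)
usb_sellers = matching_sellers(enriched, "Interface", "USB")
```

The dealer never published the USB feature, yet the offer becomes findable by it once the identifier links it to the manufacturer's data.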

This very same pattern is already working for many industries, with the main difference that the product master data has to be licensed from specialized data providers, like etilize.com. This, however, does not work well on the long tail or for non-consumer products, and is infeasible for many smaller consumers of e-commerce data.

So for any business that wants to maximize the degree to which potential customers understand and consider the features of their products and services, structured data is an additional, promising channel.

For people who live from selling such data, this is likely less attractive, but nobody forces them to use the proposed pattern.

Now, for a Web shop selling commodities, it will depend on whether it makes sense to mark up product features. It will still help search engines understand and consider your content, but if the same data is already available from the manufacturer's site, and if the search engine is able to discover the link between your offer and the product entity, then it may be redundant. In the worst case, you may have a lot of effort publishing the data, and competitors could harvest it more easily. In essence, it will be a strategic decision. But in most cases, people overestimate the risk that others steal their information based on structured data markup. Even today, it is cheap and easy to use crowdsourcing services to extract any data from HTML Web content. Crawling and extracting data from your competitor is likely more costly.

But in a nutshell: The proposal is about providing a mechanism for exposing such data in cases where this is desirable. It is not a mandatory feature.

> And from the perspective of data-consumers, e.g. schema.org's sponsors or price-comparison sites: what are the advantages of having access to this type of non-standardized data. Doesn't the fact that it's non-standardized also mean it's data that's hard to process, especially when comparing specific product specifications?

I am not in a position to speak for any of them, but the essential rationale is the following:

1. Such data has a huge lever for commodities that have some kind of reliable identifier (e.g. GTIN-13, EAN, UPC, MPN), because if you properly extract it from one page, you can use it to better understand tens of thousands of pages that offer these products.

2. You can already do much with the data even without fully understanding the semantics. For instance, you can automatically generate faceted search interfaces based on the most popular property names and use simple heuristics (e.g. string distance metrics) for consolidating spelling variants. Also, you will be able to do a lot of NLP on the property names once you know they are property names (e.g. translate them).

3. If the meta-model preserves some basic structures, you can use heuristics much more efficiently. For instance, the character " in a value likely means inch. In plain text, it is much harder to consider unit information properly.
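Points 2 and 3 can be sketched with a few lines of Python from the standard library. This is a rough illustration under assumptions: difflib's similarity ratio stands in for "string distance metrics", the 0.85 threshold is arbitrary, the sample property names are hypothetical, and INH is the UN/CEFACT common code for inch.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

def consolidate_names(names, threshold=0.85):
    """Map each property name to the most frequent similar spelling."""
    counts = Counter(n.strip().lower() for n in names)
    canonical = []  # accepted spellings, most frequent first
    mapping = {}
    for name, _ in counts.most_common():
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, name, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping

# A number followed by the character " is read as a length in inches.
INCH_VALUE = re.compile(r'^\s*(\d+(?:\.\d+)?)\s*"\s*$')

def parse_inch_value(raw):
    """Return value and unit code if raw looks like an inch value, else None."""
    m = INCH_VALUE.match(raw)
    if not m:
        return None
    return {"value": float(m.group(1)), "unitCode": "INH"}

# Hypothetical property names harvested from many product pages.
names = ["Color", "color", "Colour", "Screen size", "Screen Size", "Screensize"]
mapping = consolidate_names(names)
# Facet candidates: how often each consolidated property name occurs.
facets = Counter(mapping[n.strip().lower()] for n in names)
inch = parse_inch_value('3.5"')
```

Because the client knows which strings are property names and which are values, both heuristics can be applied to exactly the right part of the data, which is much harder in plain text.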

But clearly, from a data consumer's perspective, it would be much better if everybody used the full power of product features from the GoodRelations core model with the 40+ available extensions. The problem is that this is too difficult for most of them.


> On Tue, Apr 29, 2014 at 11:42 AM, martin.hepp@ebusiness-unibw.org <martin.hepp@ebusiness-unibw.org> wrote:
> Dear all:
> I have just finalized a proposal on how to add support for generic property-value pairs to schema.org. This serves three purposes:
> 1. It will allow exposing product feature information from thousands of product detail pages from retailers and manufacturers.
> 2. It will simplify the development of future extensions for specific types of products and services, because we no longer need to standardize and define all relevant properties in schema.org and can instead defer the interpretation to the client.
> 3. It will serve as a clean, generic extension mechanism for properties in schema.org
> The proposal with all examples is here:
>     https://www.w3.org/wiki/WebSchemas/PropertyValuePairs
> Your feedback will be very welcome.
> Best wishes / Mit freundlichen Grüßen
> Martin Hepp
> -----------------------------------
> martin hepp  http://www.heppnetz.de
> mhepp@computer.org          @mfhepp
Received on Wednesday, 30 April 2014 20:22:01 UTC
