Re: Generic Property-Value Proposal for Schema.org from martin.hepp@ebusiness-unibw.org on 2014-04-30 (public-vocabs@w3.org from April 2014)

From: <martin.hepp@ebusiness-unibw.org>
Date: Thu, 1 May 2014 01:35:44 +0200
To: Jason Douglas <jasondouglas@google.com>
Cc: W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-Id: <1F007EFA-A14E-4AEA-9632-DEC99D9004A7@ebusiness-unibw.org>
Hi Jason:

I could live with having this at the level of http://schema.org/Product, and maybe gradually expanding the domain of additionalProperty to other relevant types on demand. Personally I think that having a generic extension mechanism at the level of http://schema.org/Thing is a bit more appealing, but I would not have a problem with starting at a deeper branch and then seeing how it develops in the wild.

Dan, Guha - do you have any opinion on this?

Best

Martin

On 01 May 2014, at 01:28, Jason Douglas <jasondouglas@google.com> wrote:

> If this is just for product specs, then why not propose the denormalization at that constrained level rather than as a global concept?
> 
> I believe the sports working group was considering something similar for "sports statistic."
> 
> -jason
> 
> On Wed Apr 30 2014 at 4:04:35 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> On 04/30/2014 01:43 PM, martin.hepp@ebusiness-unibw.org wrote:
> > Peter:
> > On 29 Apr 2014, at 15:47, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> >
> >> There appears to be quite a lot here. As far as I can tell, the essence is to have a special property whose values are some sort of structure that represents some sort of pair of some sort of relationship and some sort of value.
> > Yes. It is about providing a mechanism that allows site owners to expose core meta-data for their content, even if they cannot lift their data to a higher degree of formality.
> >
> >> The fly in this ointment is in all the "some sort"s above.
> > This is a design feature, not a bug, same as ambiguity in human languages is often a feature, not a bug. We allow sites to speak in data even if they cannot speak Oxford English.
> 
> I firmly believe that this *is* a bug.  I don't see any significant advantage
> of this proposal over allowing the attachment of RDB-style tables to
> entities.  Consumers will have to handle a wide variety of "columns" with
> little or no commonality between information coming from different sources.
> 
> Sure, if you have considerable resources, you may be able to make sense of the
> heterogeneity, but I thought that the idea behind schema.org was to put some
> homogeneity on information, i.e., precisely to move away from the difficult
> aspects of human languages.
> >
> >> How are consumers of this information supposed to treat it? For example, what happens when there are multiple values, or the value doesn't fit within the min and max, or there are any number of situations that do not fit within the simple cases?
> > They will have to post-process this "proto-data" and apply a lot of heuristics, machine learning, NLP to lift the raw data to the data they use for the final purpose. This is the very nature of processing data from Web markup at scale, see my post on "proto-data", http://lists.w3.org/Archives/Public/public-vocabs/2013Oct/0293.html.
> >
> > But if Web sites are able to expose the core meta-data for such data, like
> >
> > - the name of the property
> > - the value
> > - the unit
> > - some hint of a standard that defines this property
> >
> > this is already a huge improvement over the state of the art.
> 
> 
> I just don't see the advantage here.  Maybe there will be commonalities, but
> then surely the way forward is to put these commonalities into schema.org.
> 
> >
> >> There are several examples on the proposal page (look at intervals and ranges) that don't fit within the simple cases, showing how easy it is to slip outside the simple cases.
> >>
> > With mark-up at Web scale, there is no black-and-white view of what is inside and outside the intended cases.
> 
> Umm. I said "simple", not "intended".  The point here is that if even the
> early examples slip into cases where the data values include non-formal
> aspects, then the consumer processing is going to be very messy and error prone.
> >
> > As a side remark:
> >
> > I have spent the last ten years building product ontologies in OWL DL that extend GoodRelations by classes and properties, in total more than 40 such ontologies, see http://wiki.goodrelations-vocabulary.org/Vocabularies, with 40,000 classes and maybe 20,000 properties. They are perfect for a data consumer, and they are used in applications. However, we have not been able to convince site-owners at scale to use such vocabularies for marking up their content. The main reason for that is that they have a very, very hard time lifting and cleansing their data to that level of formality.
> 
> Then let's stick to scraping web pages.
> 
> >
> > Martin
> 
> 
> peter
> 
> >
> >> peter
> >>
> >>
> >> On 04/29/2014 02:42 AM, martin.hepp@ebusiness-unibw.org wrote:
> >>> Dear all:
> >>>
> >>> I have just finalized a proposal on how to add support for generic property-value pairs to schema.org. This serves three purposes:
> >>>
> >>> 1. It will allow exposing product feature information from thousands of product detail pages from retailers and manufacturers.
> >>> 2. It will simplify the development of future extensions for specific types of products and services, because we no longer need to standardize and define all relevant properties in schema.org and can instead defer the interpretation to the client.
> >>> 3. It will serve as a clean, generic extension mechanism for properties in schema.org.
> >>>
> >>> The proposal with all examples is here:
> >>>
> >>>      https://www.w3.org/wiki/WebSchemas/PropertyValuePairs
> >>>
> >>> Your feedback will be very welcome.
> >>>
> >>> Best wishes / Mit freundlichen Grüßen
> >>>
> >>> Martin Hepp
> >>> -----------------------------------
> >>> martin hepp  http://www.heppnetz.de
> >>> mhepp@computer.org          @mfhepp
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> 
>
Received on Wednesday, 30 April 2014 23:36:09 UTC