- From: Jason Douglas <jasondouglas@google.com>
- Date: Wed, 30 Apr 2014 23:28:25 +0000
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
- Cc: W3C Web Schemas Task Force <public-vocabs@w3.org>
- Message-ID: <CAEiKvUAFjuAnDL68svdJg423UCG2V6UNpT2c-mrPPrZz1dYHkA@mail.gmail.com>
If this is just for product specs, then why not propose the denormalization at that constrained level rather than as a global concept? I believe the sports working group was considering something similar for "sports statistic."

-jason

On Wed Apr 30 2014 at 4:04:35 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> On 04/30/2014 01:43 PM, martin.hepp@ebusiness-unibw.org wrote:
> > Peter:
> > On 29 Apr 2014, at 15:47, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> >
> >> There appears to be quite a lot here. As far as I can tell, the essence is to have a special property whose values are some sort of structure that represents some sort of pair of some sort of relationship and some sort of value.
> >
> > Yes. It is about providing a mechanism that allows site owners to expose core meta-data for their content, even if they cannot lift their data to a higher degree of formality.
> >
> >> The fly in this ointment is in all the "some sort"s above.
> >
> > This is a design feature, not a bug, just as ambiguity in human languages is often a feature, not a bug. We allow sites to speak in data even if they cannot speak Oxford English.
>
> I firmly believe that this *is* a bug. I don't see any significant advantage of this proposal over allowing the attachment of RDB-style tables to entities. Consumers will have to handle a wide variety of "columns" with little or no commonality between information coming from different sources.
>
> Sure, if you have considerable resources, you may be able to make sense of the heterogeneity, but I thought that the idea behind schema.org was to put some homogeneity on information, i.e., precisely to move away from the difficult aspects of human languages.
>
> >> How are consumers of this information supposed to treat it? For example, what happens when there are multiple values, or the value doesn't fit within the min and max, or there are any number of situations that do not fit within the simple cases?
> >
> > They will have to post-process this "proto-data" and apply a lot of heuristics, machine learning, and NLP to lift the raw data to the data they use for the final purpose. This is the very nature of processing data from Web markup at scale; see my post on "proto-data", http://lists.w3.org/Archives/Public/public-vocabs/2013Oct/0293.html.
> >
> > But if Web sites are able to expose the core meta-data for such data, like
> >
> > - the name of the property
> > - the value
> > - the unit
> > - some hint of a standard that defines this property
> >
> > this is already a huge improvement over the state of the art.
>
> I just don't see the advantage here. Maybe there will be commonalities, but then surely the way forward is to put these commonalities into schema.org.
>
> >> There are several examples on the proposal page (look at intervals and ranges) that don't fit within the simple cases, showing how easy it is to slip outside the simple cases.
> >
> > With mark-up at Web scale, there is no black-and-white view of what is inside and outside the intended cases.
>
> Umm. I said "simple", not "intended". The point here is that if even the early examples slip into cases where the data values include non-formal aspects, then the consumer processing is going to be very messy and error prone.
> > As a side remark:
> >
> > I have spent the last ten years building product ontologies in OWL DL that extend GoodRelations by classes and properties, in total more than 40 such ontologies, see http://wiki.goodrelations-vocabulary.org/Vocabularies, with 40,000 classes and maybe 20,000 properties. They are perfect for a data consumer, and they are used in applications. However, we have not been able to convince site-owners at scale to use such vocabularies for marking up their content. The main reason for that is that they have a very, very hard time lifting and cleansing their data to that level of formality.
>
> Then let's stick to scraping web pages.
>
> > Martin
>
> peter
>
> >> peter
> >>
> >> On 04/29/2014 02:42 AM, martin.hepp@ebusiness-unibw.org wrote:
> >>> Dear all:
> >>>
> >>> I have just finalized a proposal on how to add support for generic property-value pairs to schema.org. This serves three purposes:
> >>>
> >>> 1. It will make it possible to expose product feature information from thousands of product detail pages from retailers and manufacturers.
> >>> 2. It will simplify the development of future extensions for specific types of products and services, because we no longer need to standardize and define all relevant properties in schema.org and can instead defer the interpretation to the client.
> >>> 3. It will serve as a clean, generic extension mechanism for properties in schema.org.
> >>>
> >>> The proposal with all examples is here:
> >>>
> >>> https://www.w3.org/wiki/WebSchemas/PropertyValuePairs
> >>>
> >>> Your feedback will be very welcome.
> >>>
> >>> Best wishes / Mit freundlichen Grüßen
> >>>
> >>> Martin Hepp
> >>> -----------------------------------
> >>> martin hepp  http://www.heppnetz.de
> >>> mhepp@computer.org  @mfhepp
Received on Wednesday, 30 April 2014 23:28:57 UTC
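For readers following the thread, a minimal, hypothetical sketch of the kind of generic name/value/unit/pointer markup being debated might look like the JSON-LD below. The type and property names (PropertyValue, additionalProperty, propertyID, unitCode, minValue, maxValue), the UN/CEFACT-style unit codes, and the product data are illustrative assumptions, not quotations from the proposal page; see https://www.w3.org/wiki/WebSchemas/PropertyValuePairs for the actual examples.

```html
<!-- Illustrative sketch only: the vocabulary terms below follow the general
     shape of the PropertyValuePairs proposal and are assumptions, not a
     definitive rendering of it; product details and identifiers are invented. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "Example dishwasher",
  "additionalProperty": [
    {
      "@type": "PropertyValue",
      "name": "noise level",
      "value": "44",
      "unitCode": "2N",
      "propertyID": "http://example.com/product-specs#noiseLevel"
    },
    {
      "@type": "PropertyValue",
      "name": "water consumption per cycle",
      "minValue": "6",
      "maxValue": "10",
      "unitCode": "LTR"
    }
  ]
}
</script>
```

The second entry illustrates the interval/range case raised in the thread: a consumer expecting a single value must decide how to interpret a min/max pair, which is exactly the kind of heterogeneity Peter argues against and Martin accepts as "proto-data" to be post-processed.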