- From: Niklas Lindström <lindstream@gmail.com>
- Date: Sat, 3 May 2014 02:12:38 +0200
- To: "martin.hepp@ebusiness-unibw.org" <martin.hepp@ebusiness-unibw.org>
- Cc: Dan Brickley <danbri@google.com>, Jason Douglas <jasondouglas@google.com>, Aaron Bradley <aaranged@gmail.com>, "kevin.polley" <kevin.polley@mutualadvantage.co.uk>, W3C Web Schemas Task Force <public-vocabs@w3.org>, Jay Myers <jay.myers@bestbuy.com>, Mike Bergman <mike@mkbergman.com>
- Message-ID: <CADjV5jcX5YreBEvrUmg6Nv3dnyjhOHK3yFjzC=4EOLfnRV_a-A@mail.gmail.com>
Hi Martin,

On Fri, May 2, 2014 at 10:37 PM, martin.hepp@ebusiness-unibw.org <martin.hepp@ebusiness-unibw.org> wrote:

> Dear Niklas,
>
> On 02 May 2014, at 20:10, Niklas Lindström <lindstream@gmail.com> wrote:
>
> > Hi all,
> >
> > I do understand the case for capturing structured values describing special properties of products. The way proposed certainly makes for less normalized, precise and reusable data than simple properties with direct values enable. But it has some merit in the fact that, as Martin says, a little data goes a long way.
>
> Thanks!
>
> > The shape of data that this produces merits some analysis. For one, the idea to some extent resembles a mix of statement reification and structured values. More interestingly though, the pattern is similar to an effect that can be achieved by defining very specific SKOS concepts (for e.g. battery types, operating systems, screen sizes and bluetooth types), and linking to them with e.g. a productSpecification property. To me, these PropertyValue entities really look like a free-form version of such concept/topic/enum entities, with their plain text names representing their "type", rather than a generic property extension (in the RDF sense). And I see the potential in that.
>
> Yes, one could say that my proposal is similar to "SKOS for properties" ;-) But after the long discussion this has triggered, I would like to downplay the proposal to the very tangible area of application for product properties and places properties. I am personally convinced that the pattern is of generic value, for it strikes a balance between preserving data structure and data semantics while minimizing the effort for a data publisher. But let's separate this aspect.

I see some generic value as well. But I think I see it from a slightly different angle. I find the notion of naming a structured value quite usable.
(And I try not to think too much about the relative merit of that in relation to using dedicated properties with simple values, or to the value of datatyped RDF values (available in RDFa and JSON-LD but not in microdata). In the sense of a SKOS-like pattern, it can evolve on its own.)

What I find problematic is the very generic but somewhat contrived notion of additionalProperty, in combination with the split nature of the PropertyValue notion. It feels a bit mixed up.

If instead, the structured value type was called NamedValue, and the property was called e.g. specification (applicable for products and places, and maybe other things), it seems more natural to me. I did a search/replace for all the examples in your proposal, and I find the results quite intuitive (using Vim: %s/additionalProperty/specification/g | %s/PropertyValue/NamedValue/g). What do you think?

> > (And although such ambiguous data can be hard to collate (and translate), it has its place, just as plain text keywords for basic website SEO have, in a primitive fashion. External enumerations (using e.g. SKOS) are often far more usable and scalable in the long term though.)
>
> Actually, I think that processing the resulting data is less difficult than many assume, since from an NLP perspective, the space of possible interpretations is much smaller, and you will have a lot of contextual information. But the consumption of the data should not be our main concern at this point.

Yes, it's true that the structure limits the complexity of processing the text (though different languages are always tricky). Also, if we're not seeking to express implicit properties in the values (in the RDF sense), but to use structured values with a contextual name, it might be easier to collate that information (since such values are more self-contained). And quite possibly easier for publishers to see when it is applicable.
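For anyone who wants to repeat the search/replace mentioned above outside of Vim, here is a minimal Python sketch. The function name `rename_terms` and the sample markup string are made up for illustration; only the two term pairs come from the Vim command, and like that command this is a plain substring replacement.

```python
import re

# The two renames from the Vim command: old term -> suggested new term.
RENAMES = {
    "additionalProperty": "specification",
    "PropertyValue": "NamedValue",
}

# One alternation pattern covering both old terms (plain substring matching).
_PATTERN = re.compile("|".join(re.escape(term) for term in RENAMES))


def rename_terms(markup: str) -> str:
    """Return markup with each old term replaced by its suggested new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(0)], markup)


# Illustrative input only; any of the proposal's examples could be used instead.
sample = (
    '<div itemprop="additionalProperty" itemscope '
    'itemtype="http://schema.org/PropertyValue">'
)
print(rename_terms(sample))
```

Text that contains neither term passes through unchanged, so it can be run over whole example files.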
> In general, a problem in our discussion has been that the perspective of data publication and data consumption have been mixed. Of course we all agree that the resulting data is more effort to process than standardized properties, compare
>
> Ideal Version: External Property with Quantitative Value
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   ...
>   Operating Voltage: <div itemprop="http://acme.org/vocab/#voltage" itemscope itemtype="http://schema.org/QuantitativeValue">
>     <span itemprop="minValue">100</span>-
>     <span itemprop="maxValue">220</span>
>     <meta itemprop="unitCode" content="VLT"> V
>   </div>
> </div>
>
> with this
>
> Variant 1: Property name instead of URI
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="minValue">100</span>-
>     <span itemprop="maxValue">250</span>
>     <meta itemprop="unitCode" content="VLT"> V
>   </div>
> </div>
>
> or this
>
> Variant 2: Unit as text instead of UN/CEFACT Common Code and range as a single field
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="value">100-250</span>
>     <span itemprop="unitText">V</span>
>   </div>
> </div>
>
> or in worst case this:
>
> Variant 3: Range and Unit in a joint field
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="value">100-250 V</span>
>   </div>
> </div>
>
> It is obvious that the version with a dedicated
> property URI and a proper http://schema.org/QuantitativeValue node is easier to process.
>
> But from a data provider's perspective, who typically has the product properties in very light-weight property-value structures, with often proprietary properties, even the step to Variant 1 makes data publication much, much simpler, because he does not have to map the local property name to a standard property URI nor determine the type of the value (quantitative, qualitative, or Boolean). That is VERY difficult from typical Web applications, even if the back-end systems (PDM/PIM) had this additional data.
>
> From a data consumer's perspective, however, even the lightest version
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope itemtype="http://schema.org/PropertyValue">
>     <span itemprop="name">Operating Voltage</span>
>     <span itemprop="value">100-250 V</span>
>   </div>
> </div>
>
> is still much easier to consume and lift than
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div>
>     <span>Operating Voltage</span>
>     <span>100-250 V</span>
>   </div>
> </div>
>
> And I expect that most sites could easily reach the level of Variant 1 or Variant 2.

Yes, I certainly agree that it lowers the barrier for publishers. And I do agree that you should consider the different perspectives and needs of production and consumption, to avoid confusing them. But I still think they need to be considered in conjunction.
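To make the consumer side a little more concrete, here is a rough Python sketch of how a client might "lift" a joint value string such as the "100-250 V" of Variant 3 into the separate fields of the more granular variants. The pattern and the function name `lift_value` are assumptions for illustration only; real-world values (other number formats, other languages, other range notations) would need considerably more care.

```python
import re
from typing import Optional

# Assumed shape: "<number>[-<number>] [<unit text>]", e.g. "100-250 V" or "4.0".
VALUE_RE = re.compile(
    r"^\s*(?P<min>\d+(?:\.\d+)?)"            # first (or only) number
    r"(?:\s*-\s*(?P<max>\d+(?:\.\d+)?))?"    # optional upper bound of a range
    r"\s*(?P<unit>\S.*?)?\s*$"               # optional free-text unit
)


def lift_value(text: str) -> Optional[dict]:
    """Split a joint value string into minValue, maxValue and unitText.

    Returns None when the text does not start with a number, in which case
    a consumer would simply keep it as a plain-text value.
    """
    match = VALUE_RE.match(text)
    if match is None:
        return None
    low = float(match.group("min"))
    # A single number is treated as a degenerate range (min == max).
    high = float(match.group("max")) if match.group("max") else low
    return {"minValue": low, "maxValue": high, "unitText": match.group("unit")}


print(lift_value("100-250 V"))
print(lift_value("lithium-ion"))
```

A value like "lithium-ion" yields None here and stays as text, which is the case where the name/value structure alone has to carry the meaning.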
Given the renaming suggestion above, a combination of the "ideal" and Variant 1 doesn't look too strange either:

<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">ACME Electric Anvil</span>
  <div itemprop="specification http://acme.org/vocab/#voltage" itemscope itemtype="http://schema.org/NamedValue">
    <span itemprop="name">Operating Voltage</span>
    <span itemprop="minValue">100</span>-
    <span itemprop="maxValue">250</span>
    <meta itemprop="unitCode" content="VLT"> V
  </div>
</div>

My point here is that a notion of a NamedValue isn't so much on a collision course with external properties as PropertyValue seems to be. In this fashion, they become more clearly complementary.

In fact, as an addition to the proposed renaming, I'd suggest leaving out the propertyID at this point. (It makes the proposal simpler. You can already provide a property with a URI as per above. And for the eclass81 prefix, I doubt that creating *another* prefix mechanism is such a good idea.)

> > That said, I would be *very* cautious of promoting this shape as a property extension mechanism. That would be done at the expense of using a mix of vocabularies for specialized data.
>
> As already said, I am perfectly fine with postponing this possibility to the future and constraining this to Product and Place for the moment.
>
> > My opinion, based on experience in both consuming data and working to unify disparate descriptions, is that, in the general case of needing specific properties beyond the core of schema.org, it would be quite valuable to apply the existing mechanism of mixing vocabularies, native to RDF and the enabler of decentralized vocabulary growth. It has been there from the start and proven extremely valuable in specific data integration scenarios. Of course, it has the downside of enabling, in Richard's words, a "cacophony of multiple vocabulary choices".
> > But by grounding basic common terms in schema.org, we have one stable core around which other things can revolve and evolve.
> >
> > RDFa has great support for this, especially in compact form through the means of prefixes. But all of RDFa, JSON-LD and microdata support using full URIs as property names, so it can be catered for in general. (Also, JSON-LD has more powerful support through both prefix and direct term definitions in a context.)
>
> Yes, I am perfectly fine with this. If site-owners are able to publish data according to external product ontologies, like the 40+ we developed for GoodRelations, or the new GPC ontology, that is great. And my proposal does not aim at stealing this opportunity.
>
> However, note that there are THREE bottlenecks with using external vocabularies:
>
> 1. There must be a suitable vocabulary.

Yes. The usual chicken-and-egg problem. :) I think data consumers like search engines can drive the evolution of that by looking into usage of external vocabularies though. (Again. I do think the pendulum can slowly turn. Especially now with a core schema.org to base data shapes upon.)

> 2. Site-owners must be able to map their local data to the external ontologies and publish respective data. In the past ten years, I have been able to convince just ONE site to use eClassOWL at broad scale. The problem is that the data supply for this is quite challenging.

It is definitely an effort to align local data with common models. And even more so to engage in the evolution of such models. Which we are rather aware of here. :) But there is great value in doing that work – you will think more about why the data looks the way it looks, understand your own domain in the eyes of others, and so on. It is very rewarding. But I know it's not always easy to convince organizations of this value, and to thus invest in it.

> 3.
> Most people I speak to basically say that for their production sites, they do only what is specified in schema.org. Unless the sponsors of schema.org explicitly endorse the use of a certain external vocabulary, this will not have a big adoption, IMO. Adding the proposed elements to schema.org in contrast will make it much easier to convince owners of this valuable data to make it available for search engines and other clients.

Yes, getting some data out is better than nothing. It is still unclear whether search engines will valuably consume and present what's already in schema.org though. Other, specialized services might more quickly make use of specialized properties – and for that, the existing URI mechanism for external properties is great. (A good example of that is WebID.)

> This is not a technical issue of course, just a signal. But it will matter.

Absolutely. We're striving to send signals saying that certain patterns are or will be widely understood, consumed and built upon. I just want to make sure that we don't promote something with short-term value at the expense of long-term values which we already have worked out mechanisms for. Though as long as we're seeking complementary practices to fill out the gaps, all is good.
> > As an example, here is what the example product table in the proposal could look like, when adding an external vocabulary (also capturing some keywords and using some external enumerations):
> >
> > <div vocab="http://schema.org/"
> >      prefix="pto: http://www.productontology.org/id/
> >              unit: http://qudt.org/vocab/unit#
> >              apple: http://apple.com/def/product#">
> >   <table typeof="Product pto:IPhone_5">
> >     <caption>iPhone 5 Specifications</caption>
> >     <tr>
> >       <th>Spec</th>
> >       <th>Value</th>
> >       <th>Description</th></tr>
> >     <tr>
> >       <td>LTE Band and Mode</td>
> >       <td><span property="keywords apple:cellphoneBand">4G</span>
> >         <span property="keywords apple:cellphoneMode">LTE</span></td>
> >       <td></td></tr>
> >     <tr>
> >       <td>Battery Type</td>
> >       <td property="keywords apple:batteryType">lithium-ion</td>
> >       <td></td></tr>
> >     <tr>
> >       <td><a property="apple:productFeature"
> >           href="http://apple.com/def/feature/handheld#Built-In%20GPS">Built-In GPS</a></td>
> >       <td>Yes</td>
> >       <td></td></tr>
> >     <tr>
> >       <td property="apple:productFeature">Touch Screen</td>
> >       <td>Yes</td>
> >       <td></td></tr>
> >     <tr>
> >       <td>Operating System</td>
> >       <td property="operatingSystem">Apple iOS 7</td>
> >       <td></td></tr>
> >     <tr>
> >       <td>Screen Size</td>
> >       <td><span property="width apple:screenSize" datatype="unit:Inch">4</span>"</td>
> >       <td>Size of the screen, in inches, measured diagonally from corner to corner.
> >       </td></tr>
> >     <tr property="keywords">
> >       <td>Bluetooth Version</td>
> >       <td property="apple:bluetoothVersion">4.0</td>
> >       <td></td></tr>
> >     <tr>
> >       <td>Keyboard Type</td>
> >       <td property="keywords apple:keyboardType">Virtual QWERTY</td>
> >       <td></td></tr>
> >     <tr property="hasPart" typeof="Thing pto:Camera">
> >       <td><span property="name">Front Facing Camera</span> MP Rating</td>
> >       <td property="apple:megaPixelRating">1.2</td>
> >       <td></td></tr>
> >     <tr property="hasPart" typeof="Thing pto:Camera">
> >       <td><span property="name">Rear Facing Camera</span> MP Rating</td>
> >       <td property="apple:megaPixelRating">8</td>
> >       <td></td></tr>
> >   </table>
> > </div>
> >
> > If Apple were to use their own properties like this, they can describe them in a page at <http://apple.com/def/product>, using:
> >
> > <body vocab="http://schema.org/">
> >   <h1 property="name">Apple Product Vocabulary</h1>
> >   <h2>Properties</h2>
> >   <article id="batteryType" resource="#batteryType" typeof="Property">
> >     <h3 property="name">Battery Type</h3>
> >     <p property="description">The type of battery.</p>
> >   </article>
> >   ...
> > </body>
>
> My problem with this proposal is that it is, as far as I understand, RDFa-centric. And I think our approach should be syntax neutral.

It should be syntax neutral. You can use expanded URIs if you do a microdata version of this example (like you did above by using http://acme.org/vocab/#voltage). It may look more palatable in RDFa, but it isn't limited to it. (Except for my use of @datatype. But that was just to show that it is technically possible to do a lot with just plain values. It isn't intrinsic to the use of external properties though, and can be ignored here.)

> > It would be very valuable to examine what deployment problems this pattern might have encountered in the past. Perhaps understanding of it has matured in recent times?
> > Use of embedded data in general, schema.org in particular, and the pattern of multiple types from different vocabularies has certainly increased a lot, so I would like to see if this has become more palatable. It really is quite simple:
> >
> > 1. Create a page describing your special properties.
> > 2. Use these terms within your product pages.
>
> We have tried to foster a similar pattern in shop applications for GoodRelations, but this has really not worked well. And this proposal again mixed the perspective of data consumption and data publication. It is unnecessary to put the burden of defining a local vocabulary on a Web site, when the only purpose is to cross-reference this in other parts of the site. This is something a consuming client can do as well.

I think it is a good example of factoring your data instead of repeatedly marking up embedded labels. And I'm not entirely convinced of the high cost, although it's partly because I think the site-local vocabulary case shouldn't be that common over time. Most useful properties should be found in some vocabulary a bit more shared than that (such as GoodRelations or GTIN+). As this hasn't been actively promoted for schema.org though, it is untried territory for those not used to linked data in general.

Anyway, continuing the topic of external properties/vocabularies could be done independently of this proposal – especially if it is a bit more distanced from a generic property extension notion (e.g. by doing the suggested name changes).

> > From there, improvements can be made, such as sharing, integrating and reusing these properties across endeavours (linking them as much as possible). And of course promoting the most common of these terms for inclusion in schema.org itself (and again, linking together the "wild" sources with these new core terms).
> >
> > In practice, this requires the backers (search engines) of schema.org to promote and utilize the rich potential here.
> > By collecting valuable external properties, and eventually enabling the most common of them to be shown in e.g. rich snippets. It requires effort of course, but since the terms are more structured than plain text, it doesn't require disambiguation heuristics, full NLP and such. (Which is the very case for structured data in pages over raw scraping and powerful text analysis.)
> >
> > Cheers,
> > Niklas
>
> As far as I can see, my current proposal achieves basically the same with pretty straightforward markup, available in all syntaxes, and in a way that allows various degrees of granularity. Site owners will be able to preserve all granularity (e.g. if value and unit are two fields) and data semantics (e.g. if they can serve a numerical range as min and max or have a public identifier for a property).

It certainly achieves something with potential. I just want to make it clear that it is complementary to external properties. With the suggested renames, it becomes a way to use structured, named values for (product) specifications, rather than a generic way for mixing plain text properties and values (which IMO confuses the notion of property in this context of structured data).

> Let us make it as simple as possible for sites to expose rich product data. That should be our first priority.

It is a good priority. Let's just also make that data readily consumable and, when compromise must take place, cater for clearly complementary alternatives.

Cheers,
Niklas

> Martin
Received on Saturday, 3 May 2014 00:13:37 UTC