Re: Generic Property-Value Proposal for Schema.org

Hi Martin,


On Fri, May 2, 2014 at 10:37 PM, martin.hepp@ebusiness-unibw.org <
martin.hepp@ebusiness-unibw.org> wrote:

> Dear Niklas,
>
> On 02 May 2014, at 20:10, Niklas Lindström <lindstream@gmail.com> wrote:
>
> > Hi all,
> >
> > I do understand the case for capturing structured values describing
> special properties of products. The way proposed certainly makes for less
> normalized, precise and reusable data than simple properties with direct
> values enable. But it has some merit in the fact that, as Martin says, a
> little data goes a long way.
>
> Thanks!
> >
> > The shape of data that this produces merits some analysis. For one, the
> idea to some extent resembles a mix of statement reification and structured
> values. More interestingly though, the pattern is similar to an effect that
> can be achieved by defining very specific SKOS concepts (for e.g. battery
> types, operating systems, screen sizes and bluetooth types), and linking to
> them with e.g. a productSpecification property. To me, these PropertyValue
> entities really look like a free-form version of such concept/topic/enum
> entities, with their plain text names representing their "type", rather
> than a generic property extension (in the RDF sense). And I see the
> potential in that.
>
> Yes, one could say that my proposal is similar to "SKOS for properties" ;-)
> But after the long discussion this has triggered, I would like to downplay
> the proposal to the very tangible area of application for product
> properties and places properties. I am personally convinced that the
> pattern is of generic value, for it strikes a balance between preserving
> data structure and data semantics while minimizing the effort for a data
> publisher. But let's separate this aspect.
>

I see some generic value as well. But I think I see it from a slightly
different angle. I find the notion of naming a structured value quite
usable. (And I try not to think too much about the relative merit of that
in relation to using dedicated properties with simple values, or to the
value of datatyped RDF values (available in RDFa and JSON-LD but not in
microdata). In the sense of a SKOS-like pattern, it can evolve on its own.)

What I find problematic is the very generic but somewhat contrived notion
of additionalProperty, in combination with the split nature of the
PropertyValue notion. It feels a bit mixed up. If instead, the structured
value type was called NamedValue, and the property was called e.g.
specification (applicable for products and places, and maybe other things),
it seems more natural to me.

I did a search/replace for all the examples in your proposal, and I find
the results quite intuitive (using Vim:
%s/additionalProperty/specification/g | %s/PropertyValue/NamedValue/g).
What do you think?
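
For anyone who wants to repeat the rename outside Vim, here is a rough Python sketch of the same substitution. The helper name and the sample string are made up for illustration, and unlike the plain Vim substitution it only matches whole identifiers:

```python
import re

# The two renames suggested above (old term -> proposed term).
RENAMES = {
    "additionalProperty": "specification",
    "PropertyValue": "NamedValue",
}

# Match whole identifiers only, so longer names containing these stay untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rename_terms(markup: str) -> str:
    """Hypothetical helper: apply both renames to a markup snippet."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], markup)


sample = ('<div itemprop="additionalProperty" itemscope '
          'itemtype="http://schema.org/PropertyValue">')
renamed = rename_terms(sample)
print(renamed)
```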



> >
> > (And although such ambiguous data can be hard to collate (and
> translate), it has its place, just as plain text keywords for basic website
> SEO have, in a primitive fashion. External enumerations (using e.g. SKOS)
> are often far more usable and scalable in the long term though.)
> >
> Actually, I think that processing the resulting data is less difficult than
> many assume, since from an NLP perspective, the space of possible
> interpretations is much smaller, and you will have a lot of contextual
> information. But the consumption of the data should not be our main concern
> at this point.
>

Yes, it's true that the structure limits the complexity of processing the
text (though different languages are always tricky). Also, if we're not
seeking to express implicit properties in the values (in the RDF sense),
but to use structured values with a contextual name, it might be easier to
collate that information (since such values are more self-contained). And
quite possibly easier for publishers to see when it is applicable.
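
To illustrate how small the space of interpretations can be once a value is isolated in its own field, here is a minimal sketch of a consumer lifting a joint "range plus unit" string (as in Variant 3 below) into separate parts. The names and the regular expression are my own assumptions, it only handles plain decimal numbers, and it ignores the language issues mentioned above:

```python
import re
from typing import NamedTuple, Optional


class ParsedValue(NamedTuple):
    min_value: float
    max_value: float
    unit_text: Optional[str]


# A number, an optional "-number" range end, and optional trailing unit text.
_VALUE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*([^\d\s].*?)?\s*$"
)


def parse_value(text: str) -> Optional[ParsedValue]:
    """Hypothetical helper: split e.g. '100-250 V' into min, max and unit."""
    m = _VALUE.match(text)
    if m is None:
        return None  # not numeric; keep it as a plain text value
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    return ParsedValue(low, high, m.group(3) or None)


print(parse_value("100-250 V"))
print(parse_value("lithium-ion"))
```

Values that do not start with a number simply fall through as text, which is roughly what a name/value pair with a free-form value would give a consumer anyway.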


> In general, a problem in our discussion has been that the perspectives of
> data publication and data consumption have been mixed. Of course we all
> agree that the resulting data is more effort to process than standardized
> properties, compare
>
> Ideal Version: External Property with Qualitative Value
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
> ...
>   Operating Voltage: <div itemprop="http://acme.org/vocab/#voltage"
> itemscope
>        itemtype="http://schema.org/QuantitativeValue">
>       <span itemprop="minValue">100</span>-
>       <span itemprop="maxValue">220</span>
>       <meta itemprop="unitCode" content="VLT"> V
>   </div>
> </div>
>
> with this
>
> Variant 1: Property name instead of URI
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope
>        itemtype="http://schema.org/PropertyValue">
>           <span itemprop="name">Operating Voltage</span>
>           <span itemprop="minValue">100</span>-
>           <span itemprop="maxValue">250</span>
>           <meta itemprop="unitCode" content="VLT"> V
>   </div>
> </div>
>
> or this
>
> Variant 2: Unit as text instead of UN/CEFACT Common Code and range as a
> single field
>
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope
>        itemtype="http://schema.org/PropertyValue">
>           <span itemprop="name">Operating Voltage</span>
>           <span itemprop="value">100-250</span>
>           <span itemprop="unitText">V</span>
>   </div>
> </div>
>
> or in worst case this:
>
> Variant 3: Range and Unit in a joint field
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope
>        itemtype="http://schema.org/PropertyValue">
>           <span itemprop="name">Operating Voltage</span>
>           <span itemprop="value">100-250 V</span>
>   </div>
> </div>
>
>
> It is obvious that the version with a dedicated property URI and a proper
> http://schema.org/QuantitativeValue node is easier to process.
>
> But from a data provider's perspective, who typically has the product
> properties in very light-weight property-value structures, with often
> proprietary properties, even the step to Variant 1 makes data publication
> much, much simpler, because he does not have to map the local property name
> to a standard property URI nor determine the type of the value
> (quantitative, qualitative, or Boolean). That is VERY difficult from
> typical Web applications, even if the back-end systems (PDM/PIM) had this
> additional data.
>
> From a data consumer's perspective, however, even the lightest version
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div itemprop="additionalProperty" itemscope
>        itemtype="http://schema.org/PropertyValue">
>           <span itemprop="name">Operating Voltage</span>
>           <span itemprop="value">100-250 V</span>
>   </div>
> </div>
>
> is still much easier to consume and lift than
>
> <div itemscope itemtype="http://schema.org/Product">
>   <span itemprop="name">ACME Electric Anvil</span>
>   <div>
>           <span>Operating Voltage</span>
>           <span>100-250 V</span>
>   </div>
> </div>
>
> And I expect that most sites could easily reach the level of Variant 1 or
> Variant 2.
>

Yes, I certainly agree that it lowers the barrier for publishers. And I do
agree that you should consider the different perspectives and needs of
production and consumption, to avoid confusing them. But I still think they
need to be considered in conjunction.

Given the renaming suggestion above, a combination of the "ideal" and
Variant 1 doesn't look too strange either:

    <div itemscope itemtype="http://schema.org/Product">
        <span itemprop="name">ACME Electric Anvil</span>
        <div itemprop="specification http://acme.org/vocab/#voltage"
            itemscope itemtype="http://schema.org/NamedValue">
            <span itemprop="name">Operating Voltage</span>
            <span itemprop="minValue">100</span>-
            <span itemprop="maxValue">250</span>
            <meta itemprop="unitCode" content="VLT"> V
        </div>
    </div>

My point here is that a notion of a NamedValue isn't so much on a collision
course with external properties as PropertyValue seems to be. In this
fashion, they become more clearly complementary.
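
As a sketch of what "complementary" could mean on the consuming side, the following Python reads markup of that shape and keeps both signals: the plain-text name and any full-URI property used alongside it. Everything here is an assumption for illustration: `specification` and `NamedValue` are only the names proposed above, the class is a deliberately simplified reader (no itemref, no itemid, one level of nesting) rather than a conforming microdata parser, and the sample adds the `itemscope` that the Product element needs.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class SpecCollector(HTMLParser):
    """Collect nested microdata items together with their simple properties."""

    def __init__(self):
        super().__init__()
        self.specs = []     # finished nested items
        self._stack = []    # role of each open non-void element
        self._item = None   # nested item currently being filled
        self._text = None   # (property names, text chunks) inside a property

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        role = None
        if names and "itemscope" in a and self._item is None:
            # itemprop plus itemscope on one element starts a nested item.
            self._item = {"via": names, "type": a.get("itemtype"), "props": {}}
            role = "item"
        elif names and self._item is not None:
            if tag == "meta":
                # <meta> carries its value in the content attribute.
                for name in names:
                    self._item["props"][name] = a.get("content")
            else:
                self._text = (names, [])
                role = "prop"
        if tag not in VOID_TAGS:
            self._stack.append(role)

    def handle_data(self, data):
        if self._text is not None:
            self._text[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        role = self._stack.pop()
        if role == "prop":
            names, chunks = self._text
            for name in names:
                self._item["props"][name] = "".join(chunks).strip()
            self._text = None
        elif role == "item":
            self.specs.append(self._item)
            self._item = None


MARKUP = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">ACME Electric Anvil</span>
  <div itemprop="specification http://acme.org/vocab/#voltage"
       itemscope itemtype="http://schema.org/NamedValue">
    <span itemprop="name">Operating Voltage</span>
    <span itemprop="minValue">100</span>-
    <span itemprop="maxValue">250</span>
    <meta itemprop="unitCode" content="VLT"> V
  </div>
</div>
"""

collector = SpecCollector()
collector.feed(MARKUP)
collector.close()

for spec in collector.specs:
    # Full-URI property names are the "external property" signal.
    external = [name for name in spec["via"] if "://" in name]
    print(spec["props"].get("name"), external, spec["props"])
```

A consumer that knows the external URI can use it directly; one that does not still gets a self-contained named value.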

In fact, as an addition to the proposed renaming, I'd suggest to leave out
the propertyID at this point. (It makes the proposal simpler. You can
already provide a property with a URI as per above. And for the eclass81
prefix, I doubt that creating *another* prefix mechanism is such a good
idea.)



> > That said, I would be *very* cautious of promoting this shape as a
> property extension mechanism. That would be done at the expense of using a
> mix of vocabularies for specialized data.
>
> As already said, I am perfectly fine with postponing this possibility to
> the future and constraining this to Product and Place for the moment.
> >
> > My opinion, based on experience in both consuming data and working to
> unify disparate descriptions, is that, in the general case of needing
> specific properties beyond the core of schema.org, it would be quite
> valuable to apply the existing mechanism of mixing vocabularies, native to
> RDF and the enabler of decentralized vocabulary growth. It has been there
> from the start and proven extremely valuable in specific data integration
> scenarios. Of course, it has the downside of enabling, in Richard's words,
> a "cacophony of multiple vocabulary choices". But by grounding basic common
> terms in schema.org, we have one stable core around which other things
> can revolve and evolve.
>
> > RDFa has great support for this, especially in compact form through the
> means of prefixes. But all of RDFa, JSON-LD and microdata support using
> full URIs as property names, so it can be catered for in general. (Also,
> JSON-LD has more powerful support through both prefix and direct term
> definitions in a context.)
> >
>
> Yes, I am perfectly fine with this. If site-owners are able to publish
> data according to external product ontologies, like the 40+ we developed
> for GoodRelations, or the new GPC ontology, that is great. And my proposal
> does not aim at stealing this opportunity.
>
> However, note that there are THREE bottlenecks with using external
> vocabularies:
>
> 1. There must be a suitable vocabulary.


Yes. The usual chicken-and-egg problem. :) I think data consumers like
search engines can drive the evolution of that by looking into usage of
external vocabularies though. (Again, I do think the pendulum can slowly
turn. Especially now with a core schema.org to base data shapes upon.)

> 2. Site-owners must be able to map their local data to the external
> ontologies and publish respective data. In the past ten years, I have been
> able to convince just ONE site to use eClassOWL at broad scale. The problem
> is that the data supply for this is quite challenging.
>

It is definitely an effort to align local data with common models. And even
more so to engage in the evolution of such models. Which we are rather
aware of here. :) But there is great value in doing that work – you will
think more about why the data looks the way it looks, understand your own
domain in the eyes of others, and so on. It is very rewarding. But I know
it's not always easy to convince organizations of this value, and to thus
invest in it.


> 3. Most people I speak to basically say that for their production sites,
> they do only what is specified in schema.org. Unless the sponsors of
> schema.org explicitly endorse the use of a certain external vocabulary,
> this will not have a big adoption, IMO. Adding the proposed elements to
> schema.org in contrast will make it much easier to convince owners of
> this valuable data to make it available for search engines and other
> clients.
>

Yes, getting some data out is better than nothing. It is still unclear
whether search engines will valuably consume and present what's already in
schema.org though. Other, specialized services might more quickly make use
of specialized properties – and for that, the existing URI mechanism for
external properties is great. (A good example of that is WebID.)


> This is not a technical issue of course, just a signal. But it will matter.
>

Absolutely. We're striving to send signals saying that certain patterns are
or will be widely understood, consumed and built upon. I just want to make
sure that we don't promote something with short-term value at the expense
of long-term values which we already have worked out mechanisms for. Though
as long as we're seeking complementary practises to fill out the gaps, all
is good.


> > As an example, here is what the example product table in the proposal
> could look like, when adding an external vocabulary (also capturing some
> keywords and using some external enumerations):
> >
> >     <div vocab="http://schema.org/"
> >         prefix="pto: http://www.productontology.org/id/
> >                 unit: http://qudt.org/vocab/unit#
> >                 apple: http://apple.com/def/product#">
> >       <table typeof="Product pto:IPhone_5">
> >         <caption>iPhone 5 Specifications</caption>
> >         <tr>
> >           <th>Spec</th>
> >           <th>Value</th>
> >           <th>Description</th></tr>
> >         <tr>
> >           <td>LTE Band and Mode</td>
> >           <td><span property="keywords apple:cellphoneBand">4G</span>
> >             <span property="keywords apple:cellphoneMode">LTE</span></td>
> >           <td></td></tr>
> >         <tr>
> >           <td>Battery Type</td>
> >           <td property="keywords apple:batteryType">lithium-ion</td>
> >           <td></td></tr>
> >         <tr>
> >           <td><a property="apple:productFeature"
> >               href="http://apple.com/def/feature/handheld#Built-In%20GPS">Built-In GPS</a></td>
> >           <td>Yes</td>
> >           <td></td></tr>
> >         <tr>
> >           <td property="apple:productFeature">Touch Screen</td>
> >           <td>Yes</td>
> >           <td></td></tr>
> >         <tr>
> >           <td>Operating System</td>
> >           <td property="operatingSystem">Apple iOS 7</td>
> >           <td></td></tr>
> >         <tr>
> >           <td>Screen Size</td>
> >           <td><span property="width apple:screenSize"
> >               datatype="unit:Inch">4</span>"</td>
> >           <td>Size of the screen, in inches, measured diagonally from
> >             corner to corner.
> >         </td></tr>
> >         <tr property="keywords">
> >           <td>Bluetooth Version</td>
> >           <td property="apple:bluetoothVersion">4.0</td>
> >           <td></td></tr>
> >         <tr>
> >           <td>Keyboard Type</td>
> >           <td property="keywords apple:keyboardType">Virtual QWERTY</td>
> >           <td></td></tr>
> >         <tr property="hasPart" typeof="Thing pto:Camera">
> >           <td><span property="name">Front Facing Camera</span> MP Rating</td>
> >           <td property="apple:megaPixelRating">1.2</td>
> >           <td></td></tr>
> >         <tr property="hasPart" typeof="Thing pto:Camera">
> >           <td><span property="name">Rear Facing Camera</span> MP Rating</td>
> >           <td property="apple:megaPixelRating">8</td>
> >           <td></td></tr>
> >       </table>
> >     </div>
> >
> > If Apple were to use their own properties like this, they can describe
> them in a page at <http://apple.com/def/product>, using:
> >
> >     <body vocab="http://schema.org/">
> >       <h1 property="name">Apple Product Vocabulary</h1>
> >       <h2>Properties</h2>
> >       <article id="batteryType" resource="#batteryType"
> >           typeof="Property">
> >         <h3 property="name">Battery Type</h3>
> >         <p property="description">The type of battery.</p>
> >       </article>
> >       ...
> >     </body>
>
> My problem with this proposal is that it is, as far as I understand,
> RDFa-centric. And I think our approach should be syntax neutral.
>

It should be syntax neutral. You can use expanded URIs if you do a
microdata version of this example (like you did above by using
http://acme.org/vocab/#voltage). It may look more palatable in RDFa, but it
isn't limited to it. (Except for my use of @datatype. But that was just to
show that it is technically possible to do a lot with just plain values. It
isn't intrinsic to the use of external properties though, and can be
ignored here.)
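
To make the "expanded URIs" point concrete, here is a small sketch of rewriting the RDFa `property` values from my example into what could go into a microdata `itemprop`. The prefix map is copied from that example (where the apple.com vocabulary is itself only hypothetical), and the function name is made up. Bare schema.org terms are left as they are, on the assumption that the enclosing item has a schema.org `itemtype`:

```python
# Prefix map from the RDFa example; the apple.com vocabulary is hypothetical.
PREFIXES = {
    "pto": "http://www.productontology.org/id/",
    "unit": "http://qudt.org/vocab/unit#",
    "apple": "http://apple.com/def/product#",
}


def to_itemprop(rdfa_property: str) -> str:
    """Hypothetical helper: expand prefixed terms to full URIs for microdata."""
    tokens = []
    for term in rdfa_property.split():
        prefix, sep, local = term.partition(":")
        if sep and prefix in PREFIXES:
            tokens.append(PREFIXES[prefix] + local)  # known prefix: expand
        else:
            tokens.append(term)  # bare term or already a full URI
    return " ".join(tokens)


print(to_itemprop("keywords apple:batteryType"))
print(to_itemprop("operatingSystem"))
```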


> > It would be very valuable to examine what deployment problems this
> pattern might have encountered in the past. Perhaps understanding of it has
> matured in recent times? Use of embedded data in general, schema.org in
> particular, and the pattern of multiple types from different vocabularies
> has certainly increased a lot, so I would like to see if this has become
> more palatable. It really is quite simple:
> >
> > 1. Create a page describing your special properties.
> > 2. Use these terms within your product pages.
>
> We have tried to foster a similar pattern in shop applications for
> GoodRelations, but this has really not worked well. And this proposal again
> mixes the perspectives of data consumption and data publication. It is
> unnecessary to put the burden of defining a local vocabulary on a Web site,
> when the only purpose is to cross-reference this in other parts of the
> site. This is something a consuming client can do as well.
>

I think it is a good example of factoring your data instead of repeatedly
marking up embedded labels. And I'm not entirely convinced of the high
cost, although it's partly because I think the site-local vocabulary case
shouldn't be that common over time. Most useful properties should be found
in some vocabulary a bit more shared than that (such as GoodRelations or
GTIN+). As this hasn't been actively promoted for schema.org though, it is
untried territory for those not used to linked data in general.

Anyway, continuing the topic of external properties/vocabularies could be
done independently of this proposal – especially if it is a bit more
distanced from a generic property extension notion (e.g. by doing the
suggested name changes).


> > From there, improvements can be made, such as sharing, integrating and
> reusing these properties across endeavours (linking them as much as
> possible). And of course promoting the most common of these terms for
> inclusion in schema.org itself (and again, linking together the "wild"
> sources with these new core terms).
> >
> > In practise, this requires the backers (search engines) of schema.org to
> promote and utilize the rich potential here. By collecting valuable
> external properties, and eventually enabling the most common of them to be
> shown in e.g. rich snippets. It requires effort of course, but since the
> terms are more structured than plain text, it doesn't require
> disambiguation heuristics, full NLP and such. (Which is the very case for
> structured data in pages over raw scraping and powerful text analysis.)
> >
> > Cheers,
> > Niklas
>
> As far as I can see, my current proposal achieves basically the same with
> pretty straightforward markup, available in all syntaxes, and in a way that
> allows various degrees of granularity. Site owners will be able to preserve
> all granularity (e.g. if value and unit are two fields) and data semantics
> (e.g. if they can serve a numerical range as min and max or have a public
> identifier for a property).
>

It certainly achieves something with potential. I just want to make it
clear that it is complementary to external properties. With the suggested
renames, it becomes a way to use structured, named values for (product)
specifications, rather than a generic way for mixing plain text properties
and values (which IMO confuses the notion of property in this context of
structured data).

> Let us make it as simple as possible for sites to expose rich product data.
> That should be our first priority.
>

It is a good priority. Let's just also make that data readily consumable,
and, where compromise is needed, cater for clearly complementary
alternatives.

Cheers,
Niklas


>
> Martin
>
>

Received on Saturday, 3 May 2014 00:13:37 UTC