Re: Generic Property-Value Proposal for Schema.org

Hi all,

I do understand the case for capturing structured values describing special
properties of products. The way proposed certainly makes for less
normalized, precise and reusable data than simple properties with direct
values enable. But it has some merit in the fact that, as Martin says, a
little data goes a long way.

The shape of data that this produces merits some analysis. For one, the idea
to some extent resembles a mix of statement reification and structured
values. More interestingly though, the pattern is similar to an effect that
can be achieved by defining very specific SKOS concepts (e.g. for battery
types, operating systems, screen sizes and bluetooth types), and linking to
them with e.g. a productSpecification property. To me, these PropertyValue
entities really look like a free-form version of such concept/topic/enum
entities, with their plain text names representing their "type", rather
than a generic property extension (in the RDF sense). And I see the
potential in that.
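
To make the comparison concrete, here is a rough sketch of the SKOS variant.
The example.org concept URIs and the productSpecification property are
placeholders of mine for illustration, not existing terms:

    <div vocab="http://schema.org/" typeof="Product">
      <!-- link the product to a shared concept instead of a text value -->
      <a property="productSpecification"
          href="http://example.org/concepts/battery#lithium-ion">lithium-ion</a>
    </div>

The concept itself could then be described once, elsewhere, and reused:

    <div prefix="skos: http://www.w3.org/2004/02/skos/core#"
        resource="http://example.org/concepts/battery#lithium-ion"
        typeof="skos:Concept">
      <span property="skos:prefLabel">Lithium-ion battery</span>
      <!-- hypothetical broader concept grouping all battery types -->
      <a property="skos:broader"
          href="http://example.org/concepts/battery#battery">Battery</a>
    </div>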

(And although such ambiguous data can be hard to collate (and translate),
it has its place, just as plain text keywords for basic website SEO have,
in a primitive fashion. External enumerations (using e.g. SKOS) are often
far more usable and scalable in the long term though.)

That said, I would be *very* cautious of promoting this shape as a property
extension mechanism. That would be done at the expense of using a mix of
vocabularies for specialized data.

My opinion, based on experience in both consuming data and working to unify
disparate descriptions, is that, in the general case of needing specific
properties beyond the core of schema.org, it would be quite valuable to
apply the existing mechanism of mixing vocabularies, native to RDF and the
enabler of decentralized vocabulary growth. It has been there from the
start and proven extremely valuable in specific data integration scenarios.
Of course, it has the downside of enabling, in Richard's words, a
"cacophony of multiple vocabulary choices". But by grounding basic common
terms in schema.org, we have one stable core around which other things can
revolve and evolve.

RDFa has great support for this, especially in compact form through the
means of prefixes. But all of RDFa, JSON-LD and microdata support using
full URIs as property names, so it can be catered for in general. (Also,
JSON-LD has more powerful support through both prefix and direct term
definitions in a context.)
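
For comparison, a minimal JSON-LD sketch of the same idea could look as
follows. The "apple" namespace is the same hypothetical one I use in the RDFa
example; the context shows both a prefix ("apple") and a direct term
definition ("batteryType"):

    <script type="application/ld+json">
    {
      "@context": {
        "@vocab": "http://schema.org/",
        "apple": "http://apple.com/def/product#",
        "batteryType": "apple:batteryType"
      },
      "@type": "Product",
      "operatingSystem": "Apple iOS 7",
      "batteryType": "lithium-ion",
      "apple:keyboardType": "Virtual QWERTY"
    }
    </script>

In microdata, which has no prefix mechanism, the full URI would be spelled
out as the property name:

    <div itemscope itemtype="http://schema.org/Product">
      <span itemprop="http://apple.com/def/product#batteryType">lithium-ion</span>
    </div>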

As an example, here is what the example product table in the proposal could
look like, when adding an external vocabulary (also capturing some keywords
and using some external enumerations):

    <div vocab="http://schema.org/"
        prefix="pto: http://www.productontology.org/id/
                unit: http://qudt.org/vocab/unit#
                apple: http://apple.com/def/product#">
      <table typeof="Product pto:IPhone_5">
        <caption>iPhone 5 Specifications</caption>
        <tr>
          <th>Spec</th>
          <th>Value</th>
          <th>Description</th></tr>
        <tr>
          <td>LTE Band and Mode</td>
          <td><span property="keywords apple:cellphoneBand">4G</span>
            <span property="keywords apple:cellphoneMode">LTE</span></td>
          <td></td></tr>
        <tr>
          <td>Battery Type</td>
          <td property="keywords apple:batteryType">lithium-ion</td>
          <td></td></tr>
        <tr>
          <td><a property="apple:productFeature"
              href="http://apple.com/def/feature/handheld#Built-In%20GPS">Built-In GPS</a></td>
          <td>Yes</td>
          <td></td></tr>
        <tr>
          <td property="apple:productFeature">Touch Screen</td>
          <td>Yes</td>
          <td></td></tr>
        <tr>
          <td>Operating System</td>
          <td property="operatingSystem">Apple iOS 7</td>
          <td></td></tr>
        <tr>
          <td>Screen Size</td>
          <td><span property="width apple:screenSize"
              datatype="unit:Inch">4</span>"</td>
          <td>Size of the screen, in inches, measured diagonally from
            corner to corner.</td></tr>
        <tr property="keywords">
          <td>Bluetooth Version</td>
          <td property="apple:bluetoothVersion">4.0</td>
          <td></td></tr>
        <tr>
          <td>Keyboard Type</td>
          <td property="keywords apple:keyboardType">Virtual QWERTY</td>
          <td></td></tr>
        <tr property="hasPart" typeof="Thing pto:Camera">
          <td><span property="name">Front Facing Camera</span> MP Rating</td>
          <td property="apple:megaPixelRating">1.2</td>
          <td></td></tr>
        <tr property="hasPart" typeof="Thing pto:Camera">
          <td><span property="name">Rear Facing Camera</span> MP Rating</td>
          <td property="apple:megaPixelRating">8</td>
          <td></td></tr>
      </table>
    </div>

If Apple were to use their own properties like this, they could describe
them in a page at <http://apple.com/def/product>, using:

    <body vocab="http://schema.org/">
      <h1 property="name">Apple Product Vocabulary</h1>
      <h2>Properties</h2>
      <article id="batteryType" resource="#batteryType" typeof="Property">
        <h3 property="name">Battery Type</h3>
        <p property="description">The type of battery.</p>
      </article>
      ...
    </body>

It would be very valuable to examine what deployment problems this pattern
might have encountered in the past. Perhaps understanding of it has matured
in recent times? Use of embedded data in general, schema.org in particular,
and the pattern of multiple types from different vocabularies has certainly
increased a lot, so I would like to see if this has become more palatable.
It really is quite simple:

1. Create a page describing your special properties.
2. Use these terms within your product pages.

From there, improvements can be made, such as sharing, integrating and
reusing these properties across endeavours (linking them as much as
possible). And of course promoting the most common of these terms for
inclusion in schema.org itself (and again, linking together the "wild"
sources with these new core terms).
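
One possible way to express such a link, sketched here under my own
assumptions, is to relate a property to a core term with rdfs:subPropertyOf
in the vocabulary page itself. Pairing the hypothetical screenSize property
with schema.org's width is only an illustration:

    <div vocab="http://schema.org/"
        prefix="rdfs: http://www.w3.org/2000/01/rdf-schema#">
      <article id="screenSize" resource="#screenSize" typeof="Property">
        <h3 property="name">Screen Size</h3>
        <p property="description">The diagonal size of the screen.</p>
        <!-- assumption: consumers treat screenSize as a kind of width -->
        <p>Specializes
          <a property="rdfs:subPropertyOf"
              href="http://schema.org/width">width</a>.</p>
      </article>
    </div>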

In practice, this requires the backers (search engines) of schema.org to
promote and utilize the rich potential here, by collecting valuable
external properties and eventually enabling the most common of them to be
shown in e.g. rich snippets. It requires effort of course, but since the
terms are more structured than plain text, it doesn't require
disambiguation heuristics, full NLP and such. (Which is the very case for
structured data in pages over raw scraping and powerful text analysis.)

Cheers,
Niklas



On Fri, May 2, 2014 at 7:49 PM, Dan Brickley <danbri@google.com> wrote:

> On 2 May 2014 18:38, Jason Douglas <jasondouglas@google.com> wrote:
> > Fine, but I think there's an aspect of that mechanism that would be a
> > shame to drop, which is that it had some semantic scoping.
> >
> > I think it's a bad idea to have a completely generic bailout mechanism
> > like this.  However, I have no issue with more localized bailouts for
> > things like product specifications or sports statistics that do have
> > common characteristics but a lot of variety and uniqueness.  You at
> > least have some hope of being able to do something useful with that
> > data.  Otherwise, there's little value over a bag of words.
>
> Yeah, I share the concern about having unscoped bundles of fields that
> could mean anything.
>
> I'm not a believer in the slash-based extension, at least in this case.
> It's best used for super-properties, i.e. where the extended form
> implies the short form:
>
> Does
> {
>   @type: Product,
>   productSpecification/screenSize : {
>     value: 46
>     unitCode: "CMT"
>   }
> }
>
> imply
>
> { @type: Product,  productSpecification: "46"} ?
>
> This would seem like an overstretch. 46 could be the number of
> previous owners, without the qualifying info.  Whereas
> http://schema.org/actor/lead would 'dumb down' nicely to plain old
> '/actor'.
>
> For the kind of product data Martin's talking about here, I wonder
> whether it might be more fruitful to use something like a CSV tabular
> form, associated as a http://schema.org/Dataset and use annotations on
> the table structure, along lines we're spec'ing in the W3C CSV on the
> Web group - http://www.w3.org/TR/2014/WD-csvw-ucr-20140327/
> https://www.w3.org/2013/csvw/wiki/Main_Page
>
> Dan
>
>

Received on Friday, 2 May 2014 18:11:57 UTC