Re: RDF 1.1 Lite Issue # 2: property vs rel

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Sat, 22 Oct 2011 20:28:36 -0400
To: Guha <guha@google.com>
CC: Manu Sporny <msporny@digitalbazaar.com>, W3C Vocabularies <public-vocabs@w3.org>, HTML Data Task Force WG <public-html-data-tf@w3.org>
Message-ID: <728A2958-8271-4B32-A86D-0AE6BD07A803@greggkellogg.net>

I'm CCing the HTML Data TF, as that's really the appropriate location for syntax discussions.

I'm wondering if we could consider an HTML-specific approach to unifying @property and @rel in HTML+RDFa 1.1 Lite. Within that spec, we could define some additional processing rules for @property to create URIs, much (exactly) as Microdata does. This would not replace @rel, but the eventual RDFa 1.1 Lite specification/note would recommend that @rel NOT be used in markup for these use cases. Of course, if it were, the normal RDFa Core 1.1 rules would apply.

Microdata describes how to retrieve a property value [1]. The Microdata to RDF draft uses essentially the same rules, with some additional detail for creating RDF [2]. Basically, elements that have an attribute with a URI content model use that value to create a URI reference; otherwise a literal is generated.

* audio, embed, iframe, img, source, track, and video use the value of a @src attribute as a URI ref.
* a, area and link use the value of an @href attribute as a URI ref.
* object uses the value of a @data attribute as a URI ref.
* I also added rules for blockquote and q to use the value of a @cite attribute as a URI reference, but this is not in the HTML Microdata spec.
* Otherwise, @property acts as now defined in RDFa Core 1.1.
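
A minimal sketch of the element rules listed above, in Python. The function name property_value, the URI_ATTRIBUTE table, and the (kind, value) return shape are illustrative assumptions, not part of any spec. The fallback is simplified to a plain literal from the element text, whereas RDFa Core 1.1 also considers attributes such as @content and @datatype.

```python
from urllib.parse import urljoin

# Element name -> attribute that supplies the URI (hypothetical table
# built from the rules above; blockquote/q are the proposed additions).
URI_ATTRIBUTE = {
    **dict.fromkeys(
        ("audio", "embed", "iframe", "img", "source", "track", "video"), "src"
    ),
    **dict.fromkeys(("a", "area", "link"), "href"),
    "object": "data",
    **dict.fromkeys(("blockquote", "q"), "cite"),
}


def property_value(tag, attrs, text, base=""):
    """Return ("uri", value) or ("literal", value) for an element with @property.

    tag   -- element name, e.g. "a"
    attrs -- dict of the element's attributes
    text  -- the element's text content
    base  -- document base URI used to resolve relative references
    """
    attr = URI_ATTRIBUTE.get(tag.lower())
    if attr is not None and attr in attrs:
        # URI-valued attribute present: resolve it against the base.
        return ("uri", urljoin(base, attrs[attr]))
    # Simplified fallback: treat the element text as a literal.
    return ("literal", text)
```

For example, property_value("a", {"href": "wells-fargo-center.html"}, "Wells Fargo Center") yields a URI value, while property_value("span", {}, "Philadelphia") yields a literal.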

Alternatively, we could say that @href, @src, @data and @cite take higher precedence and result in a URI ref from any element.

Additionally, to allow for chaining, if @property were used in an element that also had either or both @about or @typeof, it would become a reference (URI or BNode) to a new object defined at that element scope, identically to @itemprop in the same element as @itemscope.
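
The chaining rule might be sketched like this in Python. The name chained_object, the blank-node label format, and the choice to let @about take precedence over @typeof when both are present are assumptions made for illustration only.

```python
from itertools import count

# Hypothetical counter used to mint fresh blank-node labels.
_bnode_ids = count()


def chained_object(attrs):
    """Return the new subject that @property should point at, or None.

    attrs -- dict of the element's attributes.
    When @property shares an element with @about or @typeof, the property
    value becomes a reference to the object defined at that element,
    much like @itemprop together with @itemscope.
    """
    if "property" not in attrs:
        return None
    if "about" in attrs:
        # Assumption: an explicit @about names the new subject.
        return ("uri", attrs["about"])
    if "typeof" in attrs:
        # @typeof without @about: mint a blank node for the new subject.
        return ("bnode", f"_:b{next(_bnode_ids)}")
    # No chaining: the value comes from the ordinary @property rules.
    return None
```

With this sketch, an element carrying property="address" typeof="PostalAddress" produces a blank node that both serves as the value of address and becomes the subject for the nested addressLocality and addressRegion properties.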

HTML+RDFa 1.1 Lite could then adopt these same rules, so for example, a schema:Event might be marked up as follows:

<div vocab="http://schema.org/" typeof="Event">
  <a property="url" href="nba-miami-philidelphia-game3.html">
  NBA Eastern Conference First Round Playoff Tickets:
  Miami Heat at Philadelphia 76ers - Game 3 (Home Game 1)
  </a>

  <span property="startDate">2011-04-21T20:00</span>

  <div property="location" typeof="Place">
    <a property="url" href="wells-fargo-center.html">
    Wells Fargo Center
    </a>
    <div property="address" typeof="PostalAddress">
      <span property="addressLocality">Philadelphia</span>,
      <span property="addressRegion">PA</span>
    </div>
  </div>

  <div property="offers" typeof="AggregateOffer">
    Priced from: <span property="lowPrice">$35</span>
    <span property="offerCount">1,938</span> tickets left
  </div>
</div>

(Note, we miss some formatting with startDate, but if we were also to adopt @datetime processing rules, or whatever the HTML WG replaces it with, this could be handled better as well).

There are probably some corner cases that would need to be worked out, but by limiting this to the HTML+RDFa definition, we avoid backwards compatibility issues with RDFa 1.0 and get that much closer


[1] http://www.w3.org/TR/2011/WD-microdata-20110525/#values
[2] https://dvcs.w3.org/hg/htmldata/raw-file/24af1cde0da1/microdata-rdf/index.html#algorithm-terms

On Oct 22, 2011, at 2:36 PM, Guha wrote:

On Sat, Oct 22, 2011 at 1:49 PM, Manu Sporny <msporny@digitalbazaar.com> wrote:
On 10/22/2011 01:38 PM, Guha wrote:
Google announced support for RDFa in 2009. One of the startling
discoveries we made was that the error rate (i.e., webmasters marking
up their pages to say X when they really meant to say Y) was about 3
times as much as it was for other formats (which include
microformats, sitemaps, Google shopping feeds, etc.). The error rate
is/was so bad that we had to resort to highly non-scalable techniques
like having humans look at the markup on each site to make sure it
said what the page said. More than 40% of the errors had to do with
the confusion between rel and property.

That is startling. Could you please publish the data and analysis
publicly so that those on this list may look at it and analyze it? We have a couple of approaches that we've discussed over the past 6+ years that could be applied if we knew exactly /how/ people were getting the markup wrong.

We will look into sharing what we can. We have on a number of occasions shared aggregate data. It is not clear we are in a position to share detailed information about other people's websites. You are of course welcome to do the analysis yourself.

I will also note that this particular data was never brought to the
attention of the RDFa Working Group. When did you know about these
errors? Why did you not share the data when you came across it? I ask
because it would've impacted the design of RDFa 1.1 if you had shared
this data with us at the time.

Manu, I think you are missing something here. We have communicated this information, many times, in one-on-one meetings with Ben Adida and others as we were working on developing microdata. At the end of the day, it was negligence on the part of the folks designing RDFa 1.1 to not actively seek input from some of the biggest consumers of RDFa.

It is important to note that this data is from a very large sample
(10s of millions of pages) taken from Schema.org's target audience:
webmasters of sites that are by and large not about technical stuff.

A list of URLs would be great along with a technical analysis of all of
those URLs. Specifically, the following data would be very helpful:

Google DOES NOT provide lists of URLs to anyone. You are welcome to go crawl the web.

* How frequent was the use of @rel vs. the use of @property?

* When @rel was used, was it used in chaining or was it used to
 simply refer to an external resource?

We don't recommend chaining. Almost no one producing markup with rich snippets uses external resources.

* In the Microformats and Creative Commons cases
 (rel="license", rel="tag", etc.) did people get @rel wrong?

You should ask them.

* How frequently do @rel and @property exist on the same element?

In the vocabulary we specified, never.

* How frequently is @property used when @rel should have been used?

Don't have the numbers, but it was pretty random. You have to understand that at anything more than a few percent error rate, the data becomes largely unusable at scale.

* How frequently is @rel used when @property should have been used?

I will look into doing this analysis, but am not sure when we will be able to get around to this.

Answering these questions will help us understand how the spec should change.

We really don't want to get into whether there is a distinction
between rel and property at a theoretical level.

Who is "we" in this case? The RDFa WG does not want to get into a theoretical debate either. We care about authors easily generating good, valid data.

We = Google, Schema.org.

But the bottom line remains that as long as
the error rate in RDFa usage does not go down dramatically, it is not
a viable option for us.

Who is "us" in this case?

Us = Google, Schema.org

The current proposal takes a step in the
right direction, but several big issues, like the removal of the
distinction between rel and property still need to be addressed.

Could you please detail every one of those "big issues"?

We are doing it. Jason brought up the other issue.

-- manu

Manu Sporny (skype: msporny, twitter: manusporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Standardizing Payment Links - Why Online Tipping has Failed
