Microdata CR Objection Rationale Statement

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Mon, 03 Dec 2012 15:59:30 -0500
Message-ID: <50BD12B2.9000403@digitalbazaar.com>
To: HTML WG <public-html@w3.org>
This is the final Rationale Statement for the Microdata Candidate
Recommendation Objection. Please link to this e-mail from the preference
poll to provide participants some background on the arguments presented
for this objection.

An easier-on-the-eyes version of this e-mail is available here:

http://manu.sporny.org/2012/microdata-cr/

-------------------------------------------------------------------

Objection to Microdata Candidate Recommendation

   Full disclosure: I'm the current chair of the standards group at
   the World Wide Web Consortium that created the newest version of
   RDFa, editor of the HTML5+RDFa 1.1 and RDFa Lite 1.1
   specifications, and I'm also a member of the HTML Working Group.

   The HTML Working Group at the W3C is currently trying to decide if
   it should transition the Microdata specification to the next
   stage in the standardization process. There has been a [1]call for
   consensus to transition the spec to the Candidate Recommendation
   stage. The problem is that we already have a set of specifications
   that are official W3C recommendations that do what Microdata does
   and more. RDFa 1.1 became an official W3C Recommendation last
   summer. From a standards perspective, advancing Microdata as well
   is a mistake and sends a confused signal to Web developers.
   Officially supporting two specifications that do almost exactly
   the same thing in almost exactly the same way is, ultimately, a
   failure to standardize.

   The fact that RDFa already does what Microdata does has been
   elaborated upon before:

   [2]Mythical Differences: RDFa Lite vs. Microdata
   [3]An Uber-comparison of RDFa, Microdata, and Microformats

   Here's the problem in a nutshell: The W3C is thinking of ratifying
   two completely different specifications that [4]accomplish the
   same thing in basically the same way. The functionality of RDFa,
   which is already a W3C Recommendation, overlaps Microdata by a
   large margin. In fact, RDFa Lite 1.1 was developed as a plug-in
   replacement for Microdata. The full version of RDFa can also do a
   number of things that Microdata cannot, such as datatyping,
   associating more than one type per object, embed-ability in
   languages other than HTML, ability to easily publish and mix
   vocabularies, etc.

   Microdata would have easily been dead in the water had it not been
   for two simple facts: 1) The editor of the specification works at
   Google, and 2) Google pushed Microdata as the markup language for
   schema.org before also accepting RDFa markup. The first enabled
   Google and the editor to work on schema.org without signalling to
   the public that it was creating a competitor to Facebook's Open
   Graph Protocol. The second gave Microdata enough of a jump start
   to establish a foothold for schema.org markup. There have been a
   number of studies that [5]show that Microdata's sole use case (99%
   of Microdata markup) is for the markup of schema.org terms.
   Microdata is not widely used outside of that context; we now have
   data to back up what we had predicted would happen when schema.org
   made their initial announcement for Microdata-only support. Note
   that schema.org now supports both RDFa and Microdata.

   It is typically a bad idea to have two formats published by the
   same organization that do the same thing. It leads to Web
   developer confusion surrounding which format to use. One of the
   goals of Web standards is to reduce, or preferably eliminate, the
   confusion surrounding the correct technology decision to make. The
   HTML Working Group and the W3C are failing miserably on this front.
   There is more confusion today about picking Microdata or RDFa
   because they accomplish the same thing in effectively the same
   way. Both exist only for political reasons.

   If we step back and look at the technical arguments, there is no
   compelling reason that Microdata should be a W3C Recommendation.
   There is no compelling reason to have two specifications that do
   the same thing in basically the same way. Therefore, as a member
   of the HTML Working Group (not as a chair or editor of RDFa) I
   object to the publication of Microdata as a Candidate
   Recommendation.

   Note that this is not a W3C formal objection. This is an informal
   objection to publishing Microdata along the Recommendation track.
   This objection will not become an official W3C formal objection if
   the HTML Working Group holds a poll to gather consensus around
   whether Microdata should proceed along the Recommendation
   publication track. I believe publishing Microdata as a W3C Note
   will continue to allow Google to support Microdata in schema.org,
   but will hopefully correct the confused message that the W3C has
   been sending to Web developers regarding RDFa and Microdata. We
   don't need two specifications that do almost exactly the same
   thing.

   The message sent by the W3C needs to be very clear: There is one
   recommendation for doing structured data markup in HTML. That
   recommendation is RDFa. It addresses all of the use cases that
   have been put forth by the general Web community, and it's ready
   for broad adoption and implementation today.

Summary of Facts and Arguments

   Below is a summary of arguments presented as a basis for
   publishing Microdata along the W3C Note track:

    1. RDFa 1.1 is already a [7]ratified Web standard as of June 7th,
       2012, and absorbed almost every Microdata feature before it
       became official. If the majority of the differences between
       RDFa and Microdata boil down to different attribute names
       (property vs. itemprop), then the two solutions have
       effectively converged on syntax and W3C should not ratify two
       solutions that do effectively the same thing in almost exactly
       the same way.
    2. RDFa is [8]supported by all of the major search crawlers,
       including Google (and schema.org), Microsoft, Yahoo!, Yandex,
       and Facebook. Microdata is not supported by Facebook.
    3. RDFa Lite 1.1 is [9]feature-equivalent to Microdata. Over 99%
       of Microdata markup can be expressed easily in RDFa Lite 1.1.
       Converting from Microdata to RDFa Lite is as simple as a
       search and replace of the Microdata attributes with RDFa Lite
       attributes. Conversely, Microdata does not support a number of
       the more advanced RDFa features, like being able to tell the
       difference between feet and meters.
    4. You can [10]mix vocabularies with RDFa Lite 1.1, supporting
       both schema.org and Facebook's Open Graph Protocol (OGP) using
       a single markup language. You don't have to learn Microdata
       for schema.org and RDFa for Facebook - just use RDFa for both.
    5. The [11]creator of the Microdata specification doesn't like
       Microdata. When people are not passionate about the solutions
       that they create, the desire to work on those solutions and
       continue to improve upon them is muted. The RDFa community is
       passionate about the technology that it has created together
       and has strived to make it better since the standardization
       of RDFa 1.0 back in 2008.
    6. RDFa Lite 1.1 is [12]fully upward-compatible with RDFa 1.1,
       allowing you to seamlessly migrate to a more feature-rich
       language as your Linked Data needs grow. Microdata does not
       support any of the more advanced features provided by RDFa
       1.1.
    7. RDFa [13]deployment is broader than Microdata. RDFa deployment
       continues to grow at a rapid pace.
    8. The economic damage generated by publishing both RDFa and
       Microdata along the Recommendation track should not be
       underestimated. W3C should try to provide clear direction in
       an attempt to reduce the economic waste that a "let the market
       sort it out among two nearly identical solutions" strategy
       will generate. At some point, the market will figure out that
       both solutions are nearly identical, but only after publishing
       and building massive amounts of content and tooling for both.
    9. The W3C Technical Architecture Group (TAG), which is
       responsible for ensuring that the core architecture of the Web
       is sound, has [14]raised their concern about the publication
       of both Microdata and RDFa as recommendations. After the W3C
       TAG raised their concerns, the RDFa Working Group created RDFa
       Lite 1.1 to be a near feature-equivalent replacement for
       Microdata that was also backwards-compatible with RDFa 1.0.
   10. Publishing a standard that does almost exactly the same thing
       as an existing standard in almost exactly the same way is a
       [15]failure to standardize.
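
   As a rough illustration of point 4 above, the sketch below shows
   one element carrying both a schema.org term and an Open Graph
   Protocol term in a single property attribute. The sample markup,
   the PropertyCollector class, and the simplified expansion logic
   are assumptions made for illustration only. This is not a
   conforming RDFa processor (for one, it ignores attribute
   scoping); it only shows how a default vocabulary and a declared
   prefix can coexist.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Hypothetical helper: expands `property` terms to full IRIs.

    Simplified on purpose; it does NOT implement RDFa processing rules.
    """

    def __init__(self):
        super().__init__()
        self.vocab = ""      # most recently seen default vocabulary
        self.prefixes = {}   # declared prefix name -> IRI
        self.terms = []      # expanded property IRIs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        if attrs.get("prefix"):
            # prefix="og: http://ogp.me/ns#" -> {"og": "http://ogp.me/ns#"}
            parts = attrs["prefix"].split()
            for name, iri in zip(parts[0::2], parts[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        for term in (attrs.get("property") or "").split():
            prefix, sep, local = term.partition(":")
            if sep and prefix in self.prefixes:
                # Prefixed term, e.g. og:title
                self.terms.append(self.prefixes[prefix] + local)
            else:
                # Bare term, resolved against the default vocabulary
                self.terms.append(self.vocab + term)


# Illustrative markup (an assumption, not taken from any real page).
SAMPLE = """
<div vocab="http://schema.org/" prefix="og: http://ogp.me/ns#" typeof="Product">
  <span property="name og:title">Example Widget</span>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
expanded_terms = collector.terms
print(expanded_terms)
```

   Real markup should be checked with an actual RDFa processor rather
   than with a toy like this one.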

Counter-arguments and Rebuttals

     [This is a] [16]classic case of monopolistic anti-competitive
     protectionism.

   No, this is an objection to publishing two specifications that do
   almost exactly the same thing in almost exactly the same way along
   the W3C Recommendation publication track. Protectionism would have
   asked that all work on Microdata be stopped and the work scuttled.
   The proposed resolution does not block anybody from using
   Microdata, nor does it try to stop or block the Microdata work
   from happening in the HTML WG. The objection asks that the W3C
   decide what the best path forward for Web developers is based on a
   fairly complicated set of predicted outcomes. This is not an easy
   decision. The objection is intended to ensure that the HTML
   Working Group has this discussion before we proceed to Candidate
   Recommendation with Microdata.

      <manu1> I'd like the W3C to work as well, and I think publishing
              two specs that accomplish basically the same thing in
              basically the same way shows breakage.
      <annevk> Bit late for that. XDM vs DOM, XPath vs Selectors,
               XSL-FO vs CSS, XSLT vs XQuery, XQuery vs XQueryX,
               RDF/XML vs Turtle, XForms vs Web Forms 2.0,
               XHTML 1.0 vs HTML 4.01,
               XML 1.0 4th Edition vs XML 1.0 5th Edition,
               XML 1.0 vs XML 1.1, etc.

      link to full conversation[17]

   While W3C does have a history of publishing competing
   specifications, there have been features in each competing
   specification that were compelling enough to warrant the
   publication of both standards. For example, XHTML 1.0 provided a
   standard set of rules for validating documents that was aligned
   with XML and a decentralized extension mechanism that HTML 4.01 did
   not. Those two major features were viewed as compelling enough to
   publish both specifications as Recommendations via W3C.

   For authors, the differences between RDFa and Microdata are so
   small that, for 99% of documents in the wild, you can convert a
   Microdata document to an RDFa Lite 1.1 document with a simple
   search and replace of attribute names. That demonstrates that the
   syntaxes for both languages are different only in the names of the
   HTML attributes, and that does not seem like a very compelling
   reason to publish both specifications as Recommendations.
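
   The search-and-replace claim above can be sketched roughly as
   follows. The function name microdata_to_rdfa_lite is
   hypothetical, and the attribute mapping it uses (itemtype split
   into vocab plus typeof, itemprop to property, itemid to resource,
   itemscope dropped) is an assumed simplification for common
   markup. The regular expressions are naive: they do not parse
   HTML, and itemref is not handled at all.

```python
import re


def microdata_to_rdfa_lite(html: str) -> str:
    """Naive attribute rename from Microdata to RDFa Lite (illustrative only)."""

    def split_type(match: re.Match) -> str:
        # "http://schema.org/Person" -> vocab="http://schema.org/" typeof="Person"
        base, _, name = match.group(1).rpartition("/")
        return f'vocab="{base}/" typeof="{name}"'

    html = re.sub(r'\bitemtype="([^"]+)"', split_type, html)
    html = re.sub(r'\bitemprop=', 'property=', html)
    html = re.sub(r'\bitemid=', 'resource=', html)
    # Assumption: typeof already marks the start of a new item, so the
    # boolean itemscope attribute is simply removed.
    html = re.sub(r'\s+itemscope(?:="[^"]*")?(?=[\s>])', '', html)
    return html


sample = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Manu</span></div>'
)
print(microdata_to_rdfa_lite(sample))
# <div vocab="http://schema.org/" typeof="Person"><span property="name">Manu</span></div>
```

   A production converter would parse the HTML properly and would
   need a strategy for type URLs that do not end in "/Name".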

     [18]Microdata's processing algorithm is vastly simpler, which
     makes the data extracted more reliable and, when something does go
     wrong, makes it easier for 1) users to debug their own data, and
     2) easier for me to debug it if they can't figure it out on their
     own.

   Microdata's processing algorithm is simpler for two major reasons:

     * [19]Microdata does not support as many features and use cases
       as RDFa does.
     * RDFa 1.1 is backwards-compatible with RDFa 1.0, which
       complicates the processing rules. The same is true for HTML5.

   The complexity of implementing a processor has little bearing on
   how easy it is for developers to author documents. For example,
   XHTML 1.0 had a simpler processing model, which made the data that
   was extracted more reliable and, when something went wrong, easier
   to debug. However, HTML5 supports more use cases and recovers from
   errors in cases where it can, which has made it more popular with
   Web developers in the long run.

   Additionally, authors of Microdata and RDFa [20]should be using
   tools like RDFa Play to debug their markup. This is true for any
   Web technology. We debug our HTML, JavaScript, and CSS by loading
   it into a browser and bringing up the debugging tools. This is no
   different for Microdata and RDFa. If you want to make sure your
   markup does what you want, make sure to verify it by using a tool
   and not by trying to memorize the processing rules and running
   them through your head.

     For what it is worth, I personally think [21]RDFa is generally
     a technically better solution. But as Marcos says, "so what"?
     Our job at W3C is to make standards for the technology the
     market decides to use.

   If we think one of these technologies is a technically better
   solution than the other one, we should signal that realization at
   some level. The most basic thing we could do is to make one an
   official Recommendation, and the other a Note. I also agree that
   our job at W3C is to make standards that the technology market
   decides to use, but clearly this particular case isn't that
   cut-and-dried. In the beginning, the only markup option that
   schema.org offered was Microdata, and since authors didn't want to
   risk not showing up in the search engines, they used Microdata.
   This forced the market to go in one direction.

   This discussion would be in a different place had Google kept the
   playing field level. That is not to say that Google didn't have
   good reasons for making the decisions that they did at the time,
   but those reasons influenced the development of RDFa, and RDFa
   Lite 1.1 was the result. The differences between Microdata and
   RDFa have been removed and a new question is in front of us: given
   two almost identical technologies, should the W3C publish two
   specifications that do almost exactly the same thing in almost
   exactly the same way?

     ... the [HTML] Working Group explicitly [22]decided not to pick
     a winner between HTML Microdata and HTML+RDFa

   The question before the HTML WG at the time was whether or not to
   split Microdata out of the HTML5 specification. The HTML Working
   Group did not discuss whether the publishing track for the
   Microdata document should be the W3C Note track or the W3C
   Recommendation track. At the time the decision was made, RDFa Lite
   1.1 did not exist, let alone as a W3C Recommendation, nor did the
   RDFa and Microdata functionality overlap as greatly as it does
   now. Additionally, the HTML WG decision at that time states the
   following under the "Revisiting the issue" section:

   "If Microdata and RDFa converge in syntax..."

   Microdata and RDFa have effectively converged in syntax. Since
   Microdata can be interpreted as RDFa based on a simple
   search-and-replace of attributes, the languages have effectively
   converged on syntax except for the attribute names.
   The proposal is not to have work on Microdata stopped. Let work on
   Microdata proceed in this group, but let it proceed on the W3C
   Note publication track.

Closing Statements

   I felt uneasy raising this issue because it's a touchy and painful
   subject for everyone involved. Even if the discussion is painful,
   it is a healthy one for a standardization body to have from time
   to time. What I wanted was for the HTML Working Group to have this
   discussion. If the upcoming poll finds that the consensus of the
   HTML Working Group is to continue with the Microdata specification
   along the Recommendation track, I will not pursue a W3C Formal
   Objection. I will respect whatever decision the HTML Working Group
   makes as I trust the Chairs of that group, the process that
   they've put in place, and the aggregate opinion of the members in
   that group. After all, that is how the standardization process is
   supposed to work and I'm thankful to be a part of it.

References

   1. http://lists.w3.org/Archives/Public/public-html/2012Nov/0128.html
   2. http://manu.sporny.org/2012/mythical-differences/
   3. http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/
   4. http://xkcd.com/927/
   5. http://webdatacommons.org/vocabulary-usage-analysis/index.html
   6. mailto:public-html-comments@w3.org
   7. http://www.w3.org/TR/rdfa-core/
   8. http://blog.schema.org/2012/06/semtech-rdfa-microdata-and-more.html
   9. file://localhost/tmp/mdobjection.html
  10. http://www.w3.org/TR/rdfa-primer/#using-multiple-vocabularies
  11. http://krijnhoetmer.nl/irc-logs/whatwg/20121128#l-1122
  12. http://www.w3.org/TR/rdfa-lite/#the-attributes
  13. http://events.linkeddata.org/ldow2012/papers/ldow2012-inv-paper-1.pdf
  14. http://lists.w3.org/Archives/Public/public-html-comments/2011Jun/0038.html
  15. http://lists.w3.org/Archives/Public/public-html/2012Nov/0180.html
  16. http://lists.w3.org/Archives/Public/public-html/2012Nov/0178.html
  17. http://krijnhoetmer.nl/irc-logs/whatwg/20121128#l-789
  18. http://lists.w3.org/Archives/Public/public-html/2012Nov/0243.html
  19. http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/
  20. http://rdfa.info/play/
  21. http://lists.w3.org/Archives/Public/public-html/2012Nov/0179.html
  22. http://lists.w3.org/Archives/Public/public-html/2012Nov/0186.html

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/
Received on Monday, 3 December 2012 21:00:03 GMT
