Working Group Decision on ISSUE-76 Microdata/RDFa (DRAFT)

Question before the Working Group

Currently, the HTML5 Draft incorporates Microdata, a syntax for metadata annotation in HTML content that can map to an RDF data model. There is a separate draft specification for HTML+RDFa, a different syntax for embedding metadata in HTML that can map to an RDF data model, based on the earlier RDFa in XHTML, a W3C Recommendation. Proponents have argued for the approaches having various strengths and weaknesses. Though they are not identical in scope and approach, it is clear that these technologies serve many of the same use cases, and are seen by many as competing.

Concerned about the perceived conflict, some HTML Working Group members raised ISSUE-76 RDFa/Microdata. The Chairs solicited Change Proposals and Counter-Proposals, and two concrete proposals have been submitted:

The question before the Working Group is which of these Change Proposals to adopt, based on which will draw the weaker objections.

Short Summary of Arguments

See the Review of Arguments Presented below for a full, detailed discussion of the arguments relating to this question.

The benefits of integration on the one hand and modularity on the other were raised. There was debate over whether Microdata is an intrinsic part of the HTML language, or a separable add-on. The technical merits of both RDFa and Microdata were touted. It was suggested that if Microdata were in a separate spec, it would be more easily reusable in other markup languages, though the value and practicality of this was questioned. Advocates of splitting cited the relative lack of maturity and lack of market success (at least so far) of Microdata, though the relevance of this was questioned. Many respondents thought it was wrong for the HTML Working Group to try to pick a winner between RDFa and Microdata, while others thought it was our responsibility to do so.

Many of the objections balance out. But, in some areas, keeping Microdata would draw stronger objections than splitting it. The objections based on maturity, market success, and reusability in other languages are stronger than their respective counterpoints. In light of these other arguments, the objections to picking a winner in this case are stronger than the objections to not doing so. The objections to picking either RDFa or Microdata as a winner were stronger, on the whole, than the objections to letting them compete on an equal footing.

Decision of the Working Group

Therefore, the HTML Working Group hereby adopts the Change Proposal to Separate Microdata from HTML5 Specification. Of the two Change Proposals before us, this one has drawn the weaker objections.

Per this decision and the decision policy[a], bug 7542[b] has been reopened and tagged WGDecision, for action by the editor to implement the cited Change Proposal.

[a] http://dev.w3.org/html5/decision-policy/decision-policy.html [b] http://www.w3.org/Bugs/Public/show_bug.cgi?id=7542

Next Steps

Once the required spec changes have been made and reviewed by the Chairs, the bug and tracker issue can be closed, resolving this issue.

The Change Proposal does not explicitly require publishing a Working Draft of a separated Microdata spec, but it does allow for and encourage the possibility. If anyone steps forward with a separate Microdata draft and wishes to take it to First Public Working Draft, it will be considered for publication per our usual approach.

Appealing this Decision

If anyone feels they have not received due process, or that their concerns have not been duly considered in the course of reaching this decision, they may make their concerns known to the Team Contact (Mike Smith), who will notify the Director.

If anyone strongly disagrees with the content of the decision and would like to raise a Formal Objection, they may do so at this time. Formal Objections are reviewed by the Director in consultation with the Team. Ordinarily, Formal Objections are only reviewed as part of a transition request. In the case of this issue, the Chairs have agreed to forward Formal Objections right away for expedited processing.

Anyone who would like to take either of these steps should first read the Review of Arguments Presented, below.

Revisiting this Issue

This issue can be reopened if new information comes up. Examples of possible relevant new information include:

Review of Arguments Presented

Many arguments and objections were presented in the Change Proposals, email discussion, and the poll. They are collected and summarized here. All of the arguments and objections mentioned in the Change Proposals and the poll should be covered. However, poll responses or parts of poll responses that do not state an objection have been ignored.

One major point of discussion was the relative benefits of integration and modularity. One argument presented was that "All good specs which integrate with HTML5 should, ideally, be a part of HTML5." Other Working Group members disagreed. They pointed out that this appears to contradict our position that HTML5 enables extension specs - are we saying that any such spec is by definition not good? Another way of framing the point was in terms of Conway's Law - that by splitting the spec we will make technology reflect organization, and thus weaken integration and lead to the specs using workarounds to work together. Relatedly, it was mentioned that Microdata might in theory even be moved to another Working Group. But it was pointed out that, as applied to Microdata, the Conway's Law and separate WG arguments are purely hypothetical - it's demonstrably possible to split Microdata in its current form without any technical changes, to keep it in the same working group, and indeed even to keep the same editor, as with other HTML5 spinoff specs. Indeed, some argued that Microdata may already be affected by Conway's Law, due to being part of the spec. For example, it doesn't work with content from other namespaces such as SVG or MathML. Another response was that separate specifications which are reviewed and maintained by the HTML WG can be an equally good or even better approach.

Some argued that having the Microdata specification separate from the HTML5 specification will allow the technologies to evolve independently from HTML5. But others pointed out that this could actually be a problem - Microdata and HTML5 being published separately may leave them out of sync. Another specific point raised was that a smaller core document facilitates better review of the parts that are truly essential to review. But the case was also made that inclusiveness promotes greater attention to each part, and that for potentially split sections, being part of the main standard will attract more review attention. A number of WG participants argued in general terms that the spec was "bloated" or "large enough", and that it was good to split anything that should be split. The principle of orthogonality was cited. But other participants pointed out that modularity is not always good. Sometimes it makes a technology more general at the expense of focus. As a middle ground, though, creating a separate spec with the sole or primary aim of use with another spec can still provide some of the benefits of both.

On the whole, these lines of argument seemed balanced against each other and therefore inconclusive. It is not clear that, as a general principle, either maximum modularity or maximum integration would have consensus or enjoy the weakest objections. Since modularity and integration arguments cannot decide the matter by themselves, we must proceed to the merits and specific circumstances of the feature under consideration.

Another point raised was the idea that Microdata is an intrinsic "part of the language", the same as any other extension mechanism in HTML5, such as @class, @id, @title, etc. This line of argument makes the case that it doesn't make sense to split out Microdata but not other features, because it's just as much part of the language. But other WG members argued that Microdata is relatively orthogonal and separable - while it has dependencies on other parts of HTML, other parts of HTML mostly don't depend on Microdata. Some went so far as to call it circular reasoning to argue that Microdata should be part of HTML because Microdata is part of HTML. It seems that the matter of what is or is not an intrinsic part of the language is partly subjective and perhaps admits shades of grey. It does not seem that this point pushes clearly one way or the other.

Some poll participants argued that Microdata is out of charter, at least as Rec-track work. The argument goes that the charter doesn't say the working group is allowed to actually add additional vocabularies, only to develop an extensibility mechanism. However, this does not seem to be well-founded in the charter. Even though the charter gives RDFa as an example of a vocabulary that could be added via an extensibility mechanism, RDFa is also an extensibility mechanism itself, a way of adding vocabularies, and the charter does not rule out adding RDFa, or something similar, directly. Thus, the charter does not appear to rule out working on Microdata or HTML+RDFa entirely, whether in the main spec or in a separate draft.

There has been considerable discussion about the comparative technical merits of RDFa and Microdata. Microdata advocates argue that Microdata has many technical advantages. It is simple for common cases and only complicated in rare, complicated cases. It has a defined conversion to XML and JSON, has a DOM API, and has various other good properties. It avoids the potential confusion of CURIEs and namespaces. The Microformats community has shown that there is a demand for embedding machine-readable metadata in HTML, but that many of the built-in extensions are lacking. Microdata, it is said, can fill this gap. RDFa advocates concede that Microdata has significant strengths, but counter that many of the advantages of Microdata can and will be replicated in RDFa, in RDFa 1.1. Some also deny the importance of some of Microdata's features; for instance, they may argue that namespace prefixes are not in fact confusing. Conversely, RDFa advocates argue that RDFa has some important technical advantages. RDFa is more complete, and in some cases Microdata may be *too* simple. For example, it lacks multi-language support. RDFa supports the follow-your-nose principle and semantic object validation and has various other advantages. And it's reported that much of the community that provided the use cases driving Microdata is not satisfied with the result, and prefers RDFa. Overall, Microdata proponents say that Microdata is valuable because it is mostly inspired by existing technologies but overcomes their flaws. Conversely, RDFa proponents say that Microdata isn't exciting because its main feature is not being RDFa, and RDFa is a superior technology because it is well-established. It seems a case can be made for technical advantages for each technology. There is no consensus to declare either as unquestionably superior, and declaring either to be clearly inferior would draw strong objections.

It's been pointed out that if Microdata were published as a separate spec, it might be reusable in other markup languages to provide semantic markup. The likelihood that it would be adopted by other markup languages (like SVG, ODF, or DocBook) might increase because it would no longer be viewed as an HTML5-only technology. Some respondents counter that Microdata is not reusable in non-HTML languages in its current form, limiting the utility of a split-out spec. They also argue that being HTML-specific is good; it can be focused enough to work well for one particular domain. Making it more general might reduce its value for HTML. However, it was pointed out that being HTML-specific does not even make Microdata fully applicable to HTML5 documents, which can also include SVG and MathML. Also, the partial reliance on HTML5-specific features, such as <time> or <meta>, does not preclude making the more generally applicable constructs usable for other languages. It seems plausible that a split-out Microdata could continue to have first-rate HTML integration but also be usable from other languages. This possibility seems like an advantage for Microdata in a separate spec.

Another important question was whether Microdata is mature enough. It's been argued that HTML+Microdata should be allowed to become a mature draft before consensus on inclusion or dismissal is discussed. A productive way to enable that maturation process is to separate the work into a separate document. Advocates of keeping Microdata in argue that it's not currently in a state of flux, and if necessary, Microdata can be removed at any time. They point out that HTML5 has historically not followed the model of keeping sections separate if they are not sufficiently mature. And indeed, Microdata is arguably relatively mature compared to other parts of the spec, including some parts that are not controversial for inclusion. The counterpoint is that true maturity would include implementation experience, extensive feedback based on authoring and deployment, and relative lack of strong disagreement. But at this time Microdata has not seen significant adoption among authors, or much implementation in UAs or data mining tools. Advocates for splitting Microdata point out that while it may turn out to be the best solution, it is currently unproven. And if it turns out to cause problems down the road, then it would be unfortunate to have the HTML5 spec saddled with it. It seems debatable whether Microdata is one of the least mature things in the spec. But it does seem clear that it does not have the maturity level of features inherited from HTML4, or even newer but widely implemented functionality such as <canvas> or <video>. Being unproven is not necessarily a strong objection to inclusion by itself, but it's worth considering along with other factors.

Many WG members have discussed the relative marketplace success of Microdata and RDFa. As has been pointed out, RDFa has significant deployment success - data mining tools from Google and Yahoo use it, it is used by Drupal, and it is published by such organizations as the UK government, the Library of Science and Best Buy. On the other hand, Microdata has no significant deployed history or implementation yet. Thus, Microdata may fail in the marketplace. If Microdata fails in the marketplace, in the long-term, it may be advisable to allow it to fail without having a negative impact on the HTML5 spec proper. Advocates for keeping Microdata argue that while it is true that either RDFa or Microdata (or both) may fail in the marketplace, we should as a working group give the most support to the technology we most believe should succeed in the marketplace. Whether we should be picking winners in this way is controversial.

So perhaps one of the most important points in this discussion is the next one: should the W3C pick a winner in this particular competition, or let nature take its course? One line of reasoning goes: the W3C's mission is to advance the Web, therefore the W3C *should* take sides in technological disputes and pick winners. Some argue that Microdata should be picked over RDFa. Others argue that RDFa should be picked over Microdata. Still others say that a winner is not yet clear, and we don't have consensus, so we shouldn't pick a winner. They do not want to see a particular format locked in, and would like to see them set up on an even footing. Some argue that competition between RDFa and Microdata is not a big deal; RDFa is not so widely adopted yet, and writing a parser for both is not so hard. Thus, we shouldn't worry about whether the formats are left to compete on an even footing. Others do think competition is a problem, to the point that having both Microdata and RDFa as W3C specifications is not in the best interests of the web community at large. It was claimed that it's more productive for philosophically divergent communities (RDFa/Microdata) within a larger community (HTML WG) to have their own work products during a period of active debate. But on the other hand, Web Forms (relative to XForms) was cited as a possible counter-example, where we actually took Web Forms 2 from a separate spec to part of the draft. But others thought this was a bad example, because they felt XForms was actually technically superior. The desire to let competition take its course was phrased in many different ways. One participant said it's beyond our charter to pick winners on technologies that are tangential to our work. Another simply said, "There's more than one way to do it." Another argued that "We have two constituencies with valid use cases for their chosen metadata implementation, as well as limits to their own expressivity." And the Team also felt that independence and user choice were preferred at this time. It was felt that breadth of consensus in the social fabric that includes standards developers, programmers, users, corporate interests and more must be demonstrated to pick a winner. Considering all this input, it seems that picking either RDFa or Microdata as a winner would draw stronger objections than allowing them to compete on an equal footing.

Thus, we see that, while many of the objections balance out, in some areas keeping Microdata would draw stronger objections than splitting it. The objections based on maturity, market success, and reusability in other languages are stronger than their respective counterpoints. As a result, the objections to picking a winner in this case are stronger than the objections to not doing so.