- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Wed, 09 Dec 2009 15:16:11 -0500
- To: HTMLWG WG <public-html@w3.org>
The following document outlines a Change Proposal to remove Microdata from the HTML5 specification. The first draft of this document was published on October 21st, 2009, at the request of one of the Chairs of the HTML WG. This e-mail is formatted to conform to step 2.b of the Escalation Process section[1] of the HTML Working Group Decision Policy document.

Changes in Draft 4
------------------

* Removed automatic publication of HTML+Microdata FPWD requirement

Changes in Draft 3
------------------

* Added statements noting that the publication of HTML+Microdata FPWD is an immediate result of accepting this change proposal in "Summary" and "Proposal Details"
* Added more items to "Microdata Cons" list in the "Rebuttal to Counter-Proposal" section

Changes in Draft 2
------------------

* Removed mention of RDFa except when used to convey development/deployment experiences or examples of "maturity" or "minimum/adequate support".
* Added Benefit of modularizing Microdata - adoption by languages other than HTML5
* Included Rebuttal to Counter-Proposal section

Summary
-------

There are currently two mechanisms under active development by the HTML WG for embedding machine-readable semantics in HTML5 - RDFa and Microdata. The HTML+RDFa spec was published in a separate document, as a specification built on top of HTML5. The Microdata spec was published inside of the HTML5 specification, while the discussion of whether or not to include RDFa was still taking place.

While there are many points to be made for and against RDFa and Microdata as technologies, the rationale for this proposal is not concerned with those arguments. The pros and cons are, however, reviewed at the bottom of the document in the "Rebuttal to Counter-Proposal" section.

This change proposal is concerned with the ramifications of placing a new technology that has gained neither broad deployment experience nor authoring feedback into the main HTML5 specification.
Primarily, this Change Proposal asserts that RDFa barely meets the requirement of broad deployment experience and authoring feedback. Microdata, having achieved very little implementation experience, no deployment experience and very little authoring feedback (to date), should be considered an at-risk feature for HTML5 and should be considered for a move into a separate specification.

This proposal argues that Microdata should be kept separate from the HTML specification until it is clear to this Working Group that it has become broadly deployed and heavily utilized by the HTML authoring community. This has a number of benefits in the case that the technology succeeds, as well as in the case where the technology fails. Separating Microdata from the HTML5 specification has several significant advantages and no significant disadvantages.

Rationale
---------

There are a number of basic premises related to separating Microdata from the main HTML5 specification:

* Microdata may fail in the marketplace.
* It is more productive for philosophically divergent communities (RDFa/Microdata) within a larger community (HTML WG) to have their own work products during a period of active debate. Those complete work products should only be presented to the larger group for consensus when they reach maturity. Doing so prior to the work being completed leads to perma-thread discussions, as we have experienced for the past several months.
* HTML+Microdata should be allowed to become a mature draft before consensus on inclusion or dismissal is discussed in order to ensure the proper technology is selected for semantic data markup.
* Having the Microdata specification separate from the HTML5 specification will allow the technologies to evolve independently from HTML5 (during LC, and after REC).
* Microdata could be used in other markup languages to provide semantic markup.

A number of potential conclusions can be drawn from the premises and current state of affairs:

* If Microdata fails in the marketplace, in the long-term, it would be advisable to allow it to fail without having a negative impact on the HTML5 spec proper. Removing it from HTML5, many years from now, will be difficult, if not impossible.
* The HTML+Microdata draft should be allowed to mature until W3C Last Call before the discussion on whether or not to include it in the HTML5 specification. A productive way to enable that maturation process is to separate the work into a separate document.
* If we don't separate the Microdata specification into a different work product, the alternative may be to prematurely select a single technology, before Microdata is allowed to mature and gather implementation and deployment feedback.
* If Microdata is split into a modular Microdata specification, the likelihood that it would be adopted by other markup languages (like SVG, ODF, or DocBook) might increase because it will no longer be viewed as an HTML5-only technology.

Proposal Details
----------------

The change details of this proposal would require removing all language discussing Microdata from the HTML5 specification. It is advised, but not required, that the language be placed into a separate specification.

It has been demonstrated that the Microdata specification can be cleanly migrated into a separate HTML+Microdata specification:

http://html5.digitalbazaar.com/specs/microdata.html

The work to remove the Microdata language from the body of the HTML5 specification took roughly 8-10 hours for a single person to perform.

Impact
------

Negative Effects

* May produce less interest in and feedback on Microdata since it will not be in the HTML5 spec proper.
* All Microdata attributes and behavior would be defined in a separate specification.

Positive Effects

* Addresses the months-long Microdata vs.
RDFa debate by employing the "allowing many flowers to bloom" strategy instead of the "Mad Max" "two enter, one leaves" fight-to-the-death strategy that has been driving the debate.
* Demonstrates that multiple specs are capable of being layered on top of HTML5. Proving that HTML5 can be extended through this Working Group is an important milestone for other spec writers as well as W3C member companies.
* Allows Microdata to organically mature at its own pace, largely independently from HTML5.
* Allows Microdata to fail without affecting the main HTML5 specification.
* Changing Microdata in the future wouldn't require the HTML5 specification to be republished as a REC (a very costly process).
* Frees the WHATWG and HTMLWG to concentrate on making technical progress in other areas.

Rebuttal to Counter-Proposal
----------------------------

> * All good specs which integrate with HTML5 should, ideally, be a part
> of HTML5. Inclusiveness promotes greater attention to each part, and
> ensures that the language evolves in directions which are most
> helpful. A spec which is separate from HTML5 may find the easiest way
> to resolve difficulties is to route around them, rather than altering
> or extending the HTML language itself, which may be the best option
> overall.

While there is nothing erroneous with the statements made in the rationale above, it doesn't address how the rationale relates to Microdata directly.

The philosophy, if employed as "the best way to implement specifications", is largely false and ignores the large body of work that constitutes existing Internet and Web specifications. The "many modular specifications" approach is how the IETF and W3C have operated to date and the Internet and Web still work fairly well.

Even if one were to assert the above philosophy as true, it is just one possible philosophy among many that may be used to move us forward.
Here is another, equally convincing, strategy (in the spirit of the rationale in the counter-proposal):

* All good specs which can be built upon HTML5 should, ideally, be placed in a separate specification and vetted thoroughly by the HTML WG. Modularity provides focused specifications and eases the burden on implementers and authors when creating software or web pages that use the features outlined in specifications. A spec which is built on top of HTML5 SHOULD NOT be allowed to route around problems when the best option would be to change the HTML5 spec proper - the W3C Review process is in place to ensure that this is enforced. The W3C Review process has done so for countless other specifications by inviting reviewers from the HTML WG to review the extension specifications.

Example: HTML5+RDFa is a separate specification built on top of HTML5 and has received a large amount of feedback and interest from the HTML and WHATWG community even though it resides in a separate specification. This feedback has resulted in planned modifications, corrections, a FPWD and 107 additional HTML5+RDFa tests added to the test suite.

The Point: The rationale provided in the counter-proposal describes a theoretical problem and is contrary to the empirical evidence gathered in both the Microdata and RDFa discussions. If Microdata is split out and there is no further interest in it, then it was never a "good spec". If Microdata is split out and is a "good spec", it will enjoy an adequate amount of implementations, feedback, review and testing before REC, much like <video>, <canvas>, and RDFa have in the past (when they were not a part of any HTML specification).

> * A spec that is designed within HTML5 and one designed outside of it
> are qualitatively different (see Conway's Law). One designed
> originally as part of the larger spec tends has a larger "surface
> area" alongside the rest of the spec, rather than limiting its
> interaction to a small number of channels.
> This makes it harder to
> separate out (though Manu has already done that work) and makes it
> more vulnerable to incompatible changes in the larger spec. Something
> which originated within the spec is best kept within the spec or
> dropped entirely; it should require strong reasoning to separate it
> out.

This rationale is also fairly theoretical - it effectively boils down to "we might accidentally create bugs when changing a specification" and "we need a good reason to separate Microdata from the HTML5 specification". Those "good reason"s and "strong reasoning" are provided in the body of this change proposal.

While it is true that integrating language in a specification allows for a "larger surface area", the argument is not persuasive because a decent test suite should be able to catch most accidentally introduced bugs. The potential bugs and construction of the test suite also have no bearing on where a particular technology is specified (in the HTML5 spec, or in a separate Microdata spec).

Example: The RDFa Test Suite is defined for XHTML1, HTML4 and HTML5, even though each specification is in a separate document. It has been fairly effective at catching RDFa Processor bugs and continues to be expanded to cover newly discovered issues. Any incompatible changes in the larger spec would be immediately visible when utilizing an updated XHTML1, HTML4 or HTML5 parser - if the test suite is doing its job.

The Point: The way to ensure that a software system (HTML5+Microdata) is operating correctly is to /test it thoroughly/ against a set of specifications - not to ensure that all of the specification language is in one document.

> * Many parts of HTML5 cannot be considered 'mature' and are in fact
> actively changing, and yet are still part of the spec. It is expected
> that these sections, Microdata included, will receive implementation
> attention and experience, and will be amended or dropped as these
> experiences warrant.
> Lack of maturity is not a reason for removal of
> any other part of the spec, and there is no distinguishing feature of
> Microdata that would warrant it being treated differently.

The term "mature" was intended to imply a number of attributes. Namely, since the HTML5 spec is approaching Last Call at the W3C, it is concerning when any feature has the following attributes - lack of implementation experience, lack of feedback, corner-case bugs, vehement disagreement, a published and competing W3C spec, lack of authoring experience, and lack of deployment experience. When any of these attributes are associated with a feature, it is certainly a reason to consider postponing the inclusion of that feature into HTML5.

A number of these attributes are associated with Microdata, namely - lack of implementation experience, vehement disagreement, lack of authoring experience, a published and competing W3C REC spec, and lack of deployment experience.

There are a number of individuals who believe that Microdata is not "mature" enough to proceed to Last Call at this time. This group has argued this point in detail, created Change Proposals, volunteered to split the Microdata specification into a separate document, and demonstrated that it is quite possible and fairly easy to split the Microdata specification into a separate document.

The Point: The counter-proposal does not use the same definition of "mature" as the original proposal. This rebuttal clarifies the meaning of "mature" and asserts that we should be postponing features that have a laundry list of issues associated with them.

> * Microdata does not appear to be in an extreme level of flux to
> warrant concerns of it holding up HTML5's progression in the standards
> process. If it turns out to indeed limit the main spec it can be
> split out at that time, but at the moment this is nothing more than a
> theoretical concern.
> In the other direction, it does not seem likely
> that implementations of Microdata will progress any quicker if it was
> a separate spec, and so HTML5 cannot be said to be slowing down
> Microdata's progress either. In the event that Microdata does fail in
> the marketplace, it can simply be removed from the spec at that time;
> there does not seem to be any benefit in spending effort to make this
> action any simpler.

It takes years, if not decades, to "remove" features from the HTML language, let alone see their use halted in browsers. It is far better to be patient and take 2-4 years to ensure that a technology is stable enough to become a part of the HTML language than it is to prematurely insert it into a specification and publish it as part of HTML.

The Point: Asserting that features can be "simply removed" from the HTML spec fails to grasp a lesson that is prevalent in HTML's history. There are still people publishing HTML4, warts and all. Removing a feature of HTML is never a simple matter.

> * The purpose of the W3C is to advance the web, not to remain neutral
> in technological conflicts. If one technology under the W3C's purview
> is better than a competing technology, it is our responsibility to
> actively decide in favor of it. To do elsewise would be dereliction
> of our core duty to the web. Microdata and RDFa are directly
> competing, as they accomplish virtually precisely the same thing;
> there is no good reason to use both on a page except for gratuitous
> proliferation of metadata embedding syntaxes.

Separating Microdata into a separate specification buys the technology some time to get implementation, authoring and deployment feedback from search companies, browser manufacturers and authors. If we instead force a decision now, we run the danger of killing it off before it matures, and that would be a shame.
The mere existence of Microdata is driving the RDFa community to adapt the best parts of Microdata/Microformats for its own use and that is something that will make the Web a better place, even if Microdata eventually fails in the marketplace (or vice-versa).

The Point: Microdata isn't ready to compete against RDFa on many levels - namely, number of implementations, published W3C REC status, implementation feedback, test suites, deployment feedback and authoring feedback. Forcing it to do so, prematurely, will almost certainly kill it before we can see if it is a workable solution.

> * The Microdata data model is extremely simple for simple, common
> cases, and is complex only in rare, complicated cases. Its tree-based
> nature (as a set of nested name/value pairs) matches well with both
> the HTML language and XML and JSON data storage/interchange formats.
> The processing model is extremely simple and well-defined, and
> essentially trivial to implement. The DOM API associated with it
> makes retrieving metadata from a page via a script in the page
> extremely simple, broadening the possible usages of Microdata beyond
> spiders and the like to actually being useful in applications. It is,
> in short, a simple and intuitive metadata syntax in a field where
> neither adjective can typically be applied, backed up by user studies
> that directly informed its design. Removing it from HTML5 would
> provide no benefit to authors or implementors, and would likely serve
> only to slow down the development and deployment of a useful tool for
> authors.

While it is true that Microdata's data model is extremely simple, it is also true that it is too simple to accomplish many important engineering, science and mathematics related tasks since it does not have support for anything that involves open-ended measurements or property data-typing of any kind.
There is no multi-language support in Microdata's data model, making it impossible for web applications to determine the natural language of text data. For example, there is no way to tag the word "chair" as belonging to any language in Microdata. That word means something fairly benign in English, but something entirely different in French. Translation software would be very difficult to implement in Microdata.

While its tree-based nature is a positive design criterion, Microdata's processing model is no different from RDFa's in this respect, as RDFa contains the same tree-based processing mechanism and can be easily mapped to a tree-based data structure if needed.

Microdata's ability to serialize to XML and JSON data storage/interchange formats is not a defining characteristic, as almost any data structure can be mapped to XML and JSON.

While the DOM API was a first for Microdata, it will be short-lived as a DOM API for RDFa is in the works for RDFa 1.1 and will reach REC years before Microdata reaches REC - it will not be a defining characteristic in seven months' time.

While it has been asserted a number of times that user studies were performed to influence its design, the raw data for these studies have never been provided for 3rd party analysis.

The W3C's Technical Architecture Group, the body that oversees the overall system design for the Web, has asked that Microdata be removed from the HTML5 specification. This request is partly based on Microdata's design decision to not fully support the follow-your-nose principle. This principle asserts that a User Agent should be able to dereference the meaning of semantic predicates like "name", "desc", and "title". User Agents that implement Microdata will have half-baked support for the follow-your-nose principle, unlike RDFa, which has full support for the follow-your-nose principle.
Semantic object validation is not supported in Microdata, which makes it impossible for User Agents to understand whether or not the data that they are working with is valid. This will inevitably make User Agents more complicated, since Microdata pushes data validation far up the application layer stack. RDFa has data validation, as well as vocabulary term equivalency (via RDFS and OWL) support, built in.

There is nothing useful that Microdata does that RDFa doesn't already do now, or will do in less than one year - long before Microdata reaches REC status.

Additionally, Microdata does not have a firm commitment of implementation support by the majority of any single industry (Google, Yahoo - search), nor does it have the commitment to be included in any high profile content management system (Drupal 7), nor does it have the commitment of any major world government (The United Kingdom), nor a scientific body (The Public Library of Science).

That said - it would be a mistake to kill off Microdata now... or in the next 2 years. Giving Microdata the benefit of the doubt and the chance to mature over the course of 1-2 years would ensure that the HTML WG makes the proper decision when it comes to choosing a technology for the Semantic Web. The place for that maturation is not the HTML5 specification proper, for the reasons listed in this proposal. We should be deliberate in this decision and not allow "what we know" now to blind us to "what could be" in the future.

The Point: Split Microdata out so it has a chance to mature - the correct technological solution will become clear in time.

-- manu

[1] http://dev.w3.org/html5/decision-policy/decision-policy.html#change-proposal

--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.2 Launched - The Legal P2P Music Network
http://blog.digitalbazaar.com/2009/11/30/bitmunk-3-2-launched/
Received on Wednesday, 9 December 2009 20:16:41 UTC