- From: Aryeh Gregor <ayg@aryeh.name>
- Date: Mon, 22 Aug 2011 16:33:26 -0400
- To: liam@w3.org
- Cc: Karl Dubost <karl+w3c@la-grange.net>, Richard Ishida <ishida@w3.org>, spec-prod@w3.org, Doug Schepers <schepers@w3.org>, Philippe Le Hegaret <plh@w3.org>
On Sun, Aug 21, 2011 at 11:15 PM, Liam R E Quin <liam@w3.org> wrote:
> Seems to me a requirement should be that the format is suitable for
> archiving.

That's very reasonable.

> This means that the document indicates to exactly which
> version of which specification(s) it conforms, and actually does
> conform.
>
> Without a formal public identifier in the doctype declaration, or a
> version attribute on the HTML element, I don't personally consider it
> acceptable to use HTML 5 in a situation in which long term archiving is
> expected.
>
> Even with such version indication, HTML 5 must obviously not be used in
> archived contexts until it is a stable specification - in W3C terms that
> means a Recommendation.

That doesn't follow. Archivability of the format is how reliably we expect it to be readable into the distant future. Just because something ostensibly conforms to a Recommendation doesn't mean it's readable at all. It would be possible to construct a document that's valid HTML 4.01 and completely readable in every major browser, but which is not decipherable using something based solely on the HTML 4.01 standard, because it relies on browser behavior that the standard either doesn't specify or specifies differently from how browsers behave. In fact, it would be easy to do this, because HTML 4.01 is extremely vague, makes extremely few testable assertions, and has no test suite at all as far as I know. A document that conforms to HTML 4.01 is no more archivable than one that conforms to only de facto standards, because HTML 4.01 doesn't define anything in more precision than a de facto standard anyway.

Moreover, just because something is not yet a Recommendation doesn't mean it's not stable. The relevant parts of HTML and CSS are fixed in stone because of browser compatibility constraints.
If the browsers of ten or twenty years from now can't reliably display pages pretty much the same way that browsers display them now, a huge part of the web will no longer work. Browsers today can all display typical ten- or twenty-year-old pages with no big problems, because if they start displaying existing pages differently, they immediately get angry complaints from users. All evidence suggests this will be a reliable invariant going forward.

In fact, the HTMLWG explicitly considered the question of whether to add version identifiers to HTML5: <http://lists.w3.org/Archives/Public/public-html/2010Dec/0135.html>. It concluded that a version indicator is not necessary, because (roughly) all future versions of HTML are expected to be backward-compatible, and in the unlikely event that they're not, a version indicator can be added at that point. Do you think that the W3C shouldn't use HTML5 for its publications ever if its staff disagrees with its Working Group on the necessity of version indicators? Isn't the Working Group responsible for technical decisions relevant to the standards it works on, not the W3C administration?

Still further, HTML is one of the world's most widely-used data formats. In the extremely unlikely event that it ever goes away or changes such that old documents aren't readable anymore, we can be sure there will be a very long transition period where legacy HTML processors are available, and will be usable to convert old documents to new formats. This would not be an ideal situation, but it's exceedingly unlikely and still quite manageable in the hypothetical situation where it does occur.

There's also the fact that there are no restrictions on content that's included from other files. Specifications can use CSS/JS features or image formats or whatnot that are unstably specified or not specified at all, but not new HTML features.
W3C specifications commonly rely on CSS to distinguish normative from non-normative text, but there are no restrictions on what standards that CSS must conform to. Even *if* archivability were really an issue here, and even *if* requiring standardization were really a solution, the status quo doesn't require standardization at all for key parts of the document.

Also, HTML is by its nature a text-based format, and the normative portions of the specifications we're discussing are all text. Even if you knew absolutely nothing about HTML as a format, reading the source code would more than suffice to correctly decipher the standard, given a little work.

I could raise more objections here -- like pointing out the unlikelihood of any W3C specifications being useful in the event that no one knows how to read HTML anymore -- but I think I've made my point. Standardization is neither necessary nor sufficient for archivability, and the document formats under discussion are so widely used that they'd be suitable for indefinite archival regardless of whether they were meaningfully standardized (which indeed they largely were not prior to HTML5). HTML5 is every bit as archivable as HTML 4.01 in practice, regardless of nominal maturity.

On the other hand, HTML5 will not reach Recommendation for probably another decade, and in the meantime other specifications are stuck using an obsolete set of features. The request I made was completely practical: there are useful features in HTML5 and W3C specs should be able to take advantage of them. Do you have any objections that are comparably practical? Do you foresee any concrete, short- to medium-term harm from permitting the use of HTML5 for W3C specifications? Or are the issues you have with publication as HTML5 solely a matter of principle?
Received on Monday, 22 August 2011 20:34:30 UTC