Response to HTML5 license survey

(Note: this is my personal opinion.  It does not represent Google's position.)

First, the full HTML5 specification is available from the WHATWG under
a permissive license:

http://svn.whatwg.org/webapps/

So the issue is what message the W3C should be sending to the larger
world, and what precedent we should set for other W3C specs, not
whether people can actually fork (they can anyway).

I'm strongly opposed to any license that does not explicitly permit
forking, including all three of the W3C license options.  I would
support any widely-used permissive license (such as CC0, MIT, or
three-clause BSD).

The primary argument against a license that allows forking is that the
existence of multiple competing standards would be damaging, since it
would fragment the market, confuse people, etc.  No one disputes the
fact that forks are very bad, compared to working out problems without
forking.  It is also clear that if HTML5 were restrictively licensed,
rewriting it from scratch would be extremely hard.  HTML5 is very
large and detailed, and rewriting it would require years of full-time
work by highly skilled editors.


Now, the only reason anyone would want to base a new standard on HTML5
is that they want to write a web-compatible standard.  Most of the
standard, for example the entire parsing section, makes no sense for
any purpose but processing legacy content.  It is chaotic and
complicated, and most of it is fine details that could be rewritten
much more simply if not for legacy constraints.  Anyone writing a
standard that doesn't need compatibility with existing content could
use only a small fraction of the HTML5 text, which they could easily
rewrite, so licensing restrictions would not affect them.

Furthermore, the only reason a prospective forker wouldn't be able to
easily write their spec as a list of changes to HTML5 is that they
want to make extensive and detailed changes.  If they're only adding
new features, or removing existing features, or making minor
adjustments here and there, it would be easiest to just state the
differences.  This would make their standard much shorter and easier
to maintain -- and also presumably not subject to licensing
restrictions.  A list of changes would only be harder to use and
maintain if the changes affected many particular details, so that the
reader would have to jump back and forth to understand the meaning and
the maintainers would have to update regularly to merge with changes
in HTML5.

But the only ones with the motive to write a web-compatible spec that
differs from HTML5 in extensive and detailed ways are implementers of
browser engines.  Compatibility with existing content is extremely
important to them, because users can easily switch to another browser
if pages don't work right.  They need to be able to refine the spec
continually to make its algorithms match web content better as bug
reports come in.  Nothing other than a browser engine needs this level
of compatibility, because nothing other than a browser engine is
expected to process HTML exactly like browser engines do.

The only time implementers of browser engines would want to fork HTML5
would be either if the implementers of all major engines felt that
continuing HTML5 development at the W3C was seriously problematic, or
if the implementers of at least two major engines felt that it was so
disastrously harmful that they were willing to abandon compatibility
with the remaining engines.  Minor browser engines don't matter here,
because they're forced to follow whatever standard the large browser
engines do, or risk losing market share.  A single browser engine
wouldn't fork either, because a standard that expects to have only one
implementation is useless.

If the development of HTML at the W3C ever degenerates to the point
that multiple major implementers think it's better to fork than to
continue at the W3C, the W3C has ipso facto failed at its job as a
standards body, and the fork will be a *good* thing.  Given how
disruptive a fork is, such a situation can only happen when the W3C
persistently and flagrantly ignores the real-world needs of
implementers, who in turn are forced by market pressure to reflect
the needs of users.  If the W3C were able (such as by licensing
restrictions) to prevent a fork even when it has failed that terribly,
it would only be destructive to the web.

Thus while most forks are bad, the very special class of forks that
would actually be hindered by licensing restrictions is most likely
*good*, if not indispensable.  Such forks serve as a safety hatch in
case the W3C fails badly at its job.


Unfortunately, such forks are not only theoretical, because the W3C did
fail badly at its job in recent memory.  Before HTML5 was started in
2004, the last HTML standard that defined features authors could use
in practice was released in 1998.  The work after that date was all on
XHTML, which was never used significantly by authors.  When Mozilla
and Opera asked the W3C if they could work on adding non-XML features
to HTML that would be usable by authors in the short term, the W3C
refused, so they created the WHATWG (along with Apple) to work on a
standard that would be useful to them.  Eventually the W3C
acknowledged that the WHATWG work was the right path to pursue,
forming this HTMLWG in 2007 to work on the WHATWG's HTML standard and
disbanding the competing XHTMLWG in 2009.

The W3C's inattention to author needs from 1998 to 2007 was extremely
harmful, and directly contributed to the flourishing of proprietary
technologies.  Increases in computing power and bandwidth made major
new web applications practical, such as video and in-browser 2D
graphics, but not even basic support was added to any standard that
browsers felt they could implement.  Thus Macromedia Flash (now Adobe
Flash) gained nearly 100% market share, and it has become so integral
to web content in practice that typical users would rightfully regard
a browser that didn't support Flash as broken.  It's difficult to
imagine a greater failure of the standards process.  Only because the
WHATWG fork defined standard video and canvas tags is it becoming
practical to even begin loosening Flash's stranglehold on the web, and
that will take years yet.

Nor is it merely conjecture that other types of forks do not need to
reuse the specification text, as evaluation of actual historical forks
shows.  A partial list of HTML forks not sanctioned by the W3C is
given at <http://wiki.whatwg.org/wiki/Forking#Existing_forks_of_HTML>:
ISO HTML, WML, XHTML-MP, WTVML, WHATWG HTML, CE-HTML, EPUB, and HTML
4.1.  Of these, I couldn't figure out how XHTML-MP, WTVML, or CE-HTML
work, because the standards don't seem to be available for free
online.  ISO HTML and EPUB are both defined purely by reference to
preexisting HTML standards, only listing particular changes, so they
could not be prevented by licensing.  WML is only loosely based on
HTML, and wasn't intended for compatibility with existing web content
or browsers, so it never had any need to reuse specification text.
HTML 4.1 is a rewrite from scratch, but is organized by an ad hoc
group on a wiki, seems to be entirely inactive, and doesn't show any
indications of interest by any implementers at all, so it can safely
be ignored.  WHATWG HTML is the only one that actually would have
benefited from a large body of existing high-quality spec text to
build on, had such a body existed when it was forked in 2004 -- and
that fork was unequivocally good.

By contrast, those who argue that forking is bad have failed to
provide any concrete examples of bad forks that would have benefited
from a permissive specification license.  Because the WHATWG has made
HTML5 available under a permissive license since 2004, such examples
should be easy to come by if they were likely to come up.  But to my
knowledge, none exist.  For instance, EPUB3 is based on HTML5, so it
could have forked the text, but it actually just defines a list of
changes.  Thus we have clear real-world evidence set against
unsubstantiated conjecture.


Another objection to allowing forking is that companies will not want
to pay specification editors if they have no control over the results.
This might be true in some cases, but HTML5 editing is paid for by
Google, which already releases it under a permissive license at the
WHATWG.  More generally, this is an argument against requiring all W3C
specs to be permissively licensed, but it is not an argument against
licensing specific specifications permissively if the editors'
employers want it.  Also, companies do not currently retain rights to
the specifications they sponsor, as a matter of course: they're
required to assign copyright to the W3C.  Permissive licensing gives
them *more* rights to the work, since they can continue it outside the
W3C if they see fit.

Wayne Carr also raises the possibility that organizations might create
device-specific variants of specs instead of working within the W3C.
This has actually happened, as in the case of WML.  However, as I
explained above, such specifications will not have any need for the
HTML5 text itself, and will not be affected by its license.  WML, for
example, is only loosely based on HTML.  In other cases, vendors add
features to support their devices, but again, there is no need to
reproduce any HTML5 spec text to add new features.  For instance, some
of Apple's proprietary iOS extensions are documented here, and no text
from HTML5 is present:

http://developer.apple.com/library/safari/#documentation/appleapplications/reference/SafariWebContent/ConfiguringWebApplications/ConfiguringWebApplications.html

Received on Wednesday, 4 May 2011 23:07:17 UTC