hand authoring web pages (was Re: Exploring new vocabularies for HTML)

I suspect that this topic will generate more flames than data, but...

As researchers and designers, it is important for us to realize our own
biases and the uniqueness of the world that immediately surrounds us.  E.g.,
if we ask our colleagues, we might deduce that 20% - 30% of computer users
use emacs.  Of course, that is complete nonsense, but that is the kind of
false statistic/impression we would get from our own immediate environment.

I *suspect* that a similar phenomenon leads some people to conclude that hand
authoring of web pages is common practice, and hence should be a priority in
the design of HTML5.  It would be great to have some statistics on hand
authoring.

There are at least two possible ways to measure the importance of hand
authoring:
1.  The number of authors who hand-author web pages.  Since most people both
hand author and use tools, this number probably needs to be broken down in
some way.

2.  The total number of web pages that are authored by hand.

On the surface, it would *appear* to me that hand authoring accounts for a
tiny fraction of the total number of pages.  Maybe less than 0.001%.  Here's
why I think that:

Virtually all of the pages on commercial sites (amazon, ebay, craigslist,
facebook, youtube, ...) are generated by software.  That's almost certainly
the majority of web pages right there.

Another large group of web pages consists of wikis and blogs.  Again, the web
pages are generated by software.  Some (probably small) group of people
occasionally might edit the raw HTML to fix a problem, but editing to fix a
problem isn't really helped much by terseness, and complex tag minimization
rules might actually make it harder to edit if the software generating the
page tried to take advantage of them.

Another group of pages is generated by what we typically think of as web
page editing tools (Dreamweaver, FrontPage, GoLive, XMLSpy, ...).  Adding
up the sales of those can give some indication of the number of users of
those products; for open source tools, the number of downloads would give a
similar indication.  I don't have numbers for these, but again, this is
probably a substantial number.

Similarly, in some organizations (schools, businesses, ...), content
management systems are used for web authoring.  Again, software would mainly
be used for web page development, although some hand editing might be done as
you noted.

Yet another chunk of web pages comes from programs such as Word or
OpenOffice, etc., which offer a "Save as HTML" option.

Against all of these sources, it *seems* like hand authoring would account
for a tiny, tiny fraction of web pages.  I suspect that even if you limit
the population to .edu sites, the numbers are small.  Some (many?) of the
programs that generate web pages leave marks that identify their source.
Since you are at Google, you or a colleague could do a little work and
determine an upper bound on the number of web sites that are hand authored.
That could be used to justify the high priority you have placed on hand
authorability.
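As a rough illustration of that kind of count: one common mark is a
`<meta name="generator" content="...">` tag in the page head.  The sketch
below assumes that convention (not every tool emits it, and the function
names are made up for illustration) and that the pages have already been
fetched as strings.  Pages with no mark are counted as *possibly* hand
authored, so the result is only an upper bound.

```python
from html.parser import HTMLParser
from typing import List, Optional


class GeneratorFinder(HTMLParser):
    """Records the content of the first <meta name="generator"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content") or ""


def find_generator(html: str) -> Optional[str]:
    """Hypothetical helper: return the generator mark, or None if absent."""
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generator


def hand_authored_upper_bound(pages: List[str]) -> float:
    """Fraction of pages with no generator mark.

    Unmarked pages include software output that simply leaves no mark,
    so this overestimates the hand-authored share; it is an upper bound.
    """
    if not pages:
        return 0.0
    unmarked = sum(1 for page in pages if find_generator(page) is None)
    return unmarked / len(pages)
```

With four invented sample pages, two of which carry a generator tag, the
bound comes out to 0.5:

```python
pages = [
    '<html><head><meta name="generator" content="ExampleEditor 1.0"></head></html>',
    '<html><head><META NAME="Generator" CONTENT="ExampleCMS"/></head></html>',
    '<html><head><title>My page</title></head><body>Hi</body></html>',
    '<html><head><title>Another</title></head></html>',
]
print(hand_authored_upper_bound(pages))  # 0.5
```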

However, without real facts, it is hard to see how hand authoring can be
considered a priority.  Our personal experiences are just not informative of
what the majority of users do.

Neil Soiffer
Senior Scientist
Design Science, Inc.
~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~
