"Just create a Microformat for it" - thoughts on micro-data topic from Manu Sporny on 2009-05-06 (public-rdf-in-xhtml-tf@w3.org from May 2009)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 05 May 2009 23:09:40 -0400
To: WHATWG <whatwg@lists.whatwg.org>
Message-ID: <4A00FF74.4010702@digitalbazaar.com>
bcc: Public RDFa Task Force mailing list (but not speaking as a member)

Kyle Weems' recent post[1] on CSSquirrel discusses[2] some of the more
recent rumblings surrounding RDFa and Microformats as potential
micro-data solutions. It specifically addresses a conversation between
Ian and Tantek regarding Microformats:

http://krijnhoetmer.nl/irc-logs/whatwg/20090430#l-693

Since I've seen this argument made numerous times now, and because it
seems like a valid solution to someone who isn't familiar with the
Microformats process, I'm addressing it here. The argument goes
something like this:

"It looks like markup problem X can be solved with a simple
Microformat."

This seems like a reasonable answer at first - Microformats, at their
core, are simple tag-based mechanisms for data markup. Most semantic
representation problems can be solved by explicitly tagging content.

What most people fail to see, however, is that this statement
trivializes the actual implementation cost of the solution. A
Microformat is much more than a simple tag-based mechanism, and it is
far more difficult to create one than most people realize. Creating a
Microformat is a very time-consuming process that includes:

  1. Attempting to apply current Microformats to solve your problem.
  2. Gathering examples to show how the content is represented in the
     wild.
  3. Gathering common data formats that encode the sort of content
     you are attempting to express.
  4. Analyzing the data formats and the content.
  5. Deriving common vocabulary terms.
  6. Proposing a draft Microformat and arguing the relevance of each
     term in the vocabulary.
  7. Sorting out parsing rules for the Microformat.
  8. Repeating steps 1-7 until the community is happy.
  9. Testing the Microformat in the wild, getting feedback, writing
     code to support your specific Microformat.
  10. Reaching the draft stage - if you haven't given up by this point.

I say this as the primary editor of the hAudio Microformat - it is a
grueling process, certainly not for those without thick skin and a
strong determination to complete even simple vocabularies. Each one of
those steps can take weeks or months to complete.

I'm certainly not knocking the output of the Microformats community -
the documents that come out of the community have usually been vetted
quite thoroughly. However, hearing somebody propose Microformats as a
quick or easy solution makes me cringe every time.

The hAudio Microformat initiative started over 2 years ago and it's
still going, still not done. So, while it is true that someone may want
to put themselves through the headache of creating a Microformat to
solve a particular markup problem, it is unlikely. One need only look at
our track record - output for the Microformats community is at roughly
10 new vocabularies[3] (not counting rel-vocabularies and vocabularies
not based directly on a previous data format).

Compare that with the roughly 120-150 registered, active RDF
vocabularies[4] via prefix.cc. Now certainly, quantity != quality;
however, it does demonstrate that something is causing more people to
generate RDF vocabularies than Microformats vocabularies.

Note that this argument doesn't apply to class-attribute-based semantic
markup, but one should not make the mistake of assuming that it is easy
to create a Microformat.

-- manu

[1] http://www.cssquirrel.com/comic/?comic=16
[2] http://www.cssquirrel.com/2009/05/04/comic-update-html5-manners/
[3] http://microformats.org/wiki/Main_Page#Specifications
[4] http://prefix.cc/popular/all

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/
Received on Wednesday, 6 May 2009 03:10:26 UTC