Microformats survey (for GRDDL)

I've completed a survey of the current state of microformats with
regard to the GRDDL mechanism for data extraction. In principle GRDDL
is usable on all documents which use microformats.

Results tabulated at:


(It's on the ESW Wiki - please correct any errors/omissions directly)

Short version: following a strict interpretation of the relevant
specs, no official microformats are currently usable with GRDDL.
Taking a loose view, around a third are right now.

As anticipated, the weakest link is the non-existence of profile URIs.
Of the 18 microformats listed, only 3 have profile URIs directly
usable by GRDDL-aware agents (hCard, hCalendar & hReview), and none of
these URIs are endorsed by microformats.org.
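For anyone unfamiliar with the mechanism: a GRDDL-aware agent starts from the profile attribute of the HTML head element. It then dereferences each URI listed there, looking for a transformation. Below is a minimal sketch of that first step only, using Python's standard html.parser. It does not fetch the profile documents or apply any XSLT, and the names ProfileFinder and head_profiles are hypothetical.

```python
from html.parser import HTMLParser

class ProfileFinder(HTMLParser):
    """Collect the URIs listed in <head profile="...">."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "head":
            for name, value in attrs:
                if name == "profile" and value:
                    # HTML 4 allows several whitespace-separated profile URIs
                    self.profiles.extend(value.split())

def head_profiles(html_text):
    """Return the head profile URIs; an empty list means no profile at all."""
    finder = ProfileFinder()
    finder.feed(html_text)
    finder.close()
    return finder.profiles
```

An empty result is precisely the situation described above: with no profile URI, a strict GRDDL agent has nothing to dereference and so nothing to extract.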

Only 1 of the 18 has an endorsed profile URI (XFN), and that isn't
GRDDL-enabled. It was suggested on microformats-discuss that relevant
Wiki pages for the microformats could be used as interim profile URIs,
but again these aren't GRDDL-enabled. Most of the microformats do have
an XMDP expression of their profile, yet with a couple of exceptions
these are presented as escaped source markup, i.e. neither really
human- nor machine-readable. It isn't obvious what the intended
purpose of this might be.

XSLT to RDF/XML is available in various stages of completion for 6 of
the 18 microformats listed (including the 3 with unofficial profile
URIs).

In other words, only 4 of these 18 formats exploit the HTML
specification fully for disambiguation. Because the profile URIs
corresponding to 3 of these have appeared outside the microformats.org
process, only one format+profileURI combination may properly be called
a microformat (rather than 'semantic HTML'), and that one isn't
GRDDL-enabled.

While this limits the publisher's options when it comes to publishing
data in HTML, consumers may still use heuristics based on GRDDL or
similar mechanisms to extract data from microformat-enhanced documents
(i.e. screenscraping), with the obvious impact on reliability &
authority of the data, questions of provenance etc.
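By contrast, the screenscraping route guesses from class names alone, with no profile to vouch for what they mean. Below is a rough sketch of that approach. It assumes the root class names vcard, vevent and hreview (from the hCard, hCalendar and hReview drafts). ClassSniffer and guess_microformats are hypothetical names. A match only shows that a class name occurs, not that the author intended microformat semantics, which is the reliability problem noted above.

```python
from html.parser import HTMLParser

# Illustrative (not exhaustive) mapping of root class names to formats
ROOT_CLASSES = {"vcard": "hCard", "vevent": "hCalendar", "hreview": "hReview"}

class ClassSniffer(HTMLParser):
    """Record which known root class names appear anywhere in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # class attributes hold whitespace-separated tokens
                for cls in value.split():
                    if cls in ROOT_CLASSES:
                        self.found.add(ROOT_CLASSES[cls])

def guess_microformats(html_text):
    """Heuristic guess only: no profile URI is consulted."""
    sniffer = ClassSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    return sorted(sniffer.found)
```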

Received on Tuesday, 20 March 2007 11:55:08 UTC