Microformat validation

Hi,
Such a busy list, it's easy to miss gems. One I did spot is the idea
of a validator for microformats ([1] and elsewhere). This is an
excellent idea, IMHO. Some thoughts:

A precedent for this being worthwhile is the Feed Validator [2]. It has
not only provided a handy check for Atom/RSS data (including for
developers of tools that produce such data, which massively amplifies
the benefit), it has also helped encourage consideration of tests when
developing formats. Overall, I'm pretty sure the validator has had a
major impact on the quality of syndication data on the Web.

Regarding validation of microformat docs, the base syntax rules are
those of XHTML, so that layer is well covered. For validation of a
single specific format it's not hard to imagine a schema being made
available for the purpose. Relax NG seems to have emerged as the most
flexible language for this kind of job. A Relax NG schema could
(somehow) be associated with each XMDP profile.

What may be problematic is validation of docs where there are multiple
microformats in play and interactions that cannot be conveniently
expressed in something like Relax NG. For example if a blog post
containing hReview is followed by one containing hCalendar, both
appear on the blog front page which already uses XFN... My guess is
that an awful lot could be covered at the syntax level (in the same
way as XHTML+SVG+MathML can be validated), but I suspect that because
the data is kind-of tunnelled, it might leak a little, especially
where individual entities could take different roles under different
profiles.
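
To make the mixed-page case concrete, here is a rough sketch (Python,
standard library only) of spotting which formats are in play on one
page. The root class names (vcard, vevent/vcalendar, hreview) and the
XFN rel values are the published ones, but the list of rel values is
only a subset, and formats_in_play is a hypothetical helper of mine,
not part of any existing tool. It assumes well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Root class names that signal each format (assumed sufficient for a sketch).
ROOT_CLASSES = {
    "vcard": "hCard",
    "vcalendar": "hCalendar",
    "vevent": "hCalendar",
    "hreview": "hReview",
}
# A subset of XFN rel values, enough for illustration.
XFN_RELS = {"friend", "acquaintance", "contact", "met", "co-worker", "colleague", "me"}


def formats_in_play(doc_text):
    """Return the set of microformat names detected in a well-formed XHTML string."""
    found = set()
    for elem in ET.fromstring(doc_text).iter():
        for cls in elem.get("class", "").split():
            if cls in ROOT_CLASSES:
                found.add(ROOT_CLASSES[cls])
        # Strip any namespace prefix before comparing the element name.
        if elem.tag.rsplit("}", 1)[-1] == "a":
            if XFN_RELS & set(elem.get("rel", "").split()):
                found.add("XFN")
    return found
```

A validator could use a result like this to decide which per-format
checks to run, though it says nothing about how the formats interact.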

I believe there may be a fairly straightforward alternate/complementary
solution, working above the syntax: use an XSLT transformation to
RDF/XML (i.e. GRDDL), merge the result with an RDF Schema/OWL ontology
which effectively contains the rules, then run model consistency
checking, i.e. a semantic validator. This could potentially be fully
automated and
operate on arbitrary docs/microformat combinations. (I've been meaning
to try the same technique with Atom for a while, never enough time...)
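
A toy sketch of that pipeline in Python (standard library only), to
show the shape of the idea rather than a real implementation. The
extraction function stands in for the GRDDL/XSLT step, and the RULES
table stands in for an RDF Schema/OWL ontology; both are my own
simplifications, and the cardinalities in RULES are illustrative, not
taken from any published vocabulary. A real version would use a proper
RDF toolkit and reasoner.

```python
import xml.etree.ElementTree as ET

# Stand-in for an ontology: (min, max) occurrences of each property
# per typed node. None means unbounded. Illustrative values only.
RULES = {"vcard": {"fn": (1, 1), "url": (0, None)}}


def extract_triples(doc_text):
    """Stand-in for the GRDDL step: emit (subject, property, value) triples."""
    triples = []
    for elem in ET.fromstring(doc_text).iter():
        for kind in elem.get("class", "").split():
            if kind not in RULES:
                continue
            subject = f"_:{kind}{len(triples)}"  # blank-node style label
            triples.append((subject, "type", kind))
            for child in elem.iter():
                if child is elem:
                    continue
                text = "".join(child.itertext()).strip()
                for cls in child.get("class", "").split():
                    if cls in RULES[kind]:
                        triples.append((subject, cls, text))
    return triples


def check_model(triples):
    """Check the extracted triples against RULES; return a list of problems."""
    problems = []
    typed = {s: o for s, p, o in triples if p == "type"}
    for subject, kind in typed.items():
        for prop, (lo, hi) in RULES[kind].items():
            n = sum(1 for s, p, _ in triples if s == subject and p == prop)
            if n < lo or (hi is not None and n > hi):
                upper = "*" if hi is None else hi
                problems.append(f"{subject}: '{prop}' occurs {n} time(s), allowed {lo}..{upper}")
    return problems
```

Because the check runs on triples rather than markup, triples from
several formats on one page could in principle be pooled before
checking. This sketch does not handle nested items or subject naming
across documents.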

If a particular microformat does happen to be associated with an RDF
Schema/OWL ontology then this should be already possible (with a
little glue). For example there are already RDF mappings for vCard
[3], iCalendar [4] and a model for reviews [5], so hCard, hCalendar
and hReview are already candidates. (The extent to which the semantics
of the schemas/ontologies can encode the domain language rules is
another matter - my guess is this will again be "mostly").

What I'm not sure of is how best to derive the rules for what is/isn't
allowed where RDF schemas/OWL ontologies aren't available. This
shouldn't be a major issue: there isn't a flood of new formats to deal
with, so creating the schemas as needed is feasible. (Maybe the
machine-readable data in XMDP docs can help?) The other issue that
stands out is how to determine automatically that a given RDF schema
should be associated with a given microformat doc. Again, this isn't a
showstopper: the interesting microformat docs will contain the URI of
their profile in the <head>, so all any application such as a
validator would need is a table mapping these to schema/ontology URIs,
something that could be prepared manually if need be.
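
The lookup table could be as simple as the following Python sketch
(standard library only). The profile URIs under example.org are
placeholders I made up for illustration; the schema/ontology URIs are
references [3], [4] and [5] below. schemas_for is a hypothetical helper
name, and the code assumes well-formed XHTML with a profile attribute
on <head> (which may hold several space-separated URIs).

```python
import xml.etree.ElementTree as ET

# Hand-prepared table: profile URI -> schema/ontology URI.
# The example.org keys are placeholders, not real profile URIs.
PROFILE_TO_SCHEMA = {
    "http://example.org/profiles/hcard": "http://www.w3.org/TR/vcard-rdf",
    "http://example.org/profiles/hcalendar": "http://esw.w3.org/topic/RdfCalendar",
    "http://example.org/profiles/hreview": "http://www.purl.org/stuff/rev",
}


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def schemas_for(doc_text):
    """Return (known, unknown): schema URIs for recognised profiles,
    and the profile URIs that are missing from the table."""
    root = ET.fromstring(doc_text)
    head = next((e for e in root.iter() if local_name(e.tag) == "head"), None)
    profiles = head.get("profile", "").split() if head is not None else []
    known = [PROFILE_TO_SCHEMA[p] for p in profiles if p in PROFILE_TO_SCHEMA]
    unknown = [p for p in profiles if p not in PROFILE_TO_SCHEMA]
    return known, unknown
```

Returning the unrecognised profiles separately lets a validator report
"no schema available" instead of silently passing the document.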

Cheers,
Danny.

[1] http://microformats.org/discuss/mail/microformats-discuss/2005-July/000306.html
[2] http://feedvalidator.org/
[3] http://www.w3.org/TR/vcard-rdf
[4] http://esw.w3.org/topic/RdfCalendar
[5] http://www.purl.org/stuff/rev

-- 

http://dannyayers.com

Received on Saturday, 20 August 2005 12:35:23 UTC