Re: Microformat profile URIs from Ben Adida on 2006-08-18 (public-grddl-wg@w3.org from August 2006)

From: Ben Adida <ben@mit.edu>
Date: Fri, 18 Aug 2006 05:54:31 -0400
To: Danny Ayers <danny.ayers@gmail.com>
CC: public-grddl-wg <public-grddl-wg@w3.org>
Message-ID: <44E58E57.8010402@mit.edu>

Danny Ayers wrote:
> 

[...]

> 2. Dealing with microformat data which lacks a profile URI is a
> different story. Right now it involves essentially scraping for
> attribute strings like "vevent". Seems to me there are two general
> strategies: raise the status of these strings to registered
> identifiers (presumably e.g. class="vevent" is relatively unlikely to
> appear in non-microformat docs); find a way to express the information
> that any data extracted from the document was done so without the
> license the profile URI provides.

It seems to me that it's actually much more complicated than that. You'd
have to scrape for every known microformat, and the list is obviously
growing as time goes on. That's already problematic, because it implies
some central repository of "known vocabularies" that are assumed for all
web pages... that's quite a bit more centralized than most folks have
come to expect from the web.
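
To make the registry problem concrete, here is a rough sketch of what
such a scraper might look like. It is only an illustration: the
KNOWN_ROOT_CLASSES table, the RootClassScanner class, and the
guess_microformats function are names I made up, and the table is
deliberately incomplete. The point is that the table has to live
somewhere and be kept up to date by someone.

```python
from html.parser import HTMLParser

# Hypothetical registry of microformat root class names. Illustrative
# and incomplete: a real scraper would need to track every newly
# published microformat here.
KNOWN_ROOT_CLASSES = {
    "vevent": "hCalendar",
    "vcard": "hCard",
    "hreview": "hReview",
}


class RootClassScanner(HTMLParser):
    """Collect the microformats whose root class name appears in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute holds space-separated tokens.
                for token in value.split():
                    if token in KNOWN_ROOT_CLASSES:
                        self.found.add(KNOWN_ROOT_CLASSES[token])


def guess_microformats(html):
    """Return a best guess at which microformats a page uses.

    This is a guess, not a licensed interpretation: nothing in the page
    (such as a profile URI) says the author meant these class names as
    microformat markup.
    """
    scanner = RootClassScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found


if __name__ == "__main__":
    page = '<div class="vevent"><span class="summary">Telecon</span></div>'
    print(guess_microformats(page))
```

Any page that happens to use class="vcard" for its own styling purposes
would be reported as hCard by this sketch, which is exactly the kind of
guess I'd be uneasy about standardizing.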

What's more problematic, though, is that the interplay between multiple
microformats is not well defined. This is somewhat expected given that
each microformat is optimized for its specific field, but it also means
that it will become harder to parse one microformat without at least
*knowing* about the others, even if you don't want to parse the others.

In other words, I think "parsing microformats without profile URIs"
is a pretty deep rathole as far as standardization is concerned. On
this, I agree with DanC (gasp, this is not a regular occurrence!): if a
Google-like entity wants to parse and make a best guess as to what
metadata is included on a page, then more power to them. But to make
"guessing a transform" standard behavior seems awfully difficult and
error-prone.

-Ben

Received on Friday, 18 August 2006 09:54:35 UTC