- From: Ben Adida <ben@mit.edu>
- Date: Fri, 18 Aug 2006 05:54:31 -0400
- To: Danny Ayers <danny.ayers@gmail.com>
- CC: public-grddl-wg <public-grddl-wg@w3.org>
Danny Ayers wrote:
> [...]
> 2. Dealing with microformat data which lacks a profile URI is a
> different story. Right now it involves essentially scraping for
> attribute strings like "vevent". Seems to me there are two general
> strategies: raise the status of these strings to registered
> identifiers (presumably e.g. class="vevent" is relatively unlikely to
> appear in non-microformat docs); find a way to express the information
> that any data extracted from the document was done so without the
> license the profile URIs provides.

It seems to me that it's actually much more complicated than that. You'd have to scrape for every known microformat, and the list is obviously growing as time goes on. That's already problematic, because it implies some central repository of "known vocabularies" that are assumed for all web pages... that's quite a bit more centralized than most folks have come to expect from the web.

What's more problematic, though, is that the interplay between multiple microformats is not well defined. This is somewhat expected given that each microformat is optimized for its specific field, but it also means that it will become harder to parse one microformat without at least *knowing* about the others, even if you don't want to parse the others.

In other words, I think the "parsing microformats without profile URIs" is a pretty deep rathole as far as standardization is concerned. On this, I agree with DanC (gasp, this is not a regular occurrence!): if a Google-like entity wants to parse and make a best guess as to what metadata is included on a page, then more power to them. But to make "guessing a transform" standard behavior seems awfully difficult and error-prone.

-Ben
Received on Friday, 18 August 2006 09:54:35 UTC