On microformat 'schemas'

From the "I saw this and thought of you" department...

http://smackman.com/2006/06/01/an-old-idea/

Has an interesting account of some current difficulties with
microformats, as well as a proposal for a schema system. I haven't 
digested the latter, but certainly agree with the problem analysis:

Excerpting, 
[[
I've been giving some thought to parsing microformats lately. A few
threads seem to be converging

The first is that it's hard to parse microformats. You can hand-write a
parser in a little bit of time that's 80% right. But getting all of the
hcard rules, e.g., encoded is tricky. It's reasonable to assume,
therefore, that there are a lot of 80% parsers out there like the one I
wrote for my Ray Ozzie Clipboard example.

The second issue relates to hatom, which uses different class names for
the same concept at different scopes. For example, the entry title is
called entry-title not title. I asked Ryan about this when I saw him at
www2006, and he told me that they vacillated on this decision, but they
settled on entry-title because people can nest other microformats inside
hatom, and so it would be easier for the parser writers if there were no
colliding class names, even in different microformats. In fact, he
suggested that they'd probably made a mistake with hcard, since the class
names were so likely to collide with other microformats. Ok, so in other
words entry-title is a hack around the problem of it being hard to parse
microformats, and we can expect more of these.

When I bumped into Brian at the same event, I commented that
microformats really have a problem with nesting. He agreed. He said it
put a burden on the parser writer to potentially have to understand all
microformats in order to reliably parse web pages that contain them.

So,

   1. It's a lot of trouble to write a parser
   2. Bad parsers will proliferate
   3. Microformats are evolving toward being easier to parse, not easier
to create
   4. It's not clear how you can nest microformats w/o knowing how
parsers will behave
   5. Users are discouraged from inventing their own specialized
microformats, presumably because of the risk of collisions and
difficulty others will have in parsing them

[...]
]]
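
To make the collision point in the excerpt concrete, here is a minimal
sketch of the kind of "80%" parser it describes. Everything in it is my
own assumption for illustration, not taken from the linked post: the
sample markup is hypothetical (an hatom-like entry that used plain
"title", with an hcard nested inside), and NaiveClassScanner and scan
are made-up names. It uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical markup: an hatom-like entry that used plain "title",
# with an hcard for the author nested inside it.
SAMPLE = """
<div class="hentry">
  <h2 class="title">An old idea</h2>
  <div class="author vcard">
    <span class="fn">Jane Doe</span>
    <span class="title">Editor</span>
  </div>
</div>
"""


class NaiveClassScanner(HTMLParser):
    """Collects the text of every element carrying a class name.

    It deliberately ignores which microformat the element sits in,
    which is the shortcut a quick hand-written parser tends to take.
    Limitation of this sketch: void elements such as <br> inside a
    matching element would confuse the depth counter.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0    # > 0 while inside a matching element
        self.found = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.wanted in classes:
            self.depth = 1
            self.found.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def scan(html, wanted):
    scanner = NaiveClassScanner(wanted)
    scanner.feed(html)
    scanner.close()
    return [text.strip() for text in scanner.found]


# Both the entry title and the nested hcard job title come back.
titles = scan(SAMPLE, "title")
print(titles)
```

With this sample the scan for "title" returns both "An old idea" and
"Editor", so the parser cannot tell the entry's title from the author's
job title without understanding hcard as well. Renaming the entry's
class to "entry-title" sidesteps that, which seems to be the trade-off
the excerpt is pointing at.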


Dan

Received on Sunday, 4 June 2006 11:07:51 UTC