request for comments: Cognition metadata parser

Hello,

I'm planning to write a browser with a focus on metadata (reusing an
existing rendering engine, though!). I haven't started the GUI yet, but
so far I've written a parser that extracts semantics from (X)HTML pages,
and I'd appreciate feedback on it.

It currently supports:

 * RDFa
 * eRDF
 * <title> element
 * <meta> element
 * <link rel> / <a rel> / <link rev> / <a rev>
 * The "role" attribute
 * Several microformats
 * Document structure (headings, etc.)

This data is parsed into an RDF-like data structure, which can be dumped
out either as RDF or as a Perl object dump.
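
To make "RDF-like data structure" a bit more concrete, here is a rough
sketch of the general idea for three of the simpler sources above
(<title>, <meta>, and rel/rev on <link> and <a>). It is written in Python
purely for illustration and is not Cognition's actual code: the class
name MetadataSketch, the function extract_triples, and the choice of
plain (subject, predicate, object) tuples are all assumptions made for
this example. Predicates are left as bare strings and hrefs are not
resolved against the page URI, both of which a real parser would
presumably handle.

```python
from html.parser import HTMLParser


class MetadataSketch(HTMLParser):
    """Hypothetical helper: collects (subject, predicate, object) tuples."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # <meta name="..." content="..."> becomes page --name--> content
            if attrs.get("name") and attrs.get("content") is not None:
                self.triples.append(
                    (self.page_uri, attrs["name"], attrs["content"]))
        elif tag in ("link", "a") and attrs.get("href"):
            href = attrs["href"]
            # rel: the page is the subject; may hold several tokens
            for rel in (attrs.get("rel") or "").split():
                self.triples.append((self.page_uri, rel, href))
            # rev: the relation is reversed, so the target is the subject
            for rev in (attrs.get("rev") or "").split():
                self.triples.append((href, rev, self.page_uri))

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.triples.append((self.page_uri, "title", title))

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_triples(html, page_uri):
    """Parse an HTML string and return the collected triples as a list."""
    parser = MetadataSketch(page_uri)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Calling extract_triples(html_text, "http://example.org/page") returns a
list of tuples that could then be serialised however one likes. The
rel/rev handling is the only subtle part: rev simply swaps subject and
object.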

I'd appreciate examples of pages where it fails. I already know that it
occasionally fails because of encoding and/or entity problems, so I'd
prefer examples where it simply fails to find some piece of metadata.

I'd also like to know of any places where you think the RDF output could
be improved.

Thanks in advance for your feedback,

-- 
Toby A Inkster BSc (Hons) ARCS

Received on Saturday, 1 March 2008 15:50:19 UTC