request for comments: Cognition metadata parser from Toby A Inkster on 2008-03-01 (semantic-web@w3.org from March 2008)

From: Toby A Inkster <usenet200801@tobyinkster.co.uk>
Date: Sat, 1 Mar 2008 13:30:24 +0000
To: semantic-web@w3.org
Message-ID: <gc2o95-gu5.ln1@ophelia.g5n.co.uk>

Hello,

I'm planning to write a browser with a focus on metadata (reusing an
existing rendering engine, though!). I've not started the GUI yet, but I
have written a parser that extracts semantics from (X)HTML pages, and I'd
appreciate feedback on it.

It currently supports:

 * RDFa
 * eRDF
 * <title> element
 * <meta> element
 * <link rel> / <a rel> / <link rev> / <a rev>
 * The "role" attribute
 * Several microformats
 * Document structure (headings, etc.)

This data is parsed into an RDF-like data structure, which can be output
as RDF or as a Perl object dump.
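
To make the idea of an "RDF-like data structure" concrete, here is a rough
sketch in Python. It is not Cognition's code; it only illustrates the general
approach for three of the simpler sources listed above (<title>, <meta>, and
rel/rev links). The names MetaTripleParser and extract_triples, and the
"meta:" and "rel:" predicate prefixes, are made up for illustration. Mapping
<title> to the Dublin Core "title" term is likewise an assumption, and
relative hrefs are left unresolved to keep the example short.

```python
from html.parser import HTMLParser

# Assumption: <title> is mapped to the Dublin Core "title" term.
DC_TITLE = "http://purl.org/dc/terms/title"


class MetaTripleParser(HTMLParser):
    """Collect (subject, predicate, object) triples from a few HTML elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            # <meta name="x" content="y">  ->  (page, meta:x, "y")
            self.triples.append((self.base_uri, "meta:" + a["name"], a["content"]))
        elif tag in ("link", "a") and a.get("href"):
            # rel="a b" yields one triple per token, with the page as subject.
            for rel in (a.get("rel") or "").split():
                self.triples.append((self.base_uri, "rel:" + rel, a["href"]))
            # rev is the reverse relation: the link target becomes the subject.
            for rev in (a.get("rev") or "").split():
                self.triples.append((a["href"], "rel:" + rev, self.base_uri))

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.triples.append((self.base_uri, DC_TITLE, title))

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_triples(html, base_uri):
    """Parse an HTML string and return the list of triples found."""
    parser = MetaTripleParser(base_uri)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Calling extract_triples(page_source, "http://example.org/") returns a plain
list of 3-tuples, which could then be serialized in whatever RDF syntax is
preferred. Handling rev by swapping subject and object keeps every statement
in the same triple shape as the rel case.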

I'd appreciate examples of pages where it fails. I already know that it
occasionally fails because of encoding and/or entity problems, so I'd
prefer examples where it simply fails to find some piece of metadata.

I'd also like to know of any places where you think the RDF output could
be improved.

Thanks in advance for your feedback,

-- 
Toby A Inkster BSc (Hons) ARCS

Received on Saturday, 1 March 2008 15:50:19 UTC