request for comments: Cognition metadata parser

From: Toby A Inkster <usenet200801@tobyinkster.co.uk>
Date: Sat, 1 Mar 2008 13:30:24 +0000
To: semantic-web@w3.org
Message-ID: <gc2o95-gu5.ln1@ophelia.g5n.co.uk>


I'm (planning on) writing a browser (reusing a rendering engine though!)
with a focus on metadata. I've not started the GUI yet, but have so far
written a parser to extract semantics from (X)HTML pages, and I'd
appreciate feedback on it. 

It currently supports:

	* RDFa
	* eRDF
	* <title> element
	* <meta> element
	* <link rel> / <a rel> / <link rev> / <a rev>
	* The "role" attribute
	* Several microformats
	* Document structure (headings, etc.)

This data is parsed into an RDF-like data structure and can be dumped out
in RDF, or as a Perl object dump.
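
To make the idea concrete, here is a minimal sketch in Python of pulling a few of the simpler items above (<title>, <meta>, and rel/rev links) into subject-predicate-object triples. It is not Cognition's code and does not reflect its internals: the names MetadataSketch and extract_triples, and the "title", "meta:" and "rel:" predicate labels, are hypothetical placeholders rather than real vocabulary URIs. It also assumes hrefs can be used as-is, with no resolution against a base URI.

```python
from html.parser import HTMLParser


class MetadataSketch(HTMLParser):
    """Collects (subject, predicate, object) triples from simple page metadata."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri      # assumed subject for statements about the page
        self.triples = []
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)           # attribute values may be None if no value is given
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.triples.append((self.page_uri, "meta:" + name, content))
        elif tag in ("link", "a") and attrs.get("href"):
            href = attrs["href"]
            # rel: the page is the subject, the link target is the object.
            for rel in (attrs.get("rel") or "").split():
                self.triples.append((self.page_uri, "rel:" + rel, href))
            # rev: the same kind of relation, with subject and object swapped.
            for rev in (attrs.get("rev") or "").split():
                self.triples.append((href, "rel:" + rev, self.page_uri))

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.triples.append((self.page_uri, "title", title))

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_triples(html, page_uri):
    """Parse an HTML string and return the list of collected triples."""
    parser = MetadataSketch(page_uri)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Calling extract_triples(page_source, "http://example.org/") returns a plain list of 3-tuples, which stands in for the "RDF-like data structure" mentioned above. RDFa, eRDF, microformats and the "role" attribute each need considerably more logic than this and are left out of the sketch.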

I'd appreciate examples where it fails. I'm aware that it occasionally
fails due to encoding and/or entity problems -- I'd prefer examples where
it simply fails to find some piece of metadata.

I'd also like to know of any places where you think the RDF output could
be improved.

Thanks in advance for your feedback,

Toby A Inkster BSc (Hons) ARCS
Received on Saturday, 1 March 2008 15:50:19 UTC