Hello,

I'm (planning on) writing a browser (reusing a rendering engine though!) with a focus on metadata. I've not started the GUI yet, but have so far written a parser to extract semantics from (X)HTML pages, and I'd appreciate feedback on it.

It currently supports:

* RDFa
* eRDF
* <title> element
* <meta> element
* <link rel> / <a rel> / <link rev> / <a rev>
* The "role" attribute
* Several microformats
* Document structure (headings, etc)

This data is parsed into an RDF-like data structure and can be dumped out in RDF, or as a Perl object dump.

I'd appreciate examples where it fails. I'm aware that it occasionally fails due to encoding and/or entity problems -- I'd prefer examples where it simply fails to find some piece of metadata. I'd also like to know of any places where you think the RDF output could be improved.

Thanks in advance for your feedback,

-- 
Toby A Inkster BSc (Hons) ARCS

Received on Saturday, 1 March 2008 15:50:19 UTC