Re: how dirty can the HTML be, and still be RDFa?

[snip]

Re dirty HTML, this is a very real issue. HTML documents are usually
pretty crappy, standards-wise.

I'd suggest looking into HTML5's approach. It defines a much more
liberal parsing regime than XML does (this was one of the major drivers
for the original WHATWG/XHTML fork).

So http://www.w3.org/TR/html5/parsing.html#parsing and nearby define
ways of turning ugly real-world documents into a parsed structure.
Parsers that implement this include html5lib
(http://code.google.com/p/html5lib/) and the Validator.nu HTML Parser
(http://about.validator.nu/htmlparser/).
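
To give a rough feel for what tolerant parsing buys you, here is a
minimal sketch in Python. Note the assumptions: it uses the standard
library's html.parser as a stand-in, which is lenient but does NOT
implement the HTML5 parsing algorithm and builds no tree, so it is not
a substitute for html5lib or the Validator.nu parser. The attribute
list, the names RdfaAttrCollector and collect_rdfa_attrs, and the
sample markup are all made up for illustration.

```python
from html.parser import HTMLParser

# Illustrative subset of RDFa attribute names (an assumption, not a full list).
RDFA_ATTRS = ("about", "property", "typeof", "resource", "rel")


class RdfaAttrCollector(HTMLParser):
    """Collect RDFa-looking attributes from start tags, tolerating bad markup."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in RDFA_ATTRS:
                self.found.append((tag, name, value))


def collect_rdfa_attrs(markup):
    """Return the RDFa-looking attributes found in markup, in document order."""
    parser = RdfaAttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


# Deliberately dirty input: unclosed <p> and <span>, an unquoted attribute
# value, and a stray closing tag that matches nothing.
DIRTY = '<div about="#me"><p><span property=foaf:name>Dan</div></b>'

if __name__ == "__main__":
    for tag, name, value in collect_rdfa_attrs(DIRTY):
        print(tag, name, value)
```

An XML parser would reject that input outright. A tolerant parser still
surfaces the attributes, though working out which subject each property
belongs to needs the repaired tree that a real HTML5 parser produces.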

See also http://ejohn.org/blog/html-5-parsing/

cheers,

Dan

Received on Friday, 25 November 2011 12:49:35 UTC