Re: how dirty can the HTML be, and still be RDFa?

From: Dan Brickley <danbri@danbri.org>
Date: Fri, 25 Nov 2011 13:49:06 +0100
Message-ID: <CAFNgM+ZYL8vxALHPgRpoXiDRAsarb012oE+BBf+0ByUZHNWSKw@mail.gmail.com>
To: Peter Williams <home_pw@msn.com>
Cc: "public-xg-webid@w3.org" <public-xg-webid@w3.org>

Re dirty HTML, this is a very real issue. HTML documents are usually
pretty crappy, standards-wise.

I'd suggest looking into HTML5's approach. They have a much more
liberal parsing regime than XML (this was one of the major drivers for
the original WHATWG/XHTML fork).

So http://www.w3.org/TR/html5/parsing.html#parsing and nearby define
ways of turning ugly worldly documents into a parsed structure. There's
a parser at http://code.google.com/p/html5lib/ or

See also http://ejohn.org/blog/html-5-parsing/
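
To make the "liberal parsing" point concrete, here is a rough sketch of pulling RDFa-looking attributes out of markup that no XML parser would accept. It uses Python's standard-library html.parser rather than html5lib, so it is only lenient tokenizing: it does not implement the HTML5 tree-construction algorithm and it is not an RDFa processor. The names RdfaAttrCollector, collect_rdfa_attrs, RDFA_ATTRS and the DIRTY sample are made up for illustration.

```python
from html.parser import HTMLParser

# Attribute names RDFa uses to carry data; a deliberately partial list
# chosen for this sketch, not the full set from the RDFa spec.
RDFA_ATTRS = {"about", "property", "typeof", "rel", "resource"}


class RdfaAttrCollector(HTMLParser):
    """Collects RDFa-looking attributes without requiring well-formed markup."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        # Called for every start tag, whether or not it is ever closed.
        for name, value in attrs:
            if name in RDFA_ATTRS:
                self.found.append((tag, name, value))


def collect_rdfa_attrs(markup):
    """Return (tag, attribute, value) tuples in document order."""
    parser = RdfaAttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


# Hypothetical sample: unclosed <p> and <span>, an unquoted attribute
# value, and no <html> or <body> wrapper.
DIRTY = (
    '<div about="#me" typeof=foaf:Person>'
    '<p><span property="foaf:name">Dan<p>more text</div>'
)

if __name__ == "__main__":
    for row in collect_rdfa_attrs(DIRTY):
        print(row)
```

Since this never builds a tree, it cannot say which subject a property belongs to once the nesting is broken; that is the part an HTML5 parser such as html5lib is meant to settle.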


Received on Friday, 25 November 2011 12:49:35 UTC
