- From: Phil Ringnalda <phil@philringnalda.com>
- Date: Thu, 18 Mar 2004 19:40:05 -0800
- To: "Dan Connolly" <connolly@w3.org>
- Cc: "RDF in XHTML task force" <public-rdf-in-xhtml-tf@w3.org>
Dan Connolly wrote:
> Can you clarify "quite-possibly invalid HTML"?
> Maybe this check-box applies?
>
> [ ] We rely on HTML that isn't XML, so neither of these
> proposals works for us

It applies so thoroughly that I despair for my use case. Since it was easier than a real sample, I checked the Technorati Top 100 blogs for ones supporting Trackback. Of the 13 I found, eight claim to be XHTML, four claim to be HTML, and one lacks even a doctype. All 13 are so completely invalid and ill-formed that a parser would stop and catch fire within a few hundred bytes.

I think if anyone decides they want the RDF out of Trackback data, they are better off writing a special-purpose crawler that will regex it out of the HTML comments, parse it, and make it available to the rest of the RDF world (though, along with being sloppy, my people are notorious for having hissy fits when people use their metadata the way it's intended to be used, so whoever does it better wear asbestos).

Phil Ringnalda
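
A minimal sketch of the regex approach described above, assuming the Trackback RDF sits inside an HTML comment and carries its data as dc:identifier, dc:title and trackback:ping attributes. The function name and both patterns are illustrative, not taken from any existing crawler, and the attributes are pulled out with a second regex rather than a real RDF parser, so this only covers the extraction step:

```python
import re

# An rdf:RDF element wrapped in an HTML comment. Nothing outside the comment
# is parsed, so ill-formed markup around it does not matter.
RDF_IN_COMMENT = re.compile(
    r"<!--\s*(<rdf:RDF\b.*?</rdf:RDF>)\s*-->", re.DOTALL
)

# The attributes of interest (assumed to be double-quoted).
ATTR = re.compile(r'\b(dc:identifier|dc:title|trackback:ping)\s*=\s*"([^"]*)"')


def extract_trackback(html):
    """Return one dict of attributes per commented-out RDF block found."""
    results = []
    for block in RDF_IN_COMMENT.findall(html):
        attrs = dict(ATTR.findall(block))
        # Keep only blocks that actually advertise a ping URL.
        if "trackback:ping" in attrs:
            results.append(attrs)
    return results
```

Because the match is confined to the comment, the function returns whatever it finds even when the rest of the page would make an XML parser give up; handing the captured block to an RDF parser would be the next step.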
Received on Thursday, 18 March 2004 22:40:09 UTC