- From: Rand McRanderson <therandshow@gmail.com>
- Date: Wed, 04 Apr 2012 20:39:49 -0400
- To: public-xml-er@w3.org
There's a good market for non-browsers that want an algorithm for parsing HTML into an XML-compatible state (PHP has a function for this in its XML DOM extension, "loadHTML", although that may come from the underlying libxml infrastructure). I guess the use case can be distilled to this: you have a workflow or tool chain that uses XML and XML-based technologies, and you want to be able to pull in arbitrary HTML.

Of course, if you are pulling in arbitrary HTML documents and you're not a browser, then displaying the content the way it was displayed in the past is not important, so the parsing priorities change a little. For example, instead of worrying about handling the noscript parsing step properly, you could simply treat noscript content as a CDATA section. On the other hand, you may want to preserve processing instructions (PIs) in case someone embedded useful information there.

I guess what I am saying is that non-browsers (generally) have a priority of preserving the information in the document while making it XML compatible. Browsers have a priority of presenting the information in a way that is compatible with how it looked in the past. Those two priorities may not clash, but they might, and it would be nice to decide earlier rather than later how you want to handle this.

All that being said, I think a simple browser that could handle 75% of the web would be easier to implement on top of a forgiving XML parser than on top of the HTML5 parser algorithm.

On a slightly different note, is there any non-HTML use case for this? Are there large amounts of non-HTML, XML-like documents that are badly formed? Do you want to stretch the reach of this parser to documents that are only vaguely XML-like, such as Apache configuration files?

Switching to a third note, one possibility for xml-er is, instead of aiming for a forgiving XML parser, to aim for a parser framework that could encompass both the HTML5 parser and an XML parser, so that the two differ as a matter of configuration but not of concept.
I guess xml-er could work on ways of telling a parser, through configuration, how to handle errors or how not to handle them.
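
To make the configuration idea a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not a proposal for what xml-er should specify and not the HTML5 algorithm. It uses the standard library's html.parser as the tokenizer; the names RecoveryPolicy, XmlCompatibleWriter, and to_xml_compatible are hypothetical. It closes unclosed elements, keeps or drops PIs, and either drops or rejects stray end tags, depending on the policy. Comments and doctypes are silently dropped, and attribute names that are not valid XML names are not repaired.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML void elements never take an end tag, so they are written as <br/> etc.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


@dataclass
class RecoveryPolicy:
    """Hypothetical configuration knobs; the names are illustrative only."""
    keep_processing_instructions: bool = True
    drop_stray_end_tags: bool = True  # False: raise instead of recovering


class XmlCompatibleWriter(HTMLParser):
    """Re-serializes forgiving HTML tokens as balanced, XML-style markup."""

    def __init__(self, policy):
        super().__init__(convert_charrefs=True)
        self.policy = policy
        self.out = []        # output fragments
        self.open_tags = []  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        unique = {}
        for name, value in attrs:
            # XML forbids duplicate attributes; keep the first occurrence.
            unique.setdefault(name, value or "")
        text = "".join(f" {n}={quoteattr(v)}" for n, v in unique.items())
        if tag in VOID:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # already written as an empty element
        if tag not in self.open_tags:
            if self.policy.drop_stray_end_tags:
                return
            raise ValueError(f"stray end tag </{tag}>")
        # Close anything left open inside the element being closed.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def handle_pi(self, data):
        # html.parser ends a PI at the first ">", so an XML-style "?>"
        # leaves a trailing "?" in the data; strip it before re-emitting.
        if self.policy.keep_processing_instructions:
            self.out.append(f"<?{data.rstrip('?')}?>")


def to_xml_compatible(html, policy=None):
    writer = XmlCompatibleWriter(policy or RecoveryPolicy())
    writer.feed(html)
    writer.close()
    # Close whatever was still open at the end of the input.
    while writer.open_tags:
        writer.out.append(f"</{writer.open_tags.pop()}>")
    return "".join(writer.out)
```

For example, to_xml_compatible("<p>a<b>b</p>") returns "<p>a<b>b</b></p>". The same tokenizer and writer serve both a lenient and a strict consumer; only the RecoveryPolicy values change, which is the "differs by configuration, not concept" point in miniature.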
Received on Thursday, 5 April 2012 00:40:47 UTC