- From: Ben Boyle <benjamins.boyle@gmail.com>
- Date: Wed, 2 Apr 2008 21:49:46 +1000
- To: "HTML WG" <public-html@w3.org>
Oh no, not another tangent/thread on this topic! My apologies... but I couldn't work out which one to reply to. Plus I have a question that is quite separate first.

Something I don't quite understand about "HTML5" yet. I understand there are two possible serialisations: html and xhtml (xml). I understand xhtml parsing is xml parsing, with draconian error handling (which is not really a new thing). I understand html parsing is a new thing (replacing sgml parsing), documenting what browsers must do to produce a valid DOM, including handling of non-conforming markup. This html parsing will also support (i.e. we will document) syntax that is not well-formed xml (what we know and love as "classic html") and it shall be considered to be conforming.

I am not quite sure where the line between the two is... there is (for me) a blurry grey area around well-formed xhtml source code, which could be parsed - successfully - as either xhtml OR html (presumably producing the same dom?). How will a UA decide whether to use html parsing? Is it triggered by doctype, mime type, xmlns or something else? This question may seem moot (if the same dom is produced, who cares?) ... until one introduces an error into the markup.

The reason I ask (well, aside from just wanting to understand it better) is that I was discussing the math/svg in html serialisations thing, and the fact that html5 does define and support xhtml/xml parsing (there is still confusion over this - people think "html5" represents W3C abandoning xhtml activity) ... and I was describing one of the options being looser html-style parsing of svg/mathml markup - parsing that embraces error recovery rather than draconian error handling.

My mate's question was: "so there'd be like a switch, so I could opt into super-god-mode parsing?" I thought it was interesting. Something I would be interested in. Being able to choose between html or xml "well-formedness".
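
To make the contrast between draconian error handling and error recovery concrete, here is a rough sketch using only the Python standard library. It is an illustration under stated assumptions, not the HTML5 parsing algorithm: `html.parser` is a lenient tokenizer and does not build the DOM the draft describes, and the names `SOURCE`, `TagCollector`, `parse_as_xml` and `parse_as_html` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample markup: the <b> element is never closed,
# so the string is not well-formed xml.
SOURCE = "<p>Hello <b>world</p>"


class TagCollector(HTMLParser):
    """Records every start tag seen and carries on past mismatched tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(source):
    """Draconian: return the root tag name, or None on any well-formedness error."""
    try:
        return ET.fromstring(source).tag
    except ET.ParseError:
        return None


def parse_as_html(source):
    """Forgiving: return the start tags found, even when the markup is broken."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags
```

With the broken sample, `parse_as_xml(SOURCE)` gives up entirely while `parse_as_html(SOURCE)` still reports the `p` and `b` elements, which is roughly the "something is still presented" behaviour in question.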
Being able to choose between draconian error handling and html error recovery. Because I would really like to author - to the best of my ability - well-formed and valid xhtml+math+svg BUT I would prefer to have browsers present those documents using error recovery rather than draconian error handling ... so if I make (or import) any mistakes, well, something is still presented. And maybe I don't want to get hung up on whether a bit of "classic html" syntax works its way into the mix. I'd rather focus on making the content clear and easy to read, and the navigation sensible, than worry too much about markup syntax.

I don't know if this is useful to the current discussion, but there you have it. And it does raise the question: could the work undertaken to define "html to dom" parsing be applicable to parsing all xml (e.g. on the server side, to send an html document through XSLT for example... the html parser could produce the required dom without requiring the source document be reworked into xml well-formedness first) ... but that's probably a much bigger question better asked later.

cheers
Ben
Received on Wednesday, 2 April 2008 11:50:25 UTC