- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 15 Jun 2007 01:09:28 +0000 (UTC)
On Mon, 6 Nov 2006, Henri Sivonen wrote:
> On Nov 6, 2006, at 07:34, Ian Hickson wrote:
> > On Sun, 5 Nov 2006, Henri Sivonen wrote:
> > >
> > > Is there a reason why the definition of space characters does not
> > > match the XML 1.0 and RELAX NG definition of white space (space,
> > > tab, CR, LF) but also includes (line tabulation and form feed)? Is
> > > the deviation from XML 1.0 needed for backwards compatibility with
> > > text/html UAs?
> >
> > I made the parser consider VT and FF as being whitespace based on, as
> > I recall, a complete examination of every Unicode character's
> > behaviour in the parsers I was testing. The definition of "space
> > characters" matches the parser's behaviour for consistency.
> >
> > The definition of "space characters" doesn't affect the XML parser
> > stage as far as I can recall, only attribute parsing and DOM
> > conformance.
>
> The potential problem with it affecting DOM conformance is that it may
> have ripple effects to running XML tooling inside a browser engine.
> Gecko has an XPath implementation. Disruptive Innovations has created a
> RELAX NG implementation for Gecko. Running the schemas from
> syntax.whattf.org on a DOM inside Gecko would be interesting, since it
> would allow checking DOM snapshots modified by scripts. There may be
> other reasons to run XML machinery on an HTML DOM in a browser. Both
> XPath and RELAX NG assume that white space-separated tokens follow the
> XML notion of white space. Not being able to use the native XPath and
> RELAX NG notions of splitting on white space would be seriously uncool.
> Of course, a browser engine might get away with tampering with the XPath
> or RELAX NG notions of white space since the additional characters don't
> occur in XML. But does it make sense to inflict the cost of such
> tweaking on the XML parts of browser engines?
>
> Would there be serious compatibility problems if the HTML5 parsing
> algorithm required VT and FF to be mapped to space (after expanding
> NCRs) and the higher-level parts of the spec defined white space as
> space, tab, CR and LF?

Well, I don't much care about VT, but I really think we should round-trip
form feed. Consider, for instance, RFCs, which have form feeds. I don't
like the idea of dropping them on the floor when you convert RFCs to HTML
and back to text again.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
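A minimal Python sketch of the trade-off discussed in this thread, assuming the two character sets exactly as they are described above (XML 1.0 white space: space, tab, CR, LF; the draft's "space characters": those four plus VT and FF). The helper names `split_tokens` and `map_vt_ff_to_space` are hypothetical illustrations, not taken from any spec, XPath/RELAX NG library, or browser engine.

```python
# Hypothetical illustration; character sets follow the descriptions in this thread.
XML_WHITESPACE = frozenset(" \t\r\n")                     # XML 1.0: space, tab, CR, LF
HTML5_SPACE_CHARS = XML_WHITESPACE | {"\x0b", "\x0c"}     # draft adds VT and FF


def split_tokens(value: str, spaces: frozenset) -> list[str]:
    """Split a value into tokens separated by runs of characters in `spaces`."""
    tokens, current = [], []
    for ch in value:
        if ch in spaces:
            if current:                      # end of a token
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                              # trailing token
        tokens.append("".join(current))
    return tokens


def map_vt_ff_to_space(text: str) -> str:
    """Sketch of the proposal: map VT and FF to U+0020 (after NCR expansion)."""
    return text.replace("\x0b", " ").replace("\x0c", " ")


if __name__ == "__main__":
    value = "foo\x0cbar baz"
    print(split_tokens(value, HTML5_SPACE_CHARS))   # FF acts as a separator
    print(split_tokens(value, XML_WHITESPACE))      # FF stays inside a token
    mapped = map_vt_ff_to_space(value)
    print(split_tokens(mapped, XML_WHITESPACE))     # XML splitting now agrees
    print("\x0c" in mapped)                         # the form feed is gone
```

With the XML set, `foo\x0cbar` stays a single token, so XPath- or RELAX NG-style splitting disagrees with the draft's definition. Mapping VT and FF to space at parse time makes the two agree, but the form feed no longer survives, which is the round-trip loss raised in the reply about RFCs.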
Received on Thursday, 14 June 2007 18:09:28 UTC