[whatwg] Forbidden characters in text/html from Ian Hickson on 2006-03-11 (public-whatwg-archive@w3.org from March 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 11 Mar 2006 01:21:41 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0603110120500.315@dhalsim.dreamhost.com>

On Sat, 25 Feb 2006, Henri Sivonen wrote:
>
> On Feb 25, 2006, at 02:02, Ian Hickson wrote:
> 
> > On Sat, 23 Jul 2005, Henri Sivonen wrote:
> > > 
> > > Which characters should a text/html HTML5 conformance checker consider
> > > forbidden? The same characters that are forbidden in XML 1.0 (\0, FF,
> > > etc.)? Or some other set?
> > 
> > In what context?
> 
> In the pre-parse Unicode character stream on the one hand, and in the
> post-parse (that is, NCRs expanded) character data and attribute values
> on the other. IIRC, in XML 1.0 (but not 1.1) the restrictions are the
> same in both cases.

Well, the spec says to drop U+0000, and to do something with U+000D such
that U+000D never appears in the parse stream; the post-parse data is just
the DOM.
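
As a rough illustration of that pre-parse step, here is a minimal Python
sketch. The function name preprocess is hypothetical, and the message does
not say exactly what happens to U+000D, so the sketch assumes CRLF pairs
and lone CRs are both turned into a single U+000A. Treat it as one possible
reading, not as what the spec mandates.

```python
def preprocess(stream: str) -> str:
    """Drop U+0000 and make sure U+000D never reaches the parser.

    Hypothetical helper: the name and the exact CR handling are
    assumptions, not taken from the spec text.
    """
    # Drop every U+0000 character from the input stream.
    stream = stream.replace("\u0000", "")
    # ASSUMPTION: "do something with U+000D" is read here as newline
    # normalisation: CRLF becomes LF first, then any remaining lone CR
    # becomes LF, so no U+000D survives.
    stream = stream.replace("\r\n", "\n").replace("\r", "\n")
    return stream


if __name__ == "__main__":
    sample = "a\u0000b\r\nc\rd"
    # repr() makes the remaining control characters visible.
    print(repr(preprocess(sample)))
```

Under these assumptions a conformance checker would only need to look at
the stream after this step, since neither U+0000 nor U+000D can appear in
it any longer.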

Does that answer your question?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 10 March 2006 17:21:41 UTC