- From: Richard A. O'Keefe <ok@cs.otago.ac.nz>
- Date: Mon, 14 Oct 2002 11:23:04 +1300 (NZDT)
- To: Fred.Bone@dial.pipex.com, html-tidy@w3.org
"Fred Bone" <Fred.Bone@dial.pipex.com> wrote (about element breaking): I'm no expert on XML, but as far as I can tell this would only be a change of content if you have elem1 defined with xml:space='preserve', and then only the blanks would be significant (and not the newline). No, the newline would be significant too. The HTML4.1 spec says (appendix B.3.1): > SGML (see [ISO8879], section 7.6.1) specifies that a line break > immediately following a start tag must be ignored and this should, AIUI, also be true in XML (though I can't find anything in the XML spec corresponding to what I've quoted from the HTML one). That's because XML is different here. Basically, SGML is set up on the assumption that you will *ALWAYS* have the document type definition available, whereas XML is set up on the assumption that you will _seldom_ have a DTD or look at it if you have one. In fact, the XML specification is confused and confusing. It's a textbook example of a Bad Specification. (I'm planning to use it in a Formal Methods paper as a horrible example of what happens when you don't use formal methods.) Amongst other things, the semantics of spaces depends on whether you have a DTD (and your parser can handle it). Suppose you have <!ELEMENT elem1 (elem2+)> <!ELEMENT elem2 (#PCDATA)> and <elem1><elem2>foo</elem2><elem2>bar</elem2></elem1> Then an XML parser is going to come up with the events START "elem1" START "elem2" DATA "foo" END "elem2" START "elem2" DATA "bar" END "elem2" END "elem1". Now suppose you have <elem1> <elem2>foo</elem2> <elem2>bar</elem2> </elem1> IF your parser knows about the dtd AND bothers to process it, you get START "elem1" SPACE "\n " START "elem2" DATA "foo" END "elem2" SPACE "\n " START "elem2" DATA "bar" END "elem2" SPACE "\n" END "elem1". where a SPACE event says "here is a string of white space characters occurring where element content is required; feel free to ignore it." But if there isn't a dtd or if your parser can't handle it or doesn't feel like handling it because it's Tuesday, you'll get START "elem1" DATA "\n " START "elem2" DATA "foo" END "elem2" DATA "\n " START "elem2" DATA "bar" END "elem2" DATA "\n" END "elem1". Various XML specifications call for various ways around this, involving more or less successful heuristics for deleting white space DATA. I have often wondered why XML textbooks are so fond of indenting XML examples when it is such a bad thing for the semantics.
Received on Sunday, 13 October 2002 19:00:37 UTC