
Whitespace handling

From: Eric Richardson <maxwell@telesoft.com>
Date: Wed, 31 May 2000 07:51:16 -0700
Message-ID: <393526E4.6F91CD2E@telesoft.com>
To: DOM <www-dom@w3.org>
I posted this on comp.text.xml but got no response, so I was wondering
whether anyone on this list could help.

I recently switched from xml-tr2 to JAXP 1.0.
I am using DTDs and ask for strict parsing with something like this:

import javax.xml.parsers.*;
import org.w3c.dom.Document;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(uri);
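The factory in the snippet above is left at its defaults, which in JAXP means non-validating, even though the text says strict parsing is wanted. Below is a minimal sketch of asking for validation and for whitespace in element-only content to be dropped. Assumptions: setIgnoringElementContentWhitespace is, as far as I know, a JAXP 1.1 addition and not available in JAXP 1.0; the class name, helper names and element text are hypothetical; and the DTD from this message is inlined as an internal subset so no doc.dtd file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingParse {
    // Hypothetical instance: the DTD from this message inlined as an internal
    // subset, with invented element text, so no external file is needed.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (element1, element2)>\n"
        + "  <!ELEMENT element1 (#PCDATA)>\n"
        + "  <!ELEMENT element2 (#PCDATA)>\n"
        + "]>\n"
        + "<doc>\n"
        + "  <element1>one</element1>\n"
        + "  <element2>two</element2>\n"
        + "</doc>\n";

    static Document parse(String xml, boolean strict) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Validate against the DTD so the parser knows <doc> has element-only content.
        dbf.setValidating(strict);
        // Drop whitespace in element-only content (JAXP 1.1+; only works when validating).
        dbf.setIgnoringElementContentWhitespace(strict);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Make validity errors stop the parse instead of letting it continue.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Number of child nodes (elements and text) directly under the root element.
    static int rootChildCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootChildCount(parse(SAMPLE, false))); // whitespace kept
        System.out.println(rootChildCount(parse(SAMPLE, true)));  // whitespace dropped
    }
}
```

With the defaults the root element should have five children (three whitespace text nodes around the two elements); with validation plus the ignore setting it should have two. The DTD is what makes the difference: without a content model saying `doc` is element-only, the parser has no basis for calling that whitespace ignorable.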

My input file is human-readable, with whitespace, like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE doc SYSTEM 'doc.dtd'>

My DTD looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT doc (element1, element2)>
<!ELEMENT element1 (#PCDATA)>
<!ELEMENT element2 (#PCDATA)>

I'm getting TEXT nodes that look empty between the elements. Although I
haven't checked, I would guess they are line feeds. I read the XML spec,
and it says that the processor must pass all characters that are not
markup, line feeds included, through to the application, which in this
case I believe is the DOM. In the DOM, however, I would hope that no
TEXT nodes would be created between ELEMENTS unless I had specified a
mixed content model with #PCDATA. Is there something about the encoding,
or something else, that I don't understand and that would cause this?

What do I need to do to avoid this?
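If the parser in use cannot be told to drop element-content whitespace, one workaround is to remove whitespace-only text nodes after parsing. This is a rough sketch under one big assumption: that every whitespace-only text node in the document can be discarded. The helper ignores the DTD, so it would also strip whitespace that is significant inside #PCDATA or mixed content. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripWhitespace {
    // Hypothetical sample with invented element text.
    static final String SAMPLE =
        "<doc>\n  <element1>one</element1>\n  <element2>two</element2>\n</doc>\n";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // True if s consists only of XML whitespace characters (space, tab, CR, LF).
    static boolean isXmlWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    // Recursively removes whitespace-only text nodes below node.
    // It does not consult the DTD, so whitespace-only text inside #PCDATA
    // or mixed-content elements is removed as well.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before any removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && isXmlWhitespace(child.getNodeValue())) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        stripWhitespaceNodes(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The loop saves the next sibling before calling removeChild, because once a node is detached its getNextSibling no longer points into the tree. For the sample, the root element is left with just its two element children.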

Eric :-)
Received on Wednesday, 31 May 2000 10:55:02 UTC
