Whitespace handling

Hi,
I posted this on comp.text.xml with no response, so I was wondering if
anyone on this list could help me.

I recently switched to JAXP 1.0 from xml-tr2.
I am using DTDs and turning on validation, something like this:

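import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true); // validate against the DTD named in the DOCTYPE
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(uri); // uri: String URI of the input file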

My input file is indented for human readability, like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE doc SYSTEM 'doc.dtd'>
<doc>
  <element1>first</element1>
  <element2>second</element2>
</doc>


My DTD looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT doc (element1, element2)>
<!ELEMENT element1 (#PCDATA)>
<!ELEMENT element2 (#PCDATA)>

I'm getting TEXT nodes that look empty between the elements. Although I
haven't checked, I would guess they contain line feeds. I read the XML spec,
and it says that the parser should pass line feeds through to the
application, which in this case I believe is the DOM. Even so, I would hope
that no TEXT nodes would be created between ELEMENTS in the DOM unless I had
specified a mixed content model with #PCDATA. Is there something about the
encoding, or something else, that I'm not understanding and that would
cause this?
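To check what those TEXT nodes actually contain, a minimal sketch using only standard JAXP/DOM calls could parse the same document with validation on and print each child of <doc>, escaping line feeds so they become visible. Assumptions: the DTD is moved into an internal subset so the sketch runs without doc.dtd, and the class and method names (DumpChildren, parse, describeChildren) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DumpChildren {
    // Same document as in the post; the DTD is placed in an internal subset
    // (an assumption, only so the example needs no external doc.dtd).
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE doc [\n"
      + "  <!ELEMENT doc (element1, element2)>\n"
      + "  <!ELEMENT element1 (#PCDATA)>\n"
      + "  <!ELEMENT element2 (#PCDATA)>\n"
      + "]>\n"
      + "<doc>\n"
      + "  <element1>first</element1>\n"
      + "  <element2>second</element2>\n"
      + "</doc>\n";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Turn validation errors into exceptions instead of ignoring them.
        db.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // One line per child of the root element; line feeds and tabs in
    // text nodes are escaped so whitespace-only nodes are visible.
    static String describeChildren(Document doc) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                String escaped = n.getNodeValue()
                    .replace("\n", "\\n").replace("\t", "\\t");
                sb.append("TEXT \"").append(escaped).append("\"\n");
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("ELEMENT ").append(n.getNodeName()).append('\n');
            } else {
                sb.append("OTHER ").append(n.getNodeName()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describeChildren(parse(XML)));
    }
}
```

With a default JAXP setup this prints TEXT "\n  ", ELEMENT element1, TEXT "\n  ", ELEMENT element2, TEXT "\n" on separate lines, i.e. the "empty" nodes are the line feed plus indentation between the tags.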

What do I need to do to avoid this?
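One possible workaround, sketched under stated assumptions: if the JAXP version in use provides DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) (later releases do), it drops whitespace in element-only content, but only when the parser is validating against a DTD that declares that content model, as here. Failing that, the sketch below walks the tree after parsing and removes whitespace-only text nodes using plain DOM calls. The class and method names are hypothetical, and this approach also strips whitespace-only text inside #PCDATA or mixed content, which may not be wanted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripWhitespace {
    // Recursively removes text nodes that contain only whitespace.
    // Caution: also removes whitespace-only text in #PCDATA/mixed content.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // save before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Parses the string, strips whitespace nodes, and returns how many
    // children the root element has left.
    static int countChildren(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceNodes(doc.getDocumentElement());
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>\n  <element1>first</element1>\n"
                   + "  <element2>second</element2>\n</doc>";
        System.out.println(countChildren(xml)); // 2: only element1 and element2 remain
    }
}
```

The built-in factory flag is preferable where available, since it relies on the DTD's content model rather than on a blanket "whitespace-only" test.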

Thanks,
Eric :-)

Received on Wednesday, 31 May 2000 10:55:02 UTC