
Whitespace handling

From: Eric Richardson <maxwell@telesoft.com>
Date: Wed, 31 May 2000 07:51:16 -0700
Message-ID: <393526E4.6F91CD2E@telesoft.com>
To: DOM <www-dom@w3.org>

Hi,
I posted this on comp.text.xml with no response so I was wondering if
anyone could help me on this list.

I recently switched to JAXP 1.0 from xml-tr2.
I am using DTDs and asking for strict (validating) parsing, something like this:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(uri);

My input file is human-readable, with whitespace for layout, like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE doc SYSTEM 'doc.dtd'>
<doc>
  <element1>first</element1>
  <element2>second</element2>
</doc>


My DTD is like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT doc (element1, element2)>
<!ELEMENT element1 (#PCDATA)>
<!ELEMENT element2 (#PCDATA)>

I'm getting seemingly empty TEXT nodes between the elements. Although I
haven't checked, I would guess they contain the line feeds. I read the XML
spec, and it says that the parser should pass line feeds to the
application, which I believe in this case is the DOM. In the DOM, I would
hope that no TEXT nodes would be created between ELEMENT nodes unless I
had specified a mixed content model with #PCDATA. Is there something I
don't understand about the encoding, or something else that would cause
this?

What do I need to do to avoid this?
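
A minimal sketch of how the guess above could be checked, and of one
setting that may avoid the extra nodes. It assumes a JAXP release newer
than 1.0 that provides
DocumentBuilderFactory.setIgnoringElementContentWhitespace, which relies
on the DTD to tell the parser which elements have element-only content.
The class name WhitespaceCheck, its helper methods, and the inlined DTD
are hypothetical and only there to make the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceCheck {
    // Same structure as the sample document, with the DTD moved into the
    // internal subset so that no external doc.dtd file is needed.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE doc [\n"
      + "<!ELEMENT doc (element1, element2)>\n"
      + "<!ELEMENT element1 (#PCDATA)>\n"
      + "<!ELEMENT element2 (#PCDATA)>\n"
      + "]>\n"
      + "<doc>\n"
      + "  <element1>first</element1>\n"
      + "  <element2>second</element2>\n"
      + "</doc>\n";

    // Parse the sample with validation on; optionally ask the parser to
    // drop whitespace that appears in element-only content.
    public static Document parse(boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(XML)));
    }

    // Count direct TEXT children of parent that hold nothing but whitespace.
    public static int countWhitespaceOnlyTextChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Node kept = parse(false).getDocumentElement();
        Node dropped = parse(true).getDocumentElement();
        System.out.println("whitespace-only TEXT children, default:  "
            + countWhitespaceOnlyTextChildren(kept));
        System.out.println("whitespace-only TEXT children, ignoring: "
            + countWhitespaceOnlyTextChildren(dropped));
    }
}
```

If that setting is not available in the parser at hand, the same counting
loop could be adapted to remove the whitespace-only TEXT nodes after
parsing instead.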

Thanks,
Eric :-)
Received on Wednesday, 31 May 2000 10:55:02 GMT
