RE: [xml-dev] CDATA processing problem using Xerces-J validating parser

From: Tony Opatha <tonyopatha@yahoo.com>
Date: Thu, 19 Jun 2003 11:12:34 -0700 (PDT)
Message-ID: <20030619181234.5881.qmail@web20506.mail.yahoo.com>
To: Dare Obasanjo <dareo@microsoft.com>, xml-dev@lists.xml.org
Cc: xmlschema-dev@w3c.org
The XML document instance fragment that contains the problematic part of
the CDATA section is:
 

<Description><![CDATA[

[I'm excluding here many Javascripts and other parts...]

<p><FONT SIZE=5><A HREF="http://www.xyz.com/danl.htm">32.COM Records<A>

[I'm excluding here many other CDATA section parts...]

<]]></Description>

 
The XSD schema fragment for the CDATA section contained in the element is:

<xs:element name="Description" type="xs:string"/>

 

The error I get at parsing/validating time is:

Error unmarshalling instance: org.xml.sax.SAXException: Data not belonging to any element encountered: 32.com Results 
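
For comparison, here is a minimal sketch of validating a fragment like the one above against the one-line schema. It uses the JDK's javax.xml.validation API, which post-dates this thread and is not what the poster ran; the class name CdataValidation, the method isValid, and the shortened sample strings are illustrative assumptions. It only checks whether markup-like text inside a CDATA section is acceptable content for an xs:string element.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class CdataValidation {
    // The schema fragment from the message, wrapped in a complete xs:schema.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Description\" type=\"xs:string\"/>"
      + "</xs:schema>";

    // Returns true if the document is well-formed and valid against XSD.
    public static boolean isValid(String xml) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Validation errors and well-formedness errors both end up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Markup-like text inside CDATA is plain character data.
        String inCdata =
            "<Description><![CDATA[<p><FONT SIZE=5>32.COM Records<A>]]></Description>";
        // The same kind of content as real child elements is not a string.
        String asMarkup = "<Description><p>32.COM Records</p></Description>";
        System.out.println(isValid(inCdata));
        System.out.println(isValid(asMarkup));
    }
}
```

Under these assumptions the CDATA version should validate and the version with a real child element should not, which suggests the schema itself is not what rejects the CDATA content.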

 

Thanks.



Dare Obasanjo <dareo@microsoft.com> wrote:
Like clockwork, whenever someone posts a complaint about how the parser isn't skipping the content of CDATA sections they always back up the assertion with the link to the erroneous content at http://www.w3schools.com/xml/xml_cdata.asp 

Can you provide a fragment of your schema and instance document so we can tell what you are trying to do?

________________________________

From: Tony Opatha [mailto:tonyopatha@yahoo.com]
Sent: Thu 6/19/2003 10:40 AM
To: Dare Obasanjo; xml-dev@lists.xml.org
Cc: xmlschema-dev@w3c.org
Subject: RE: [xml-dev] CDATA processing problem using Xerces-J validating parser


Yes, it is the case that the validating parser is processing the CDATA and
cannot determine which element the data belongs to.


The CDATA section seems to be ignored by XML Spy, while the same XSD and the
same XML instance are processed by Xerces and deemed not valid (see the error
in the attached e-mail below). The following note seems to indicate the
misconception that everything in CDATA is ignored by the parser:

http://www.w3schools.com/xml/xml_cdata.asp


So, is there a way we can "play" around with the XSD data type specification
for the element that contains the CDATA? It would be quite difficult to
"escape" all the characters in the CDATA section to satisfy the Xerces parser.

Any ideas how to work around this problem?

Thanks for your help.



Dare Obasanjo wrote:

It is a common misconception that information within a CDATA section is not processed by the XML parser but is instead skipped. Unfortunately this is incorrect. CDATA sections are at best a shorthand mechanism that saves you from having to escape certain characters, not a directive to the XML parser to halt processing until further notice. 
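
A minimal sketch of this point, assuming the JDK's built-in SAX parser rather than the Xerces 1.4 setup discussed in this thread; the class name CdataDemo, the method textOf, and the sample strings are illustrative only. It collects the character data the parser reports and compares a CDATA section with the equivalent escaped text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CdataDemo {
    // Parses the XML and returns all character data reported by the parser.
    public static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // CDATA content arrives here like any other text.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String withCdata = "<Description><![CDATA[<p>A & B</p>]]></Description>";
        String escaped = "<Description>&lt;p&gt;A &amp; B&lt;/p&gt;</Description>";
        System.out.println(textOf(withCdata));
        System.out.println(textOf(withCdata).equals(textOf(escaped)));
    }
}
```

Both documents deliver the same text to the application: the parser reads the CDATA section and reports its content as character data, it just does not treat "<" and "&" inside it as markup.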

________________________________

From: Tony Opatha [mailto:tonyopatha@yahoo.com]
Sent: Wed 6/18/2003 9:54 PM
To: xml-dev@lists.xml.org
Subject: [xml-dev] CDATA processing problem using Xerces-J validating parser


I have an XML doc instance that contains a CDATA section:
""

Parsers validating this XML are supposed to ignore the
CDATA. XML Spy validates the XML instance fine using
its corresponding XSD, which defines the CDATA as
an xsd:string.

Now when I use the Xerces-J parser at run time, it seems to
be processing the CDATA and obviously fails to
validate the CDATA section, since it has all types of
illegal characters in it.

Here is the error I get:

org.xml.sax.SAXException: Data not belonging to any element encountered: 32.com Record

Any ideas why this is a problem?

I believe this may be a Xerces 1.4 parser. Does Xerces support
correct handling of CDATA?

thanks




Received on Thursday, 19 June 2003 14:12:36 UTC
