Re: How to handle a DTD entity reference in XML-Schema

Eddie Robertsson <eddie@allette.com.au> writes:

> > > So, can anybody please tell me how to define an entity reference in
> > > XML Schema?
> >
> > I'm afraid that you can't. XML Schema only deals with constraints on
> > the logical structure of an XML document, not its serialization (i.e.
> > how particular characters are represented). You have to use a DTD if
> > you want to use entities.
> 
> It could be worth mentioning, though, that since XML Schema works on the
> Infoset, if the Infoset is created by a validating XML parser using a
> DTD, then XML Schema processing will be applied to that Infoset. In
> theory this means that schema processing will be applied to an Infoset
> in which all entity references have been resolved, attribute defaults
> inserted, etc. The reason I say "in theory" is because I'm not sure
> how this is stated in the XML Schema specification

Ought to be 'in practice' too!  The XML Schema REC throughout
describes what is validated as 'an element information item' per the
Infoset REC, which in turn is clear that processing of internal general
entities declared in the internal subset is _required_ of conformant
processors.

> and I've noticed different behaviour in different processors. For
> example, say we have the following XML Schema:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>  <xs:element name="PhysicalAddress">
>   <xs:complexType>
>    <xs:sequence>
>     <xs:element name="Unit" type="xs:string"/>
>     <xs:element name="Street" type="xs:string"/>
>     <xs:element name="StreetNbr" type="xs:string"/>
>     <xs:element name="Suburb" type="xs:string"/>
>    </xs:sequence>
>   </xs:complexType>
>  </xs:element>
> </xs:schema>
> 
> Now, instead of a fully resolved instance document, you have the
> following:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE PhysicalAddress [
>  <!ENTITY street SYSTEM "street.xml">
> ]>
> <PhysicalAddress xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:noNamespaceSchemaLocation="PhysicalAddress.xsd">
>  <Unit>31</Unit>
>  &street;
>  <StreetNbr>149</StreetNbr>
>  <Suburb>Pyrmont</Suburb>
> </PhysicalAddress>
> 
> where street.xml is:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <Street>Street</Street>
> 
> In theory this instance should validate against its schema, and indeed
> both XML Spy and XSV report the above as perfectly valid. However, if
> you run it through MSXML4 you will get errors. I asked Microsoft about
> this and apparently, if a DTD exists for an instance, DTD validation
> always takes precedence and XML Schema validation is ignored (when I
> asked, there were no plans to change this...). I'm not sure what the
> XML Schema spec says about this, or whether this should even be part of
> the XML Schema spec, but it's clear that different schema validators
> handle this differently.

I thought you were headed in a different direction -- it is open to a
conformant non-validating XML processor to simply _not include_ the
external general entity referenced by '&street;'.  It would be
possible, although unhelpful, to use such a processor as the first
stage of a schema processor.
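To make the non-validating case concrete, here is a minimal sketch using
Python's standard-library SAX reader (expat-based, non-validating). It
parses the instance document above twice, once with the reader's
feature_external_ges switched off and once with it on, and records the
children of PhysicalAddress. The in-memory resolver standing in for the
file street.xml, and the plain list comparison standing in for the
schema's xs:sequence, are hypothetical illustrations only; this is not
real XML Schema validation.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# The instance document from the message above.
INSTANCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PhysicalAddress [
 <!ENTITY street SYSTEM "street.xml">
]>
<PhysicalAddress xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="PhysicalAddress.xsd">
 <Unit>31</Unit>
 &street;
 <StreetNbr>149</StreetNbr>
 <Suburb>Pyrmont</Suburb>
</PhysicalAddress>
"""

# Hypothetical in-memory stand-in for the external file street.xml.
ENTITIES = {
    "street.xml": b'<?xml version="1.0" encoding="UTF-8"?>\n<Street>Street</Street>\n',
}

# Child order required by the schema's xs:sequence (a plain list
# comparison here, NOT real XML Schema validation).
EXPECTED = ["Unit", "Street", "StreetNbr", "Suburb"]


class InMemoryResolver(EntityResolver):
    """Serve external entities from ENTITIES instead of the file system."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(ENTITIES[systemId]))
        return source


class ChildCollector(ContentHandler):
    """Record the names of the root element's direct children."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.children = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:  # direct child of the root element
            self.children.append(name)

    def endElement(self, name):
        self.depth -= 1


def child_sequence(include_external_entities):
    parser = xml.sax.make_parser()  # expat-based, non-validating reader
    # When False, the reader does not read the external entity at all,
    # so <Street> never reaches the content handler.
    parser.setFeature(feature_external_ges, include_external_entities)
    parser.setEntityResolver(InMemoryResolver())
    collector = ChildCollector()
    parser.setContentHandler(collector)
    source = InputSource()
    source.setByteStream(io.BytesIO(INSTANCE))
    parser.parse(source)
    return collector.children


for include in (False, True):
    seq = child_sequence(include)
    print(include, seq, "matches sequence" if seq == EXPECTED else "does NOT match")
# False ['Unit', 'StreetNbr', 'Suburb'] does NOT match
# True ['Unit', 'Street', 'StreetNbr', 'Suburb'] matches sequence
```

With the feature off, a schema processor built on top of this reader
would see Unit followed directly by StreetNbr and reject the content,
even though the document itself is unchanged.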

As for MSXML4 -- I agree with you: that's the wrong strategy, but the
REC just says what schema validation is, not when you should perform
it.

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2001, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/

Received on Tuesday, 20 November 2001 04:25:11 UTC