Re: New FAQ: entities and NCRs

Hi Richard, hi all,

This is a summary of the issue "NCRs and schema languages", which has some
overlap with the FAQ "entities and NCRs". It describes ways of
encapsulating NCRs, of which entities are only ONE possibility. Which way
is useful and possible depends on the schema language. This summary might
become input to the FAQ about entities and NCRs, if Richard and others
think it is useful.

Cheers, Felix.

The following discussion on entities for numeric character references  
(NCRs) and other, alternative ways of encapsulating numeric character  
references concentrates on four schema languages: XML DTDs, XML Schema,  
RELAX NG and Schematron.

All four schema languages allow the use of entities for NCRs in XML
documents. They differ in where the entities can be declared. With XML
DTDs, entities can be defined A) in the declaration subset (the internal
DTD subset) of the XML document, or B) in the external DTD ("NCR" is used
as a placeholder for a numeric character reference):

A)
<!DOCTYPE mydoc [
<!ENTITY mychar "NCR">
]>

or

B)
<!DOCTYPE mydoc SYSTEM "mydtd.dtd">

where "mydtd.dtd" contains the entity declaration <!ENTITY mychar "NCR">.

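Case A) can be tried out with almost any XML processor. The following is
only a minimal sketch, assuming Python's standard xml.etree.ElementTree
module as the processor; the document content and the choice of U+00E9
for "NCR" are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "mychar" is declared in the declaration subset
# and stands for U+00E9, written as the character reference &#xE9;.
DOC_INTERNAL = """<!DOCTYPE mydoc [
<!ENTITY mychar "&#xE9;">
]>
<mydoc>caf&mychar;</mydoc>"""

# The XML processor expands the entity while parsing, so anything that
# runs afterwards (for example schema validation) sees only the character.
root = ET.fromstring(DOC_INTERNAL)
expanded_text = root.text
print(expanded_text)
```
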
With XML Schema, RELAX NG and Schematron, entities can be declared as in
A). They cannot be declared as in B), i.e. as part of the external
schema. Strictly speaking, entity declaration and expansion are out of
scope for XML Schema, RELAX NG and Schematron. All these schema languages
rely on an XML processor that expands the entities before validation
against the schema starts. Non-validating XML processors are required to
read only the declarations in the document itself, not external
declarations. Hence, whether entity declarations in an external DTD can be
resolved depends on the implementation of the XML processor.
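
The parser dependency for case B) can be observed directly. As a sketch
only, again assuming Python's xml.etree.ElementTree (which does not load
an external DTD), a reference to an entity declared only in "mydtd.dtd" is
rejected; other processors may well behave differently. The helper name
try_parse is made up for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "mychar" is declared only in the external DTD.
DOC_EXTERNAL = """<!DOCTYPE mydoc SYSTEM "mydtd.dtd">
<mydoc>caf&mychar;</mydoc>"""


def try_parse(xml_text):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        # ElementTree never reads mydtd.dtd, so &mychar; stays undefined.
        print("parse failed:", err)
        return None


external_result = try_parse(DOC_EXTERNAL)
```
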

XML Schema provides a different way to encapsulate numeric character
references: the numeric character reference can be defined as a fixed
(or default) value for an element:

<xsd:element name="mychar" type="xsd:token" fixed="NCR"/>

In an XML document, the element can then be used like this:

<mydoc> ... <mychar/>...</mydoc>

RELAX NG and XML DTDs do not allow default values to be defined for
element content, and Schematron does not support this solution either.
However, XML DTDs, XML Schema and RELAX NG all allow default values to be
declared for attributes. Hence, for XML DTDs the following alternative
way of attaching a name to a numeric character reference is possible:

<!ELEMENT mychar EMPTY>
<!ATTLIST mychar ncr CDATA #FIXED "...">

or in RELAX NG:
<element name="mychar">
  <attribute name="ncr" a:defaultValue="NCR"/>
  <empty/>
</element>

or in XML Schema:
<xsd:attribute name="mychar" type="xsd:token" fixed="NCR"/>

For XML DTDs, there seems to be no real need to choose this method,
since entities can be declared in the external DTD.
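
How the DTD attribute variant reaches an application can be sketched as
follows. This is only an illustration, assuming Python's
xml.etree.ElementTree; because that processor does not read external DTDs,
the declarations are placed in the declaration subset here, and U+00E9 is
a made-up stand-in for the character:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <mychar/> carries the character in a fixed
# attribute that is declared in the DTD, not written in the instance.
DOC_ATTR = """<!DOCTYPE mydoc [
<!ELEMENT mydoc (#PCDATA | mychar)*>
<!ELEMENT mychar EMPTY>
<!ATTLIST mychar ncr CDATA #FIXED "&#xE9;">
]>
<mydoc>caf<mychar/></mydoc>"""

root = ET.fromstring(DOC_ATTR)
# The parser supplies the declared value although <mychar/> has no
# attributes in the document itself.
fixed_value = root.find("mychar").get("ncr")
print(fixed_value)
```

Unlike an entity, the character does not end up in the text content; the
application has to read it from the attribute.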

The following table summarizes the ways of declaring entities and  
alternative methods to encapsulate numeric character references in  
different schema languages.

              Declaration    External          Element         Attribute
              subset         subset            default value   default value
XML DTDs      +              +                 -               +
XML Schema    +              XML parser dep.   +               +
RELAX NG      +              XML parser dep.   -               +
Schematron    +              XML parser dep.   -               -

Received on Wednesday, 6 July 2005 05:39:58 UTC