RE: New FAQ: entities and NCRs

Hi Felix,

Many thanks for doing this.

A couple of comments.

Entities are not really alternatives to NCRs, but alternatives to character codes.  Although using an NCR may be convenient to express the character in question, there is no reason that I know of that you have to use an NCR in all the places you indicate, other than where the document encoding disallows the character.  

So I would make changes of the following type: s/All schema languages allow to use entities for NCRs in XML documents./All schema languages allow the use of entities instead of characters in XML documents./

It seems to me that all the approaches you describe other than defining entities in the document subset fall foul of the problem that they will not be recognised if the implementation doesn't retrieve the external file, right?
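
To illustrate the point, here is a minimal sketch using Python's standard xml.etree.ElementTree, which as far as I know does not fetch external DTDs by default. The é character, the file name "mydtd.dtd" (never actually read) and the helper name expand are assumptions for the example only; other XML processors may behave differently.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal subset of the document itself.
INTERNAL = '<!DOCTYPE mydoc [<!ENTITY mychar "&#xE9;">]><mydoc>&mychar;</mydoc>'

# Entity expected to come from an external DTD (hypothetical file name)
# that this parser does not retrieve.
EXTERNAL = '<!DOCTYPE mydoc SYSTEM "mydtd.dtd"><mydoc>&mychar;</mydoc>'


def expand(document):
    """Return the text of the root element, or None if parsing fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        # Raised here when an entity reference cannot be resolved.
        return None


print(repr(expand(INTERNAL)))
print(repr(expand(EXTERNAL)))
```

The first document yields the character, while the second fails because the declaration of mychar lives in a file the parser never loads.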

With a little work we could publish this as an FAQ with a question like: "How do different schemas allow me to define character entities?"  Do you want/have time to?  If so, we should discuss it and set up a wiki page.

RI

============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 

> -----Original Message-----
> From: Felix Sasaki [mailto:fsasaki@w3.org] 
> Sent: 06 July 2005 06:40
> To: public-i18n-geo@w3.org
> Cc: Richard Ishida
> Subject: Re: New FAQ: entities and NCRs
> 
> Hi Richard, hi all,
> 
> This is a summary of the issue "NCRs and schema languages", which has
> some overlap with the FAQ "entities and NCRs". It describes ways of
> encapsulating NCRs, and entities are ONE possibility. Which way is
> useful and possible depends on the schema language. This summary might
> become an input to the FAQ about entities and NCRs, if Richard and
> others think it is useful.
> 
> Cheers, Felix.
> 
> The following discussion on entities for numeric character references
> (NCRs) and other, alternative ways of encapsulating numeric character
> references concentrates on four schema languages: XML DTDs, XML Schema,
> RELAX NG and Schematron.
> 
> All schema languages allow to use entities for NCRs in XML documents.
> They differ with respect to the declaration of entities. As for XML
> DTDs, entities can be defined A) in the declaration subset of the XML
> document, or B) in the external DTD ("NCR" is used as a placeholder
> for a numeric character reference):
> 
> A)
> <!DOCTYPE mydoc [
> <!ENTITY mychar "NCR">
> ]>
> 
> or
> 
> B)
> <!DOCTYPE mydoc SYSTEM "mydtd.dtd">
> 
> "mydtd.dtd" contains the entity declaration <!ENTITY mychar "NCR">.
> 
> XML Schema, RELAX NG and Schematron allow entities to be declared as
> in A). They do not allow entities to be declared as in B), i.e. as part
> of the external schema. Strictly speaking, entity declaration and
> expansion are out of scope for XML Schema, RELAX NG and Schematron. All
> these schema languages rely on an XML processor which expands the
> entities before validation against the schema starts. Non-validating
> XML processors are required to check only the document and no external
> declarations. Hence, it depends on the implementation of the XML
> processor whether external entity declarations can be resolved or not.
> 
> XML Schema provides a different solution to encapsulate numeric
> character references: the numeric character reference can be defined
> as a default (here, fixed) value for an element:
> 
> <xsd:element name="mychar" type="xsd:token" fixed="NCR"/>
> 
> In an XML document, the element can then be used like this:
> 
> <mydoc> ... <mychar/>...</mydoc>
> 
> RELAX NG and XML DTDs do not allow default values to be defined for
> element content. Schematron does not support this solution either. But
> XML DTDs, XML Schema and RELAX NG allow default values to be declared
> for attributes. Hence, for XML DTDs the following alternative way of
> attaching a name to a numeric character reference is possible:
> 
> <!ELEMENT mychar EMPTY>
> <!ATTLIST mychar ncr NMTOKEN #FIXED "...">
> 
> or in RELAX NG:
> <element name="mychar">
>   <attribute name="ncr" a:defaultValue="NCR"/>
>   <empty/>
> </element>
> 
> or in XML Schema:
> <xsd:attribute name="mychar" type="xsd:token" fixed="NCR"/>
> 
> As for XML DTDs, there seems to be no real need to choose this method,
> since they allow entities to be declared in the external DTD.
> 
> The following table summarizes the ways of declaring entities and  
> alternative methods to encapsulate numeric character references in  
> different schema languages.
> 
>              Declaration  External          Element        Attribute
>              Subset       Subset            default value  default value
> XML DTDs     +            +                 -              +
> XML Schema   +            xml parser dep.   +              +
> RELAX NG     +            xml parser dep.   -              +
> Schematron   +            xml parser dep.   -              -
> 

Received on Friday, 8 July 2005 12:46:41 UTC