Re: New FAQ: entities and NCRs

On Fri, 08 Jul 2005 21:46:31 +0900, Richard Ishida <ishida@w3.org> wrote:

>
> Hi Felix,
>
> Many thanks for doing this.

np!

>
> A couple of comments.
>
> Entities are not really alternatives to NCRs, but alternatives to  
> character codes.  Although using an NCR may be convenient to express the  
> character in question, there is no reason that I know of that you have  
> to use an NCR in all the places you indicate, other than where the  
> document encoding disallows the character.
>
> So I would make changes of the following type: s/All schema languages  
> allow to use entities for NCRs in XML documents./All schema languages  
> allow the use of entities instead of characters in XML documents./

Good!
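
To illustrate that an entity stands in for a character rather than for an
NCR, here is a minimal sketch, assuming Python's standard-library
xml.etree.ElementTree as the XML processor. The entity name "mychar" follows
the examples in this thread; the character U+00E9 and the helper
parsed_text are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Internal subset declaring the (illustrative) entity "mychar" for U+00E9.
DOCTYPE = '<!DOCTYPE mydoc [<!ENTITY mychar "&#xE9;">]>'


def parsed_text(document: str) -> str:
    """Return the character data of the root element after parsing."""
    return ET.fromstring(document).text


literal = parsed_text("<mydoc>\u00e9</mydoc>")              # the character itself
ncr = parsed_text("<mydoc>&#xE9;</mydoc>")                  # numeric character reference
entity = parsed_text(DOCTYPE + "<mydoc>&mychar;</mydoc>")   # named entity

# All three spellings reach the application as the same character.
print(literal == ncr == entity)
```

After parsing, the application cannot tell which of the three spellings was
used in the source document.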

>
> It seems to me that all the approaches you describe other than defining  
> entities in the document subset fall foul of the problem that they will  
> not be recognised if the implementation doesn't retrieve the external  
> file, right?

Yes. That may be bad news for the schema languages :( , but it is probably  
also something people would like to be made aware of.
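
A minimal sketch of the effect, assuming Python's standard-library
xml.etree.ElementTree, which as far as I know does not fetch the external
DTD. The file name "mydtd.dtd" and the entity "mychar" come from the
examples quoted below; the helper expand and the character U+00E9 are only
for illustration, and other processors may behave differently.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal subset: visible to every XML processor.
INTERNAL = '<!DOCTYPE mydoc [<!ENTITY mychar "&#xE9;">]><mydoc>&mychar;</mydoc>'

# Entity assumed to be declared only in the external file "mydtd.dtd",
# which this parser is not expected to retrieve.
EXTERNAL = '<!DOCTYPE mydoc SYSTEM "mydtd.dtd"><mydoc>&mychar;</mydoc>'


def expand(document):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


internal_result = expand(INTERNAL)
external_result = expand(EXTERNAL)

print("internal subset:", repr(internal_result))
print("external subset:", repr(external_result))
```

With the internal subset the entity is expanded to the character; with only
the external reference the declaration is never seen, so the character does
not reach the application.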

>
> With a little work we could publish this as an FAQ with a question like:  
> "How do different schemas allow me to define character entities?"  Do  
> you want/have time to?  If so, we should discuss it and set up a wiki  
> page.

Good idea. Let's come back to it if you have time.


-- Felix

>
> RI
>
> ============
> Richard Ishida
> W3C
>
> contact info:
> http://www.w3.org/People/Ishida/
>
> W3C Internationalization:
> http://www.w3.org/International/
>
> Publication blog:
> http://people.w3.org/rishida/blog/
>
>> -----Original Message-----
>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>> Sent: 06 July 2005 06:40
>> To: public-i18n-geo@w3.org
>> Cc: Richard Ishida
>> Subject: Re: New FAQ: entities and NCRs
>>
>> Hi Richard, hi all,
>>
>> This is a summary of the issue "NCRs and schema languages", which has some
>> overlap with the FAQ "entities and NCRs".  It describes ways of
>> encapsulating NCRs, and entities are ONE possibility. Which way is useful
>> and possible depends on the schema language. This summary might become an
>> input to the FAQ about entities and NCRs, if Richard and others think it is
>> useful.
>>
>> Cheers, Felix.
>>
>> The following discussion on entities for numeric character references
>> (NCRs) and other, alternative ways of encapsulating numeric character
>> references concentrates on four schema languages: XML DTDs, XML Schema,
>> RELAX NG and Schematron.
>>
>> All schema languages allow to use entities for NCRs in XML documents. They
>> differ with respect to the declaration of entities. As for XML DTDs,
>> entities can be defined A) in the declaration subset of the XML document,
>> or B) in the external DTD ("NCR" is used as a placeholder for a numeric
>> character reference):
>>
>> A)
>> <!DOCTYPE mydoc [
>> <!ENTITY mychar "NCR">
>> ]>
>>
>> or
>>
>> B)
>> <!DOCTYPE mydoc SYSTEM "mydtd.dtd">
>>
>> "mydtd.dtd" contains the entity declaration <!ENTITY mychar "NCR">.
>>
>> XML Schema, RELAX NG and Schematron allow entities to be declared as in A).
>> They do not allow entities to be declared as in B), i.e. as part of the
>> external schema. Strictly speaking, entity declaration and expansion are
>> out of scope for XML Schema, RELAX NG and Schematron. All these schema
>> languages rely on an XML processor which expands the entities before the
>> validation against the schema starts. Non-validating XML processors are
>> required to process only the declarations in the document itself, not
>> external declarations. Hence, it depends on the implementation of the XML
>> processor whether external entity declarations can be resolved or not.
>>
>> XML Schema provides a different solution to encapsulate numeric character
>> references: the numeric character reference can be defined as a fixed
>> default value for an element:
>>
>> <xsd:element name="mychar" type="xsd:token" fixed="NCR"/>
>>
>> In an XML document, the element can then be used like this:
>>
>> <mydoc> ... <mychar/>...</mydoc>
>>
>> RELAX NG and XML DTDs do not allow default values to be defined for element
>> content. Schematron does not support this solution either. But XML DTDs,
>> XML Schema and RELAX NG allow default values to be declared for attributes.
>> Hence, for XML DTDs the following alternative way of attaching a name to a
>> numeric character reference is possible:
>>
>> <!ELEMENT mychar EMPTY>
>> <!ATTLIST mychar ncr NMTOKEN #FIXED "...">
>>
>> or in RELAX NG:
>> <element name="mychar">
>>   <attribute name="ncr" a:defaultValue="NCR"/>
>>   <empty/>
>> </element>
>>
>> or in XML Schema:
>> <xsd:attribute name="mychar" type="xsd:token" fixed="NCR"/>
>>
>> As for XML DTDs, there seems to be no real need to choose this method,
>> since they allow entities to be declared in the external DTD.
>>
>> The following table summarizes the ways of declaring entities and
>> alternative methods to encapsulate numeric character references in
>> different schema languages.
>>
>>              Declaration  External          Element        Attribute
>>              Subset       Subset            default value  default value
>> XML DTDs     +            +                 -              +
>> XML Schema   +            xml parser dep.   +              +
>> RELAX NG     +            xml parser dep.   -              +
>> Schematron   +            xml parser dep.   -              -
>>
>
>

Received on Saturday, 9 July 2005 03:19:07 UTC