
Re: draft of requirement related to entities

From: Yves Savourel <ysavourel@translate.com>
Date: Tue, 29 Mar 2005 09:13:35 -0700
To: <public-i18n-its@w3.org>
Message-ID: <HYDRApfW6zZez2oAyfv0000e718@hydra.RWS.LOCAL>
Hi all,

Here is an updated version of the entity requirement.

It still does not say anything about character entity references vs. NCRs, as Richard suggested. Maybe we also need
something about the predefined character entities (&apos;, etc.)?

==============================
Challenge/Issue: 

XML applications (i.e. a combination of DTD/XSD, stylesheets, XML instances) are often subdivided into physical units called
entities (see http://www.xml.com/axml/target.html#sec-physical-struct). Various types of entities exist (see
http://tech.irt.org/articles/js212/#intro).

Examples: 

1- A character entity. The entity defines a single Unicode character. 

Example: <!ENTITY aacute "&#225;" > 

2- A short element-free text. The entity defines a short piece of text that contains only character data (no elements or other XML
constructs), for instance a product name.

Example: <!ENTITY ProductName "PictoMagic for Windows" > 

3- A longer text with one or more elements. The entity defines a piece of boiler-plate text such as a copyright paragraph.

Example: <!ENTITY CopyrightInfo "<a href='/copyright.htm'>Copyright</a> 2005 W3C."> 
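The three kinds of entities above can be made concrete with a minimal sketch using Python's standard `xml.etree.ElementTree`, which expands entities declared in the internal DTD subset. The element names `doc` and `p` are hypothetical, and the `href` is written as '/copyright.htm'.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the three example entities. The element
# names <doc> and <p> are hypothetical; the href uses a forward slash.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY aacute "&#225;">
  <!ENTITY ProductName "PictoMagic for Windows">
  <!ENTITY CopyrightInfo "<a href='/copyright.htm'>Copyright</a> 2005 W3C.">
]>
<doc>
  <p>&aacute;</p>
  <p>&ProductName;</p>
  <p>&CopyrightInfo;</p>
</doc>"""

root = ET.fromstring(SOURCE)
char_p, text_p, markup_p = root.findall("p")

# 1- Character entity: expands to the single character U+00E1.
print(char_p.text)   # á
# 2- Element-free text: expands to plain character data.
print(text_p.text)   # PictoMagic for Windows
# 3- Text with markup: the replacement text is parsed as well, so <a>
#    becomes a real child element of <p>, followed by the tail text.
link = markup_p.find("a")
print(link.get("href"), link.text, repr(link.tail))
```

Note that in the third case the parser does not just paste in a string: the markup inside the entity becomes part of the document tree.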

Two aspects of entities are of particular importance with regard to internationalization and localization: 

1.      how entities are defined 
2.      how entities are used 

For example, the snippet 

        <!ENTITY ProductName "PictoMagic for Windows" > 

defines an entity called 'ProductName', and the snippet 
  
        The latest version of &ProductName; features many enhancements. 

references/uses the entity. 
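Because definition and use are separate, a localization tool has to track both. A minimal, deliberately naive sketch in Python that lists what a DTD fragment declares and what a piece of text references; it uses regular expressions rather than a real DTD parser, and the function name `declared_and_used` is hypothetical.

```python
import re

# General entity declarations such as <!ENTITY Name "value">. Parameter
# entities (<!ENTITY % ...>) do not match because '%' is not a name start.
DECL = re.compile(r"""<!ENTITY\s+([A-Za-z_][\w.-]*)\s+(["'])(.*?)\2\s*>""", re.S)
# Entity references such as &ProductName;. Numeric character references
# (&#225;) start with '#' and are therefore not matched.
REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def declared_and_used(dtd_text: str, body_text: str):
    """Return ({name: value} for declarations, [names] for references)."""
    declared = {m.group(1): m.group(3) for m in DECL.finditer(dtd_text)}
    used = [m.group(1) for m in REF.finditer(body_text)]
    return declared, used


dtd = '<!ENTITY ProductName "PictoMagic for Windows" >'
body = "The latest version of &ProductName; features many enhancements."
declared, used = declared_and_used(dtd, body)
print(declared)  # {'ProductName': 'PictoMagic for Windows'}
print(used)      # ['ProductName']
```

Comparing the two results (any name in `used` that is missing from `declared`) is a cheap first check for references that cannot be resolved; a production tool would use a real XML/DTD parser instead.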

If internationalization and localization are not addressed in entity-related work, several issues may arise: 

1.      The entity reference cannot be resolved 

Example: the definition is not available to the XML processor.

2.      The entity definition does not match the language of the surrounding context 

Example: The context in 'Das Produkt &ProductName; ist mit vielen Erweiterungen ausgestattet worden' ('The product &ProductName;
has been equipped with many enhancements') is German, whereas the definition may be in English.

3.      The entity definition does not fit the surrounding context grammatically 

Example: The sentence '&ProductName; features many enhancements.' is grammatically incorrect if the definition designates an
object in the plural (the verb would then have to be 'feature').
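Issue 1 is easy to reproduce with Python's standard `xml.etree.ElementTree`, which does not read external DTDs. A minimal sketch: the file name `product.dtd` and the helper `try_parse` are hypothetical, and the only difference between the two documents is where the definition lives.

```python
import xml.etree.ElementTree as ET

SENTENCE = "<p>The latest version of &ProductName; features many enhancements.</p>"

# Case A: the definition lives only in an external DTD. ElementTree does not
# read external DTDs, so the definition is unavailable to the processor.
EXTERNAL = '<!DOCTYPE p SYSTEM "product.dtd">' + SENTENCE

# Case B: the same definition carried in the document's internal DTD subset.
INTERNAL = ('<!DOCTYPE p [<!ENTITY ProductName "PictoMagic for Windows">]>'
            + SENTENCE)


def try_parse(doc: str) -> str:
    """Return the text of the root element, or the parser's error message."""
    try:
        return ET.fromstring(doc).text or ""
    except ET.ParseError as err:
        return f"error: {err}"


print(try_parse(EXTERNAL))  # an "undefined entity &ProductName;" parse error
print(try_parse(INTERNAL))  # The latest version of PictoMagic for Windows features many enhancements.
```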

In addition, even if the entity itself is translated, there may be significant grammatical problems in languages that inflect
nouns. The translation will inevitably follow the case of the original: if the original is in the genitive, for example, the
translation is in the genitive as well, even where the target sentence requires a different case (this of course presumes that
both languages have a concept of 'genitive').

Since entities affect the content of the document, and XSLT processors and other XML processors act on that content, various
processing-related issues may arise. For example, an XSLT stylesheet that is sensitive to content contributed by an entity may
fail to work as expected (e.g. it may not be able to generate the 'alt' attribute for HTML pages).
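The root of this problem is that parsing erases entity boundaries: the XPath/XSLT data model has no node type for entity references, so a stylesheet only ever sees the expanded content. A minimal sketch of the same effect with Python's `xml.etree.ElementTree` (not an XSLT processor); the `footer` element is hypothetical and the `href` is written as '/copyright.htm'.

```python
import xml.etree.ElementTree as ET

# The CopyrightInfo example supplied through an entity...
WITH_ENTITY = """<!DOCTYPE footer [
<!ENTITY CopyrightInfo "<a href='/copyright.htm'>Copyright</a> 2005 W3C.">
]><footer>&CopyrightInfo;</footer>"""

# ...and the same markup written literally in the instance.
INLINE = "<footer><a href='/copyright.htm'>Copyright</a> 2005 W3C.</footer>"

a = ET.fromstring(WITH_ENTITY)
b = ET.fromstring(INLINE)

# Nothing in the parsed tree records that <a> came from an entity: both trees
# serialize identically, so downstream code cannot tell the two apart.
print(ET.tostring(a) == ET.tostring(b))  # True
```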

Notes: 

Ideally, the solution the WG produces will apply not only to entities but also to XInclude (see http://www.w3.org/TR/xinclude/)
or even to fragments (see http://www.w3.org/TR/2001/CR-xml-fragment-20010212#packaging).

Quick Guideline Thoughts: 

1. If possible, XML applications should avoid the use of entities.

2. XML applications that have to use entities must be built in such a way that entities can be localized easily (i.e. the
XML application has to be internationalized with respect to entities).

3. If entities are used, the XML instances should be declared as 'standalone' (see http://www.w3.org/TR/REC-xml/#sec-rmd).
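Guideline 3 can be checked mechanically. Per the XML Recommendation, a document declared standalone="yes" must not rely on external markup declarations for the entities it references (the predefined ones excepted), so the definitions have to travel with the instance in its internal DTD subset, which also rules out issue 1. A minimal sketch using Python's standard `xml.parsers.expat`; the helper `standalone_report` and the file name `copyright.xml` are hypothetical, and since expat is non-validating this only reports, it does not enforce the constraint.

```python
import xml.parsers.expat


def standalone_report(doc: bytes) -> dict:
    """Report the standalone flag and the general entities the document
    declares, split into internal (value inline) and external (SYSTEM/PUBLIC).
    Hypothetical helper; nothing is validated or enforced."""
    info = {"standalone": None, "internal": [], "external": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # expat passes 1 for standalone="yes", 0 for "no", -1 if omitted.
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        if is_parameter_entity:
            return  # parameter entities are only referenced inside DTDs
        # value is None for external entities, a string for internal ones.
        kind = "internal" if value is not None else "external"
        info[kind].append(name)

    parser.XmlDeclHandler = on_xml_decl
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(doc, True)
    return info


GOOD = b"""<?xml version="1.0" standalone="yes"?>
<!DOCTYPE doc [
  <!ENTITY ProductName "PictoMagic for Windows">
]>
<doc>&ProductName;</doc>"""

RISKY = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE doc [
  <!ENTITY CopyrightInfo SYSTEM "copyright.xml">
]>
<doc>&CopyrightInfo;</doc>"""

print(standalone_report(GOOD))
# {'standalone': 'yes', 'internal': ['ProductName'], 'external': []}
print(standalone_report(RISKY))
# {'standalone': 'no', 'internal': [], 'external': ['CopyrightInfo']}
```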

==============================

Cheers,
-Christian & Yves
Received on Tuesday, 29 March 2005 16:14:15 GMT
