Re: HTML entities.

Many thanks to Vincent for his detailed explanation on the character reference issue and Amaya behaviour.

I now understand that character representation is used for those characters that are not included in the chosen Character Encoding (e.g. us-ascii, iso-8859-1). Depending on the chosen DTD, character representation is made by either a Numeric character reference or a Character entity reference (the entity must be specified in the DTD).

Due to a bug, which is about to be fixed, Amaya does not always convert these characters correctly.

My wish is that Amaya will use Numeric character references, because that seems to be the most reliable.

Thanks,
Thomas Jedenfelt

P.S.
I have written a page on this subject. It seems that there are but 78 characters that are safely transferred over the Internet and displayed on various computers.

For those with an interest, I welcome corrections and comments.

http://hem.bredband.net/thojed/webtech/char_representation.htm


----- Original Message -----
From: "Vincent Quint" <Vincent.Quint@inrialpes.fr>
Date: Fri, 25 Nov 2005
> 
> Let me try to clarify how Amaya encodes characters when saving
> a document. First, a short reminder about encoding and XML may
> be useful.
> 
> us-ascii is a 7-bit code that represents 95 printable characters
> (positions 32 through 126 decimal).
> 
> iso-8859-1 is an 8-bit code that represents the same characters as
> us-ascii, at the same positions, plus 96 additional printable characters
> (positions 160 through 255 decimal).
> 
> utf-8 is a variable-length encoding scheme for the Universal Character
> Set - UCS (ISO 10646, aka Unicode). UCS represents thousands of
> characters. Note that the characters from us-ascii have the same
> positions in UCS as in us-ascii (and thus as in iso-8859-1).
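
To make the reminder above concrete, here is a minimal Python sketch (not part of Vincent's message) that encodes three sample characters under each charset. The helper name encoded_bytes is hypothetical; the codec names are Python's standard aliases.

```python
# Sample characters: one from us-ascii, one added by iso-8859-1,
# and one that exists only in the UCS.
LATIN_A = "A"            # U+0041
E_ACUTE = "\u00e9"       # U+00E9, e with acute accent
ALPHA = "\u03b1"         # U+03B1, Greek small letter alpha

CHARSETS = ("us-ascii", "iso-8859-1", "utf-8")


def encoded_bytes(ch, charset):
    """Return the bytes for ch in charset, or None if charset lacks ch."""
    try:
        return ch.encode(charset)
    except UnicodeEncodeError:
        return None


for ch in (LATIN_A, E_ACUTE, ALPHA):
    row = {cs: encoded_bytes(ch, cs) for cs in CHARSETS}
    print(f"U+{ord(ch):04X}", row)
```

A result of None marks a character that the charset cannot represent directly, which is exactly the case where a reference is needed. The utf-8 column also shows the variable length: one byte for "A", two bytes for the other two characters.
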
> 
> If an XML document contains a character that is not available in the
> character set (charset) available with the encoding, a special
> representation is required that uses only the characters available
> with this charset. XML offers two such representations of characters:
> 
> 1 - Character references represent the position (in decimal or hexadecimal)
>      of the character in the UCS. For example, the Greek letter alpha is
>      represented as
>        &#x3b1;   (hexadecimal) or
>        &#945;     (decimal)
> 
> 2 - Entity references use a name to represent a content (in that case
>      a character).  For example, the Greek letter alpha may be represented as
>        &alpha;
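
The two forms of character reference can be produced and resolved with a few lines of Python. This is only an illustrative sketch: char_ref is a hypothetical helper, and html.unescape resolves names from Python's built-in HTML entity table rather than from a DTD.

```python
import html

ALPHA = "\u03b1"  # Greek small letter alpha, UCS position 945 (0x3b1)


def char_ref(ch, hexadecimal=True):
    """Build a numeric character reference from the UCS position of ch."""
    cp = ord(ch)
    return f"&#x{cp:x};" if hexadecimal else f"&#{cp};"


hex_ref = char_ref(ALPHA)                      # hexadecimal form
dec_ref = char_ref(ALPHA, hexadecimal=False)   # decimal form

# The standard "xmlcharrefreplace" error handler emits decimal references
# for every character that the target charset cannot represent.
ascii_bytes = ALPHA.encode("us-ascii", "xmlcharrefreplace")

# Going the other way: all three notations resolve to the same character.
resolved = html.unescape("&#x3b1; &#945; &alpha;")

print(hex_ref, dec_ref, ascii_bytes, resolved)
```
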
> 
> Character references may be used in any XML document, but entity references
> are allowed only if the document itself contains a means to resolve the name.
> Entity name resolution is provided by the document type declaration, which
> refers to the DTD where names and their associated contents are defined.
> Practically, this means that you can use entity references only if the
> <!DOCTYPE ...> is present and refers to a DTD that defines the names you use.
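
The difference can be observed with any XML parser. The sketch below uses Python's xml.etree.ElementTree and, as an assumption made to keep it self-contained, declares the entity in an internal DTD subset instead of referring to an external DTD such as the XHTML one. The helper text_or_error is hypothetical.

```python
import xml.etree.ElementTree as ET

# A character reference needs no DOCTYPE at all.
WITH_CHAR_REF = "<p>&#x3b1;</p>"

# An entity reference without any declaration of the name "alpha".
WITHOUT_DTD = "<p>&alpha;</p>"

# The same entity reference, with a DOCTYPE whose internal subset
# defines the name.
WITH_DTD = '<!DOCTYPE p [<!ENTITY alpha "&#945;">]><p>&alpha;</p>'


def text_or_error(doc):
    """Parse doc and return the root element's text, or the parse error."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


for doc in (WITH_CHAR_REF, WITHOUT_DTD, WITH_DTD):
    print(repr(text_or_error(doc)))
```

Only the second document is rejected: the parser has no way to resolve the name, so it reports an undefined entity.
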
> 
> With that in mind, it is easier to understand how Amaya works.
> 
> By default, Amaya preserves the initial encoding of the document,
> that is the encoding that was associated with the document at loading
> time. You can check this encoding with the command
> File/Document_information (Charset field). The Save command saves
> the document with that encoding, while the Save_As command allows you
> to choose another encoding (Charset field).
> 
> When saving a document (Save or Save_As commands), all characters
> that are available in the charset of the encoding are just written using the
> encoding. Only the other characters are written using character or entity
> references. The choice between these two options is made according to
> the DOCTYPE. If there is a DOCTYPE that refers to a DTD that defines a
> name for the character, an entity reference is used (i.e. a name), otherwise
> Amaya generates a character reference in hexadecimal.
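
The saving rule described above can be sketched as follows. This is not Amaya's code, only an illustration of the decision as stated: serialize_text and its dtd_entity_names parameter are hypothetical, and Python's html.entities.codepoint2name table (the HTML 4.01 entity names) stands in for the names a DTD would define. Escaping of markup characters such as "&" and "<" is deliberately left out.

```python
from html.entities import codepoint2name


def serialize_text(text, charset, dtd_entity_names=None):
    """Write text for charset, falling back to references when needed.

    dtd_entity_names maps UCS positions to entity names defined by the
    document's DTD; pass None when the document has no DOCTYPE.
    """
    names = dtd_entity_names or {}
    out = []
    for ch in text:
        try:
            ch.encode(charset)       # available in the charset: write as is
            out.append(ch)
        except UnicodeEncodeError:
            name = names.get(ord(ch))
            if name:                 # the DTD defines a name: entity reference
                out.append(f"&{name};")
            else:                    # otherwise: hexadecimal character reference
                out.append(f"&#x{ord(ch):x};")
    return "".join(out)


SAMPLE = "caf\u00e9 \u03b1"  # "cafe" with e-acute, then Greek alpha

print(serialize_text(SAMPLE, "utf-8", codepoint2name))
print(serialize_text(SAMPLE, "iso-8859-1"))
print(serialize_text(SAMPLE, "us-ascii", codepoint2name))
```

With utf-8 nothing is replaced, with iso-8859-1 and no DOCTYPE only the alpha becomes a hexadecimal reference, and with us-ascii plus a DTD both non-ASCII characters become named entities.
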
> 
> Note that the command File/Change_Document_Type allows you to associate,
> change or remove the DOCTYPE of a document at any time. This
> allows you to generate either character references or entity references.
> 
> This is the basic principle, but it seems that a few bugs make Amaya behave
> a bit differently. Irène has already fixed a bug recently. She is now checking
> the whole process thoroughly.
> 
> Vincent.
> --------------
> Vincent Quint                       INRIA Rhône-Alpes
> INRIA                               ZIRST
> 
> 
> On Fri, 25 Nov 2005
> "Thomas Jedenfelt" wrote:
> >
> > It seems to me that Amaya 8.8.1 generates Hexadecimal entities 
> > for [Symbols] and [Internationalization] characters using Charset 
> > ISO-8859-1, UTF-8 and US-ASCII.
> >
> > Amaya does not, it seems, generate numeric or name entities for 
> > [ISO 8859-1 characters], code position 161 through 255, using 
> > Charset US-ASCII. (Will be fixed in next Amaya release, according 
> > to Irène/INRIA, 24 Nov.)
> >
> > (As previously mentioned, when using Charset ISO-8859-1 it is my 
> > Wish that Amaya will generate Decimal character references for 
> > characters of code position 161 through 8364, for the purpose of 
> > accurately transferring HTML-files over the Internet.)
> >
> > Reference:
> >
> > [ISO 8859-1 characters]
> > http://www.w3.org/TR/html401/sgml/entities.html#h-24.2.1
> >
> > [Symbols]
> > http://www.w3.org/TR/html401/sgml/entities.html#h-24.3.1
> >
> > [Internationalization] (i18n)
> > http://www.w3.org/TR/html401/sgml/entities.html#h-24.4.1
> >
> > Regards,
> > Thomas Jedenfelt
> >
> >
> > ----- Original Message -----
> > From: "Peter Kerr"
> > Date: Fri, 25 Nov 2005
> > > I've followed this subject with interest as our own institution
> > > comes to grips with MS Code Pages  :-(
> > >
> > > If I understand correctly what is happening,
> > > Amaya will use named entities for US-ASCII charset,
> > > hex entities for ISO-8859, and
> > > numerical entities for UTF-8 or UTF-16
> > >
> > > Perhaps this should be made clear in the documentation,
> > > that the charset and/or encoding declaration defines the entity format.
> > > That seems logical enough behaviour to me.
> > >
> > > Peter Kerr                       Mail.app 2.0.3 (734)
> > > Snr Technician
> > > School of Music,  University of Auckland,  New Zealand



Received on Friday, 2 December 2005 09:34:54 UTC