- From: Daniel W. Connolly <connolly@hal.com>
- Date: Mon, 12 Dec 1994 12:47:40 -0600
- To: Dave Raggett <dsr@hplb.hpl.hp.com>
- Cc: www-html@www0.cern.ch
In message <9412121829.AA11308@dragget.hpl.hp.com>, Dave Raggett writes:
>Dan,
>
>Thanks for checking the details.
>
>I am still uncertain about how best to handle the latin-1 entities.
>I changed the name from %ISOlat1 to %HTMLlat1 following a suggestion
>by Terry (or was it Paul?). I would expect this file to include entity
>names for the Latin-1 character codes below 128 and hence would include
>& and " etc. Why were these omitted from the 2.0 spec?

Take care not to confuse the "Added Latin 1" entity set (from an
appendix to the SGML spec, ISO8879) with the Latin 1 character set
(defined by ISO-8859-1).

& and " are not in the "Added Latin 1" entity set -- they're in the
iso-num set ("ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN").
But the rest of iso-num isn't used in HTML, so the few definitions for
amp, quot, lt, etc. are inlined in html.dtd.

The Added Latin 1 entity set defines a bunch of names for Latin 1
characters. The SGML spec appendix that defines it makes no reference
to the Latin 1 character set (ISO-8859-1). It maps those names to
these thingies called SDATA entities -- system dependent data
entities. I believe the intention is that the SDATA entities are
supposed to be replaced on a per-SGML-system basis. So you might see
a TeX version of "ISO 8879-1986//ENTITIES Added Latin 1//EN", with:

    <!ENTITY eacute SDATA "\eacute" -- for TeX -->

Since the document character set for HTML includes all the characters
referred to by those names, there's no need to use system-specific
mappings. The entities can be mapped to characters within the document
character set.

In response to the same feedback you saw, this set of definitions is
now called:

    "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML"

See:

    http://www.hal.com/%7Econnolly/html-spec/html-pubtext.html

for details.

Dan
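To illustrate the distinction drawn in the message above, between system-specific SDATA replacements and mapping entity names directly into the document character set, here is a minimal Python sketch. The table names, the function names, and the three sample entities are assumptions chosen for illustration; they are not taken from html.dtd or from any SGML system.

```python
import re

# Assumption: a handful of names from the Added Latin 1 entity set, mapped
# to their ISO-8859-1 code positions (the document character set approach).
HTML_LAT1 = {"eacute": 233, "Eacute": 201, "ntilde": 241}

# Assumption: a hypothetical system-specific table of the SDATA kind,
# here giving TeX accent commands for the same names.
TEX_SDATA = {"eacute": "\\'e", "Eacute": "\\'E", "ntilde": "\\~n"}

# Matches a named entity reference such as &eacute;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_to_charset(text):
    """Replace known entity references with Latin-1 characters."""
    def repl(match):
        code = HTML_LAT1.get(match.group(1))
        # Unknown names are left untouched.
        return chr(code) if code is not None else match.group(0)
    return ENTITY_REF.sub(repl, text)


def expand_for_system(text, table):
    """Replace known entity references using a per-system table."""
    def repl(match):
        return table.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(repl, text)
```

With these tables, `expand_to_charset("caf&eacute;")` gives the string with a literal e-acute (code 233), while `expand_for_system("caf&eacute;", TEX_SDATA)` gives the TeX form `caf\'e`. The first needs no per-system table because every target character is already in the document character set.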
Received on Monday, 12 December 1994 19:54:53 UTC