Re: ERB decisions on A.17, B.9, and other questions

From: <lee@sq.com>
Date: Sat, 19 Oct 96 19:07:31 EDT
Message-Id: <9610192307.AA13625@sqrex.sq.com>
To: U35395@UICVM.UIC.EDU, w3c-sgml-wg@w3.org
Cc: DGD@cs.bu.edu
> Would
>     <!ENTITY a.teng SDATA "[a.teng  ]">
>     <!ENTITY a.teng SDATA "[U+5B8A]">
> be (a) equally likely to be understood by all XML processors and (b)
> materially less likely to elicit the reaction &expletive;?

For me, only the 2nd of these makes any sense at all.

> If the ERB decides question C.5 the way I hope we will, then we can
> at least say
>     <!ENTITY auml   "&u00E4;"><!-- auml = a umlaut (dec 228) -->
>     <!ENTITY a.teng "&u5B8A;"><!-- a.teng = Tengwar vowel A
>                                    (decimal 23434) -->

Please let's not require the use of "significant comments".
If the parser can't throw away comments, they are not comments.

There are several pieces of information that one needs to know
about a glyph, at various times.
One could write a little document about it:

    The pre-defined standard ISO10646 code point, if one exists
    A suggested typeface to use for rendering the symbol
        (e.g. Lucida Mathematics Bold Extensions 13)
    The name or position within that face
        (e.g. LeftBraceExtender, or 0xF7)
    A human-readable description of the glyph, for the purposes of
        * people editing documents referring to it
        * editing software (the Insert Elephant menu)
    How to sort the glyph (unless it is forbidden to use XML for
        telephone directories, dictionaries, indexes, bibliographies...)
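
As a rough sketch only, the items listed above could be gathered into
one record per glyph.  Every name here (GlyphInfo, its field names,
menu_label, the entity name "lbrace.ext" and the sample sort key) is
made up for illustration; none of it is proposed syntax for XML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlyphInfo:
    """One record per glyph; all field names are hypothetical."""
    name: str                            # entity name, e.g. "auml"
    description: str                     # human-readable text for people and menus
    code_point: Optional[int] = None     # ISO 10646 code point, if one exists
    typeface: Optional[str] = None       # suggested face for rendering
    face_position: Optional[str] = None  # name or position within that face
    sort_key: Optional[str] = None       # how to collate the glyph


def menu_label(glyph: GlyphInfo) -> str:
    """Build the text an editor's "insert symbol" menu might show."""
    if glyph.code_point is not None:
        return f"{glyph.description} (U+{glyph.code_point:04X})"
    if glyph.typeface and glyph.face_position:
        return f"{glyph.description} ({glyph.face_position} in {glyph.typeface})"
    return glyph.description


# A glyph with a standard code point; the sort key "a" is an assumed example.
auml = GlyphInfo(name="auml", description="a umlaut",
                 code_point=0x00E4, sort_key="a")

# A glyph known only by its place in a particular face.
brace = GlyphInfo(name="lbrace.ext", description="left brace extender",
                  typeface="Lucida Mathematics Bold Extensions 13",
                  face_position="LeftBraceExtender")

print(menu_label(auml))
print(menu_label(brace))
```

A formatter would mostly read the typeface fields, an index generator
the sort key, and an editor the description, which is the sense in
which few applications need all of it at once.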

Some glyphs are made up by overstriking or composing several others,
in which case (as a minimum) it must be possible to say
    1. sequence: use this glyph followed by this glyph
    2. overstrike: use this glyph and this glyph
Optionally, one could choose to represent piles:
    3. verticality: use this glyph over this glyph 
This lets you build up diacriticals, e.g. for mathematical discourse,
but is not essential.
    (in maths, "a acute bar" is an a with an acute accent and then with
     a macron accent or bar placed above the acute accent.  The bar must
     be moved up to avoid overstriking the acute accent.  This sort of
     positioning is also needed for Ancient Greek and Hebrew, for
     example, neither of which I consider to be very obscure...)
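
The three operations above can be sketched as a small nested structure.
This is an illustration under assumptions, not a proposal: the class
names (Glyph, Sequence, Overstrike, Pile), the glyph names, and the
describe helper are all invented here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Glyph:
    name: str                      # a single named glyph


@dataclass(frozen=True)
class Sequence:
    parts: Tuple["Shape", ...]     # 1. this glyph followed by this glyph


@dataclass(frozen=True)
class Overstrike:
    parts: Tuple["Shape", ...]     # 2. glyphs drawn in the same position


@dataclass(frozen=True)
class Pile:
    parts: Tuple["Shape", ...]     # 3. first part sits over the next


Shape = Union[Glyph, Sequence, Overstrike, Pile]

# Wording used when spelling a composition out for a human reader.
_WORDS = {Sequence: "then", Overstrike: "overstruck with", Pile: "above"}


def describe(shape: Shape) -> str:
    """Render a composition as a parenthesised English phrase."""
    if isinstance(shape, Glyph):
        return shape.name
    joiner = f" {_WORDS[type(shape)]} "
    return "(" + joiner.join(describe(p) for p in shape.parts) + ")"


# "a acute bar": the macron is piled over the acute, which is piled over the a.
a_acute_bar = Pile((Glyph("macron"), Pile((Glyph("acute"), Glyph("a")))))

# A simple overstrike, e.g. an o with a slash through it.
o_slash = Overstrike((Glyph("o"), Glyph("slash")))

print(describe(a_acute_bar))
print(describe(o_slash))
```

Nesting the piles is what tells a renderer that the bar goes over the
acute rather than on top of it; a flat overstrike of all three parts
would lose that.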

If only we had a language for representing such short documents.

Note that few applications use all of this information; it does not
add significant complexity, but it _does_ add significant value.

See reference [1] for some further discussion of these issues (Michael,
Lou, I know you're familiar with this already).

Neither Unicode nor ISO10646 removes the need for user-defined glyphs.
They enable one to insert glyphs into the encoding, just as today on
some systems I can put a meta-control-a in my file and hey, look!,
on my screen it's an OE ligature.  On yours it's a picture of Bill Gates,
but who's to know?

Again, I don't care about the word SDATA, but I do care about losing
the representation of meaning.

Would you be happy with
    <!Element 391 - - (%4067;)* -- paragraph -->

It is bad enough that there is no standard place to put an element
or entity description in SGML, so that we are faced with crypticisms
like e4 or %m.pz.x; with no guidelines when we're building a style sheet.


[1] Harry Gaylord, _Character Representation_ in ``Text Encoding Initiative:
Background and Context'', Ed. Nancy Ide and Jean Véronis, Kluwer, 1995
Received on Saturday, 19 October 1996 19:09:33 UTC
