Meaning of the words "grapheme" and "orthography"

From: Mark Alexandre <markalexandre@gmail.com>
Date: Wed, 8 Feb 2006 06:05:08 -0600
Message-ID: <903631f00602080405k568ba97elc399754a41ca2c33@mail.gmail.com>
To: www-voice@w3.org

In the Pronunciation Lexicon Specification (PLS) v1.0, as of Draft 31,
the usage of the element tag <grapheme> reflects a misunderstanding
of the meaning of the word "grapheme."  The definition in the spec's
Glossary of Terms is nevertheless quite accurate: "One of the set of the
smallest units of a written language, such as letters, ideograms, or
symbols, that distinguish one word from another; a representation of
a single orthographic element."

Thus, the letter "g" and the numeral "4" are both examples of widely
used graphemes, as are the question mark "?" and the dollar sign "$".
In the current draft of the PLS, however, the so-called "grapheme" element
is mistakenly applied to what in English is commonly called the "spelling."
I believe this will lead to confusion unless the element is renamed.
As to what it should be named, more on that below.
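
To make the distinction concrete, here is a minimal sketch in Python.
It is an illustration only, not anything taken from the PLS draft: the
function name graphemes_of is hypothetical, and the sketch assumes that
each character of the string counts as one grapheme, which holds for
these simple examples but not for combining sequences in general.

```python
def graphemes_of(spelling):
    """Split a written form (a spelling) into its individual graphemes.

    Assumption: one character == one grapheme. True for the plain
    examples here; not true for combining marks in general.
    """
    return list(spelling)


# A spelling is a whole written form of a word ...
spelling = "colour"

# ... while the graphemes are the smallest units it is built from.
units = graphemes_of(spelling)
print(spelling, "is one spelling made of", len(units), "graphemes:", units)

# Digits and symbols are graphemes too, as noted above.
print(graphemes_of("$4?"))
```

In these terms, what the draft's element holds is the whole string
(the spelling), not any single member of the list.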

Before that, however, I wish to draw attention to an attribute of the
grapheme element, the "orthography" attribute.  The value of this
attribute is, according to the draft spec, supposed to be a "script code"
compliant with the ISO 15924 standard.  The title of that standard is
"Information and documentation — Codes for the representation of
names of scripts."  All of this naturally leads to the question: why not
name this attribute "script" or "scriptcode"?

The word "orthography", derived from Greek roots meaning (roughly)
"correct" and "writing", can present some ambiguity between two
related meanings, but neither meaning is the same as "script."

Orthography can be used as a synonym for what most English
speakers more commonly call the "spelling" of a particular word.
Thus the examples "colour" and "color" are two different orthographies
for the same word in the English language — the latter being the
American orthography that was adopted following a set of spelling
reforms which the rest of the English-speaking world declined to
follow.  The corresponding word in the French language is written
with the orthography "couleur."

In a related sense, the word "orthography" can be used to refer to
an entire system of conventions for writing, including such issues
as spelling and punctuation, plus even such trivia as the direction
of writing (such as left-to-right or vice versa), the spacing and/or
divisions of words, etc.  Some may also comprehend the word in
this broader sense to include issues of penmanship or calligraphy,
that is, the correct method to compose or draw the graphemes (or
characters, or symbols, or glyphs, if you like) of the language.
Note that, in this sense, conventions of orthography can differ even
between cultures that use the same alphabet — even between
the style guides used by differing editorial staffs in the same
metropolis!

Neither meaning of "orthography" is to be conflated with the meaning
of the word "script."  The Greek, Latin and Cyrillic alphabets, as well
as ancient cuneiform, Egyptian hieroglyphics, Chinese characters
(Hanzi, or Kanji in Japan), etc., are all most precisely referred to as
scripts, collectively.  Perhaps the only alternative to "script" is the
much more vague and expansive term, "writing system."
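
As a rough illustration, the scripts named above map onto four-letter
ISO 15924 codes as sketched below in Python.  The table name
SCRIPT_CODES and the dictionary of spellings are hypothetical names
chosen for this example; only the codes themselves come from the
standard.

```python
# ISO 15924 four-letter codes for the scripts mentioned above.
# (SCRIPT_CODES is an illustrative name, not part of any spec.)
SCRIPT_CODES = {
    "Greek": "Grek",
    "Latin": "Latn",
    "Cyrillic": "Cyrl",
    "Cuneiform": "Xsux",
    "Egyptian hieroglyphs": "Egyp",
    "Han (Hanzi, Kanji)": "Hani",
}

# Three different orthographies (spellings), all written in one script.
script_of_spelling = {
    "colour": SCRIPT_CODES["Latin"],
    "color": SCRIPT_CODES["Latin"],
    "couleur": SCRIPT_CODES["Latin"],
}

# The script code cannot tell these orthographies apart.
distinct_scripts = set(script_of_spelling.values())
print(sorted(script_of_spelling), "->", distinct_scripts)
```

The three spellings differ as orthographies yet share a single script
code, which is why a value of this kind identifies a script and not
an orthography.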

Finally, then, the weight of the evidence clearly argues for the
attribute in question to be called "script" (or "scriptcode" to be
verbose), as indeed it is called in ISO 15924.
Having thus liberated the word "orthography" from misapplication,
we may consider that word a candidate for the element incorrectly
labelled "grapheme."

In addition to "orthography," other candidates for the element now
called grapheme in the draft spec might include "spelling" or "writing"
or the almost comically long-winded "graphic presentation form."
Any one of these four terms would be vastly preferable to "grapheme"
(which, again, is simply wrong), but each has certain shortcomings
as well.  I will briefly list the problems I am aware of.

The term "orthography" is almost ideal, except for its unavoidable
connotation of "correctness."  That is, were it ever desirable, for
whatever reason, to list what may be deemed a "non-standard"
written form of a word, then calling that an orthography for the word
is misleading.  On the other hand, obviously, if the PLS is specifically
only intended to associate pronunciation with "correct" spellings
(according to somebody's criteria of correctness), then orthography
(in its narrower sense, vide supra) would be precisely accurate.
An additional bonus is that this word is understood with pretty much
the same meaning in other languages such as French and Spanish.

The term "spelling" is by far the more commonly used word by English
speakers when referring to how to write out a particular word.
Furthermore, it carries no connotation of correctness, since you can easily
refer to "alternate spelling" or even "bad spelling."  The downside,
a minor one, is that the notion of spelling is strongly associated with
alphabets; it is not at all clear what spelling means in the context of
Chinese writing or similar non-alphabetic systems.

The term "writing" is just vague enough to mean anything you want.
Since it applies to every aspect of, well, writing, it could be applied to
any aspect of it.  Put another way, its upside is its downside.

Finally, as for "graphic presentation form," or something similarly long
and comically precise: one is tempted to wonder whether every XML
parser out there really can handle sentence-length element names,
as well as how many folks have access to an XML editor with a
contextual auto-completion feature!

Just to throw out one last (off-the-wall) possibility, consider that the
Spanish cognate of the English word orthography is "ortografía",
which commonly gets shortened to just "grafía."  [See for example,
http://www.xtec.es/~faguile1/grafia/ ].  This suggests that this originally
Greek root for "writing" suffices all by itself to communicate the idea
we are talking about here.  Perhaps an Anglicized (or Anglicised, if
you prefer) coining such as "graphy" or a more internationally flavored
"graphia" or "graphie" would actually be the least open to misuse
and misinterpretation, since—to paraphrase Humpty-Dumpty—
it would mean just what we chose it to mean.

Received on Wednesday, 8 February 2006 13:59:04 GMT