(From diacritics to) simple language and grammar from Charles McCathieNevile on 2004-09-17 (w3c-wai-gl@w3.org from July to September 2004)

From: Charles McCathieNevile <charles@w3.org>
Date: Fri, 17 Sep 2004 05:35:56 -0400 (EDT)
To: WAI GL <w3c-wai-gl@w3.org>
Message-ID: <Pine.LNX.4.55.0402061211280.25523@homer.w3.org>
Short version:

This short exchange seems to demonstrate the original point: Technology
cannot interpret some linguistic styles. Many people cannot interpret those
styles. Avoiding those styles will help some of these people.

There are some methods to test linguistic complexity. We can use these to
improve accessibility. This is not crazy.

Quote from an old exchange:
>> are also affected by missing
>> diacritic marks.  All screen readers will  make mistakes, and will
>> pronounce the wrong word. This will occur more often then an incorrect
>> word pronunciation makes grammatical sense.
>
>That *sentence* doesn't.

Very much longer version:

Actually, it does. And I think it is an interesting example, so I will try to
develop it at some length.

Clause: "This will occur more often"
  subject: "this", verb: "will occur",
  temporal adverbial phrase "more often"
    (adverb of degree: "more", temporal adverb "often")

conjunction: "then"

clause: "an incorrect word pronunciation makes grammatical sense"
  compound subject: "an incorrect word pronunciation"
    (indefinite article: "an", adjective: "incorrect",
    nominal phrase: "word pronunciation")
  verb: "makes" (a legitimate future form in English, which has no native
    future tense)
  compound object: "grammatical sense"
    (adjective: "grammatical", noun: "sense")

Let's look at some interesting aspects:

0. Grammatical style
1. Ambiguity
2. The difference between syntax and semantics
3. Potential solutions

==Grammatical Style

There is perhaps a mismatch in tense between "will occur" (future) and
"makes" (a legitimate form for a near-term future, since English lacks a true
native future tense, although this usage is becoming uncommon). In languages
with more formal grammar this might be an error. In English, which has very
little grammar inherent in the language, and no formal body to define usage
(in contrast to French, Spanish, Icelandic, and other languages), this is
clearly a style question.

==Ambiguity

When the sentence is broken into two short, simple sentences, even Babelfish
can mostly parse it, giving useful results in French, Spanish, Italian,
Portuguese and German (I don't feel confident interpreting its other results).

The text I gave Babelfish was "This will occur more often. Then an incorrect
word pronunciation makes grammatical sense". In French it gave me "Ceci se
produira plus souvent. Alors une prononciation incorrecte de mot semble
raisonnable grammatical" - it should be "grammaticalement", and the
adjective/noun combination is one of the most basic patterns to parse in
English. (Certainly many children learn it before they learn noun "is"
adjective).

If you put the French result into Babelfish, and ask for English, it gives
"This will more often occur. Then an incorrect pronunciation of word seems
reasonable grammatical".

==Syntax and semantics

Syntax, roughly speaking, is the particular words or structure you use to say
something, and semantics is what you mean. (In HTML the syntax is well
defined and machine-processable - it is what the DTD or schema says about the
elements and attributes and how they can be combined. DTDs are more or less
incapable of carrying semantics - what the elements mean is conveyed by the
English text in the specification, of which the DTD is just one part).

There are often different ways, in a natural language, to convey roughly the
same idea. Different people also read things slightly differently. (What
automatic translators do is funny because they make more mistakes, and more
serious ones, than most people, but real humans also have differing and
sometimes "funny" interpretations of a sentence.)

Very roughly speaking, the more complex the syntax, the more precise it is
possible to be about the semantics. On the other hand, the more complex and
precise the semantics, the more people's interpretation is likely to vary.
And the more complex the syntax, the more precise people are going to think
the semantics are meant to be.

==Solutions...

Some conclusions seem to follow:

Don't be more precise than you mean to be. This may help you to keep the
expression simple.

Short sentences are easier to understand. Two short sentences can be better
than one complex sentence.
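
As a rough illustration of testing sentence length by machine, here is a
minimal sketch in Python. The function names and the word limit are my own
assumptions for the example, not an established readability measure, and the
sentence splitting is deliberately naive.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text, max_words=12):
    """Return (word_count, sentence) pairs for sentences over max_words.

    max_words is an arbitrary illustrative threshold, not a standard.
    """
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


original = ("This will occur more often then an incorrect word "
            "pronunciation makes grammatical sense.")
split_version = ("This will occur more often. Then an incorrect word "
                 "pronunciation makes grammatical sense.")

print(long_sentences(original, max_words=10))
print(long_sentences(split_version, max_words=10))
```

With a limit of 10 words, the original 13-word sentence is flagged and the
two-sentence version is not. A count like this says nothing about meaning; it
only points at sentences worth a second look.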

Some words are more ambiguous than others. Look for words that mean what you
want to say, and that do not mean other things. If all the appropriate words
are ambiguous, think about which is least ambiguous. Consider the rest of the
sentence when you do this. Consider which words are similar in form.
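
Finding words that are similar in form can also be sketched by machine. The
example below uses Python's standard difflib module. The word list, the
function name and the cutoff are assumptions chosen for illustration, and
spelling similarity is only a stand-in for the ways people actually confuse
words.

```python
import difflib


def similar_words(word, vocabulary, cutoff=0.7):
    """Return vocabulary words whose spelling is close to `word`.

    cutoff is an illustrative similarity threshold between 0 and 1.
    """
    candidates = [w for w in vocabulary if w != word]
    return difflib.get_close_matches(word, candidates, n=5, cutoff=cutoff)


# A tiny hypothetical word list, just for the example.
vocabulary = ["than", "then", "thin", "often", "sense", "can", "can't"]

print(similar_words("then", vocabulary))
print(similar_words("can", vocabulary))
```

A writer could use a list like this to notice that a chosen word has close
neighbours, and then check whether swapping in a neighbour would change the
meaning of the sentence.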

==And some other stuff...

I'm almost certain that the word "then" in the original sentence was meant to
be "than". This change would make the sentence mean more or less the opposite
of what it says.

This is not an uncommon problem. When listening to North Americans I have
great difficulty telling whether they are saying "can" or "can't" - most of
the time I cannot tell, and have to go on the context or ask them to explain.

Hence the last sentence of the previous section, about words that are similar
in form.

I also ran the start of the short version through a translator and back, via
different languages. This mirrors, to a certain extent, the kinds of mistakes
different people might make in interpreting it. (Speakers of different
languages have different characteristic errors in understanding English - in
some instances people with particular disabilities have distinctive patterns
as well. They are not always the same, but the idea is to demonstrate that
there is a range of possibilities with a concrete example based on simple
machine processing of natural text. And Babelfish is indeed simple...)

====fr
This short exchange seems to show the original point: Technology cannot
interpret some linguistic models. Many people cannot interpret these models.
To avoid these models will help some of these people.

There are some methods to examine linguistic complexity.

====ja
This short exchange seems that shows the point of the origin: As for
technology it is not possible to interpret a certain language style. As for
many people it is not possible to interpret those styles. Some person of
these people it helps the fact that those styles are avoided.

There is a certain method of testing language complexity.

====zh
This short exchange demonstrates the primitive spot as if: The technology is
unable to explain some language styles. Many people are unable to explain
these styles. Will avoid these styles helping these people.

Has some method test languages to be complex.

just 2 cents worth (this has been hanging in my brain for ages)

cheers

Chaals
Received on Friday, 17 September 2004 09:35:57 UTC