Re: CleverKeys, dictionary.com and "programmatically located" from Charles McCathieNevile on 2004-03-23 (w3c-wai-gl@w3.org from January to March 2004)

From: Charles McCathieNevile <charles@w3.org>
Date: Mon, 22 Mar 2004 19:46:02 -0500 (EST)
To: Wendy A Chisholm <wendy@w3.org>
Cc: WAI GL <w3c-wai-gl@w3.org>, nabe@lab.twcu.ac.jp
Message-ID: <Pine.LNX.4.55.0403221846540.21544@homer.w3.org>
[cc- people on the wcag list. I left Watanabe-san because I don't know if he
is subscribed]

On Thu, 18 Mar 2004, Wendy A Chisholm wrote:

>There are several questions in this email. Please respond in-line to each
>question or group of questions.

Of the following four questions the first two are repeated several times. I
believe the other two apply similarly to each of the example cases offered. I
have included them once each here with an answer, in lieu of repeating them
several times below:

1. "Is it a user agent feature instead of an author responsibility?

I presume you mean "should this be a user agent requirement rather than a
requirement on the author?".

I don't think so. The answer clearly depends on the group's overall response
to the question of whether authors have a responsibility to help users with
things that can be done by a user agent.

I point out in passing that most of these tools currently rely on selecting
the word in question - something that most browsers only make possible with a
mouse, and even that often breaks down completely with text in images.

2. If not, what more does the author need to do?"

The author would still need to be sure that the criterion can be met by tools
available. I would suggest they need to identify the tools capable of
providing the right answers - if necessary including more markup to allow for
determining the correct meaning where there is more than one possibility.

There are many possible techniques for doing this, including using systems
like Annotea, SWAP or GRDDL to provide re-rendering capability, building the
tools as middleware rather than user agents (the ones I know of at the moment
- Babylon is one I am more familiar with - and in passing it allows
customised dictionaries, with multiple languages available), and so on.

3. If it satisfies the criterion, do we need the criterion?

Yes. Otherwise there is nothing that makes it clear what is required. More
importantly, if the technique fails the criterion it is a failure of the
author, who didn't ensure that there was a linkage to something that works,
and of the user agent, which didn't do anything useful. The author should
ensure that things are linked in such a way that there is success - i.e. that
the dictionary/ies they link actually include(s) the necessary information.

Unless we have the criterion somewhere, there are no real grounds for working
out what the sensible approach to resolving the problem is. (Well, no grounds
within WCAG).

4. If we keep the criterion, is user agent support a sufficient technique?

As far as I can tell there are not sufficient grounds for pushing
responsibility for this onto the user at this stage. But I think that is a
discussion the working group should have after they understand how the
technology works, and after they have some clearer understanding of how they
create conformance schemes, based on an actual set of guidelines and criteria
that can be included in such a scheme.

>There are several tools that people can use to select a word in Web content
>and look up the meaning in a dictionary.  Examples include:
>
>1. "CleverKeys is free software that provides instant access to definitions
>
>2. In Opera, you can, "Translate words in other languages, or look them up
>in a dictionary. Simply double-click on a word, or right-click on a
>selection."  This is #10 in the list at:
><http://my.opera.com/community/tips/oneliners/>
>
>Do these methods, and similar ones like them, satisfy the four
>"programmatically located" aspects of 3.1?
><http://www.w3.org/TR/2004/WD-WCAG20-20040311/#meaning>

These methods could be used to satisfy those requirements. They clearly don't
automatically do so. Success depends on the particular dictionaries at the
other end of the service. There is another piece of software called Babylon
that I am more familiar with - it allows configuration of dictionaries used,
which is an important step, since you can verify whether or not the terms you
use are included. Similarly, it is possible to develop tools based on
Annotea, which would actually provide the ability to link the page to a
version where words become "hot". This is sort of what SWAP plans to do,
looking at the schema that they have (RDF techniques stuff).
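
To illustrate the verification step mentioned above, here is a minimal
Python sketch that reports which terms used on a page are missing from a
configured dictionary. The function name missing_terms, the toy dictionary
and the term list are all hypothetical; this does not reflect the API of
Babylon, CleverKeys or any other real tool.

```python
def missing_terms(terms, dictionary):
    """Return the terms (in order, de-duplicated) that the dictionary lacks."""
    seen = set()
    missing = []
    for term in terms:
        key = term.lower()
        if key in seen:
            continue
        seen.add(key)
        if key not in dictionary:
            missing.append(term)
    return missing


# Hypothetical dictionary, keyed by lower-cased headword.
toy_dictionary = {
    "w3c": "World Wide Web Consortium",
    "specifications": "detailed descriptions of design criteria",
}

# Hypothetical list of terms an author has used on a page.
page_terms = ["W3C", "specifications", "WCAG", "w3c"]
print(missing_terms(page_terms, toy_dictionary))
```

An author could run a check of this kind against each dictionary a tool is
configured to use, and add explicit markup only for the terms it reports.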

>Here are some tests (with CleverKeys for Windows) and questions:
>
>1st Success Criterion to test: Level 1, #2  - "The meaning of abbreviations
>and acronyms can be programmatically located."
>
>Test 1: go to the W3C [1], highlight the second instance of "W3C" in the
>first paragraph, press control+L to activate CleverKeys.
>[1] <http://www.w3.org/>
>
>Result 1: It lists the correct expansion of W3C
><http://dictionary.reference.com/search?q=%20W3C%20>
>
>Question: Does this satisfy the success criterion?

Apparently it satisfies the criteria - it identifies the thing in question.


>Test 2: go to the W3C [1], right-click on the second instance of "W3C" in
>the first paragraph,  select "dictionary" from the pop-up menu.
>
>Result 2: It does not list the correct expansion of W3C
><http://www.infoplease.lycos.com/search.php3?in=dictionary&query=W3C>
>
>Question: Does this fail the success criterion?

Yes, it fails.

>2nd Success Criterion to test:  Level 2, #2 - The meanings and
>pronunciations of all words in the content can be programmatically located
>
>Test 3: go to the W3C [1], highlight "specifications" in the first
>paragraph, press control+L to activate CleverKeys.
>
>Result 3: It finds a definition for "specifications" -
><http://dictionary.reference.com/search?q=specifications>
>
>Question: Does this satisfy the success criterion?

Apparently not. One of the success criteria requires that the user get to the
actual meaning, not a handful of choices.

There is a deeper question of whether the definition given is adequate.
Lexicographers usually use the rule of thumb that the definition must not
contain the term being defined (to avoid circular logic) and must be simpler
in its vocabulary. Essentially you should be able to determine that
successive application gets to a simple vocabulary and grammar structure.
This tends to be done as a rough art, rather than an actual reductionist
process. If someone were to offer an online electronic open Oxford English
Dictionary it would be an interesting experiment to test it in this way.
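
As a rough sketch of what such an experiment might look like, the Python
below tests the two rules of thumb on a toy dictionary: that a definition
does not contain its own term, and that successive expansion of definitions
ends in a simple base vocabulary. The data, the function names and the
assumption that definitions are plain lower-case words separated by spaces
are all illustrative, not taken from any real dictionary.

```python
def is_circular(term, definition):
    """True if the definition uses the very term it defines."""
    return term.lower() in definition.lower().split()


def reduces_to_base(term, dictionary, base_vocabulary, _path=None):
    """True if repeatedly expanding definitions ends in base words only."""
    path = _path or set()
    if term in base_vocabulary:
        return True
    if term in path or term not in dictionary:
        return False  # a cycle, or a word that nobody defines
    path = path | {term}
    return all(
        reduces_to_base(word, dictionary, base_vocabulary, path)
        for word in dictionary[term].lower().split()
    )


# Hypothetical base vocabulary and dictionary.
base = {"a", "written", "set", "of", "rules"}
toy = {
    "specification": "a written set of rules",
    "standard": "a specification",
    "loop": "a cycle",
    "cycle": "a loop",
}

print(reduces_to_base("standard", toy, base))  # expands via "specification"
print(reduces_to_base("loop", toy, base))      # "loop" and "cycle" define each other
```

A real test would also need to handle punctuation, inflected forms and
multiple senses per headword, which this sketch ignores.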

>3rd Success Criterion to test: Level 2, #3 - The meaning of all idioms in
>the content can be programmatically determined.
>
>Test #4: go to ESL idiom page [2], highlight "beat around the bush," press
>control+L to activate CleverKeys.
>[2] <http://www.eslcafe.com/idioms/id-list.html>
>
>Result #4:  It displays the result for "beat."  In the results is a list of
>idioms that contain the word "beat" - including "beat around/about the bush
>- To fail to confront a subject directly."
><http://dictionary.reference.com/search?q=beat%20around%20the%20bush>
>
>Question: Does this satisfy the success criterion?  If so, is it a user
>agent feature instead of an author responsibility?  If not, what more does
>the author need to do - link directly to the definition of "beat around the
>bush" instead of the general "beat?"

Right. Identify the actual meaning in question. I think this is an example
that shows we shouldn't necessarily consider the criteria we have as perfect.
Looking at user experience and use cases to make sure is a fairly important
exercise.
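
One way a tool could identify the idiom rather than the head word is to try
the longest matching phrase first. The Python below is a minimal sketch of
that idea under stated assumptions: the dictionary entries and the function
name lookup_phrase are hypothetical, and this is not how CleverKeys or
dictionary.com actually work.

```python
def lookup_phrase(selection, dictionary):
    """Return (headword, meaning) for the longest entry that matches the
    start of the selection, or None if not even the first word is defined."""
    words = selection.lower().strip(" ,.;:!?\"'").split()
    for end in range(len(words), 0, -1):
        candidate = " ".join(words[:end])
        if candidate in dictionary:
            return candidate, dictionary[candidate]
    return None


# Hypothetical entries: one for the single word, one for the whole idiom.
toy_dictionary = {
    "beat": "to strike repeatedly",
    "beat around the bush": "to fail to confront a subject directly",
}

print(lookup_phrase('"beat around the bush,"', toy_dictionary))
```

If the dictionary has no entry for the whole idiom, the same lookup falls
back to "beat", which is the behaviour reported in Result #4.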

>4th Success Criterion to test: Level 3, #1 - The meaning of contracted
>words can be programmatically determined.
>
>Test #5: go to cybernothing [3], highlight the word "can't," press
>control+L to activate CleverKeys.
>[3] <http://www.cybernothing.org/>
>
>Result #5: The results begin with "cant" and further in the list is "can't
>- Contraction of cannot."  <http://dictionary.reference.com/search?q=can%27t>
>
>Question: Does this satisfy the success criterion?

Apparently not, since it can't distinguish between cant and cannot, it is
leaving it to the user to do so. Some people may regard considering this as a
problem as mere cant, and there is certainly a legitimate discussion to be
had on whether the success criteria in the draft are in fact good enough.
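
The difference between offering a list and locating one meaning can be
sketched in a few lines of Python. The toy dictionary and the function
names candidates and resolve are assumptions made for illustration only;
they do not describe how any existing lookup service treats apostrophes.

```python
def candidates(selection, dictionary):
    """All headwords equal to the selection once apostrophes are ignored."""
    key = selection.lower().replace("'", "")
    return [head for head in dictionary if head.replace("'", "") == key]


def resolve(selection, dictionary):
    """Return a single meaning, or None if the lookup is missing or ambiguous."""
    exact = selection.lower()
    if exact in dictionary:
        return dictionary[exact]
    matches = candidates(selection, dictionary)
    return dictionary[matches[0]] if len(matches) == 1 else None


# Hypothetical entries for the two words in question.
toy_dictionary = {
    "cant": "insincere or hypocritical talk",
    "can't": "contraction of cannot",
}

print(candidates("can't", toy_dictionary))  # apostrophe ignored: two headwords
print(resolve("can't", toy_dictionary))     # apostrophe kept: one meaning
```

A lookup that discards the apostrophe leaves the choice to the user, while
one that keeps it can point at the contraction directly.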

>Language questions:
>1. Are there similar tools and dictionaries that are freely available in
>other languages?

There are tools available in a number of languages, but not necessarily free,
and the quality of the tools (most particularly the content of the
dictionaries and thesauri they rely on) is a critically important
consideration.

>2. Assuming there are similar tools for Dutch, how would the results differ
>for Dutch words that are aggregates of words? As with idioms, will tools
>look for the meaning of each separate word?

>3.  What about Japanese?  Hebrew? Spanish? Arabic?  German? French?  Are
>there similar tools for these languages?  What issues would tools have in
>other languages?

Yes there are tools for many of these languages, probably all, and many other
languages besides. I have used them for Portuguese, Russian, Japanese,
Chinese, Hebrew, Maori, Spanish, Italian, French, German, and I have written
them for the Yolngu Matha languages of North-East Arnhem Land, in Northern
Australia. The issues are the quality of the information available (Welsh is
a different case to Vietnamese, which is spoken by about ten thousand times
as many people), access to fonts, availability of encodings (especially for
things like symbol languages, sign languages, and little languages like the
Yolngu Matha group - 31 related languages, fewer than 10,000 speakers many of
whom speak 4 or 5 Yolngu languages and nothing else), etc. Mostly i18n
issues.

>4. If automatic lookup of words works for some languages and not others,
>how do we create guidelines that will apply across languages?

My suggestion hasn't changed over the last few years. Languages should be
treated like different technologies, so that language-specific questions can
be drawn out according to language, with appropriate success criteria.

>5. If the tools are possible, but not available today, do we write "lowest
>common denominator" guidelines that apply across all languages, or do we
>have different guidelines depending on tools available today?

I believe the best approach is to write general guidelines. There are
techniques that put all the responsibility on the author (such as
annotea-type solutions, or the related GRDDL/SWAP-type approach) to check
that they have resolved the problem for their particular case. In some cases
it seems we can offer the author some easier steps to solve the problem,
using existing work. If there was a functional tools group one could develop
tools that would automatically test a lot of this stuff (I know of such tools
that actually exist for specific use cases).

There are a massive number of languages in use today. Groups like the Summer
Institute of Linguistics (if I recall, their goal is to at least get the
bible translated into every language - that implies quite a large resource
becoming available, which is heavily studied and cross referenced so it can
serve as a basis for automatic generation of a large but specific dictionary
and some basic grammar) track those which are being used. I think the best we
can do (especially in a group that is mostly monolingual) is determine the
requirements in abstract terms and provide models for people to produce
technology-specific (in this case language-specific) requirements that are
more detailed. This has implications for any conformance scheme we produce...

>6. Is user agent support a sufficient technique?

Is there any evidence to suggest that it works effectively in the general
case? Do we require that people use a particular tool? These are much more
general questions that the working group needs to at least shape an answer to
before it is possible to give one for this specific case. I think in the
current circumstance the answer is probably "no", but there are a range of
"middleware solutions" available so the author doesn't actually have to go
through the entire document marking words, in most cases.

cheers

Chaals
Received on Monday, 22 March 2004 19:46:02 UTC