- From: Al Gilman <Alfred.S.Gilman@IEEE.org>
- Date: Wed, 8 Sep 2004 10:58:03 -0400
- To: "Gregg Vanderheiden" <gv@trace.wisc.edu>, wai-xtech@w3.org
** summary:

Operational Requirements: Terms -- two distinct kinds:

- technical usage (there is one and only one correct definition, in
  the author's opinion)
- natural usage (the meaning "is, y'know..." and dictionaries document
  the range of meanings in circulation, but approximately and not
  uniquely)

We need to handle these two cases separately in identifying
requirements and solution techniques.

Locatable: The requirement is that the source provide a remotely
operable query interface which will accept the spelling of the term,
as the only information extracted from its use in the document, and
this query will return zero or more explanations as found for that
spelling in that source. Standardizing these query interfaces -- which
will happen first at second order, by having the queries governed by
schemas and the schemas cross-related by maps in metadata -- is beyond
what we need to consider to establish that there is a practical way to
proceed with these content requirements.

Derived suggested content requirements:

[Note: SHOULD/MUST, or "is this a success criterion and at what
level," is left to a policy debate in the WCAG WG...]

A. terms in technical usage:

[The author SHOULD:] connect each such term with a unique explanation,
known also as its definition. The connection must be a definite
authority chain, but it may be indirect through a variety of
connection mechanisms:

- associating a specific instance of a term in the current text with
  an explanation somewhere. [The explanation may be in a note, local
  glossary, or URI-identified utterance somewhere. But for external
  references, if the work provides more than one explanation, the
  reference shall be to one in particular, using a #fragment or
  query-part syntax appropriate to the resource.]
- enumerating this term in a glossary for the current document, paired
  with a definition.
- the term appearing so enumerated and defined in a document that the
  current document cites as normative, or reached by a chain of
  normative references of arbitrary length.

B. terms in natural usage:

[The author SHOULD:] connect

a) scopes ranging from individual term instances up through the whole
document, with

b) sources of explanations ranging from individual explanation
instances up through published [Dublin Core entity] dictionaries or
Web Services,

...UNTIL the set of terms appearing in this document scope for which
there are zero possible explanations recoverable from the sources
associated above is empty. In other words, there are no terms that
have no explanations. The possible explanations counted for a given
instance of a given term include all explanations recoverable from any
source associated with any [ancestor] scope in the context of the
current term-instance.

* homework items remaining

- techniques (feasible first, then maybe recommended) for:
  - Glossary formats in HTML
  - Glossary formats in XML
  - Glossary notation in SKOS
  - Linking to definitions in HTML
  - Linking to definitions in XML
  - Linking to definitions in SKOS, extended
  - Linking to explanation sources from HTML
  - Linking to explanation sources from XML (do we need to be more
    precise? SVG?)
  - Linking to explanation sources via RDF (relationships, reductions)
  - Thesaurus information in SKOS
  - Thesaurus information embedded in XSD

** details inline below

At 10:15 PM -0400 8/6/04, Gregg Vanderheiden wrote:

>Would you send this to the right people
>
>This is for part of our discussion next week. [joint session
>WCAG:PF:SW et al. -- Al]
>
>Thanks
>
>Gregg

-- copied below, with inline comments.
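[An aside before the quoted draft: to make "Locatable" above concrete,
here is a minimal sketch of such a query interface. The service URI,
the URI pattern, and the element names are all invented for
illustration; nothing here is a proposed standard. The request carries
nothing but the spelling, and the response is zero or more
explanations:

   GET http://dict.example.org/lookup?term=anaphora

   <?xml version="1.0"?>
   <explanations term="anaphora"
                 source="http://dict.example.org/">
     <explanation xml:lang="en">
       The use of an expression whose interpretation depends on
       another expression earlier in the discourse.
     </explanation>
     <!-- zero 'explanation' children is a valid response; it means
          this source has no entry for that spelling -->
   </explanations>

Nothing in the content requirement depends on this particular shape;
any remotely operable query that maps a spelling to zero or more
explanations satisfies "Locatable."]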
>Programmatically Located, Meanings and Pronunciations [Initial draft
>-- thinking out loud]
>
>Problem
>
>Currently there is a problem when reading content whenever a word,
>abbreviation, acronym or phrase is encountered that the reader does
>not know the meaning of. If there were a way for the individual to
>locate the meaning of these semantic units when they occur, the
>overall understandability of the content would be greatly enhanced.
>It would also provide a mechanism for people of all reading levels to
>better handle content they don't understand. Although words, phrases,
>jargon, foreign phrases and abbreviations appear to be very
>different, there is a possibility that they could all be addressed
>using the same strategy.

Actually, we need to recognize that the requirements, and hence the
appropriate techniques, are different for two different categories of
terms:

- technical terms and symbols -- Terms of Art, acronyms, etc.: Here
  there is a unique explanation that the author intends, and the
  system should lead you uniquely there. Third-party writings are
  acceptable here only to the extent that the author explicitly
  mentions them as normative references.

- hard words -- this includes infrequently-used terms, foreign terms,
  and figurative usage which is a stretch or may not communicate with
  second-language speakers of the current language: Here there is no
  exact definition, but the senses in which the word is current in the
  language are explained, and these explanations are collected in
  dictionaries. In this case the most common error mode is that the
  user is able to recall zero meanings, not the wrong meaning. Here
  third-party assistance is the standard operating procedure.
  Dictionaries are developed by lexicographers, not by authors. They
  summarize the range of meanings in use, neither only nor exactly
  what any one speaker means.

>If we think of all ambiguous words, phrases, abbreviations, etc. as
>basically consisting of a string of characters whose meaning is
>unknown to the user, providing a common mechanism to look up the
>meaning of a string of characters would allow the reader

The other problem is where the user can recall *no* meaning for the
term -- not just can't fathom *which* of the meanings they recall is
the one to use. That's the more common event that sends us looking for
a dictionary.

>If the EXACT meaning of the string of characters could be easily
>found (programmatically DETERMINED), then it would be possible to
>provide a mechanism to automatically determine the meaning of the
>string.

Here is the key. For technical terminology, the author intends for the
term to be interpreted in terms of one specific explanation. We could
argue as to how EXACT that explanation is, but there is at least one
explanation that uniquely meets the author's expectations as to what
the term is meant to mean.

When people use terms in conversation -- other than these technical
terms in technical conversations -- they mean what they expect their
hearer or reader to understand; but even the 'definitions' given in
the dictionary are not exact, nor does the speaker expect the hearer
to apply any specific explanation in understanding the term.

For technical usage we have existing glossary practice that we need to
fold into our bag of tricks. For terms not uttered in a technical
context, there is no single appropriate explanation of what is meant.
But you can go to a dictionary and finger the one entry that best fits
this use of the term.
Indicate that this entry should be preferred to the others when
interpreting this term in the cited context. That can be one
appearance of the term or a whole website.

For help to people with more trouble reading than the general speaker
population, we will look to apply techniques that are intermediate
between the existing technical-glossary and dictionary-lookup patterns
of practice, borrowing as much by way of parts and assembly patterns
for our techniques as we can from the existing practice.

>Doing so however would require that the specific meanings of
>many words would have to be individually marked.

No. That assumes that all words bear the same requirements. It is
reasonably easy to estimate the likelihood of a given term being
misunderstood or not understood at all. The terms most at risk should
be given priority in documenting their appropriate interpretation; we
should not assume that there is an all-or-none standard for word help.

Technical terms should all be documented. That is to say, there should
be a machine-followable path from uses of technical terms in a
technical writing to *the unique explanation* that the author expects
to govern the use of this term in their writing. [A sketch of such a
path in HTML appears below, after the user-experience discussion.]
This doesn't mean that each use of such a term in the text has to be
marked, but there has to be a clear indication that in some document
scope, that term is used in accordance with the explanation given in
[the glossary, some normative reference, or so forth].

>A simpler task would
>be to provide the user with a list of possible meanings for the
>string.

Yes, this is the method that describes current-day dictionary-lookup
screen services such as Atomica and [what was the free one that was
discussed on the WCAG list?]. In this situation, the user most likely
comes up with *no* interpretation of the term that they can recover,
even when they see the term spelled out before them. In this case the
user reviews some candidate explanations that explain senses that are
in circulation among the speakers of this language. Usually only one
of these (or two near-synonyms among them) fits in with the context of
the term as it is used in this place in the text. The user is able to
resolve, to disambiguate, the multiple candidate explanations with the
benefit of their understanding of the situation set up by the
surrounding text; the user binds to this sense and moves on.

>Although, this is not as useful as being able to point to the
>exact, correct meaning, it still would be of great benefit. In
>addition, in many cases, the string would be unique and so the
>mechanism to 'programmatically locate' the meaning would, by default,
>be a mechanism to determine the exact meaning. For example, a common
>foreign phrase, such as "c'est la vie".

The meanings of natural terms, even idioms, are fuzzy when you look at
them closely enough. I would not suggest we put much faith in the
above line of reasoning.

>Proposal
>
>It is proposed that a mechanism based on reverse-cascading
>dictionaries be defined that would allow user agents to quickly
>locate the definitions of words or phrases (character strings)
>specified by the user.

The ordered list is overkill. We should definitely look at mechanisms
that revolve around scoping, where an explanation bound to an inner
scope takes precedence over an explanation bound to an outer scope.
But the idea that the author specifies a precedence order among
dictionaries, and that the user agent stops on first match, is
over-design and bad design.
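[For instance -- a rough sketch, not an agreed technique -- scoping in
HTML could hang explanation sources on nested elements. The
rel="glossary" link type is real HTML 4; treating a rel="glossary"
link inside a section as binding that source to the enclosing
section's scope is the speculative part, and the URIs are invented:

   <html>
   <head>
     <!-- source bound to the whole-document scope;
          "glossary" is a standard HTML 4 link type -->
     <link rel="glossary" href="http://dict.example.org/general" />
   </head>
   <body>
     ...
     <div class="clinical-notes">
       <!-- source bound to this inner scope. Under the scoped
            model it is consulted IN ADDITION TO the outer source;
            it does not shadow it. -->
       <a rel="glossary" href="http://medterms.example.org/">medical
       dictionary for this section</a>
       ...
     </div>
   </body>
   </html>

The point of the sketch is only that scope can fall out of the
document tree itself; no author-maintained precedence list is needed.]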
For natural terms, it helps to have the sundry interpretations,
available from different sources, all on offer, to help the user
recognize which of these matches the current individual use best. Here
we want the union of the dictionaries mentioned in the ancestor
contexts of this text block, not only the innermost, nor the
first-listed where some explanation is found.

For technical terms, one wants to borrow from multiple sources, and
there is no need for a precedence order. In technical writing, if
there is a conflict between the senses of one spelling between two of
your normative references, you just use a different term that is not
the victim of such a conflict. If one is writing for the general
reader and wants to use a technical term in its technical sense, this
calls for a glossary entry to introduce the technical sense to the
general reader, which can then link directly to the intended
explanation.

>In practice, an author would specify a list of cascading dictionaries
>that should be used with a particular page, a section of a website,
>or a website. This could be done either by embedding an ordered list
>of dictionaries in meta-data associated with the page itself, or by
>putting an ordered list in a specific document (e.g.,
>dictionaries.html) at the root of the URI for the page (or at any
>apparent level of the URI). For example, the dictionary for the web
>page www.mycomp.com/docs/sales/plan/easternplan.html could be located
>at www.mycomp.com/dictionaries.html.
>
>Once the user agent has fetched the ordered list of dictionaries, it
>would search for the word, starting at the highest level of the
>dictionary and then working through the cascade. For speed,
>simultaneous queries could be made to all, or if they were on a
>single site, a compound request might be issued. The definition
>presented to the user would be either the one in the most "local"
>dictionary, or the results could be presented to the user with the
>most "local" dictionary definitions presented first.
>
>By using cascading dictionaries, the meanings of abbreviations or
>acronyms can be defined differently for different pages or different
>areas of a website.

Substitute 'scoped' for 'cascading' and this virtue holds just the
same. Where this breaks down, you get specific about which terms
[which spellings] follow which source.

>The user experience
>
>To the user, this might look like an individual right-clicking on a
>word (or highlighting a phrase and right-clicking on it) and
>selecting "dictionary" or "meaning" or "look-up" from the drop-down
>menu. The user agent would then provide a results display of some
>type, with the meaning(s) of the word or phrase displayed. For
>different types of user agents, different mechanisms could be used to
>provide an equivalent functionality appropriate to the medium being
>used (visual, auditory, etc.)
>
>(The right-click menu could also provide opportunities for the
>individual to look up the meaning in symbolic or other forms, if they
>are available or if a converter was available to turn the text into
>symbols.)

Yes. This keys off two things: the user's HTTP-equivalent
'accept-language' preferences, and optional or attached cross-language
and thesaurus resources.

There is another user of the word-explanation resources and
word-disambiguation metadata: the language translation process,
whether automatic or semi-automated.
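[Picking up the technical-term thread: here is the sketch in HTML of
the "machine-followable path" promised above. Existing markup
suffices; only the glossary filename, fragment names, and definition
text are invented for the example:

   In the running text:

   <p>The <a href="gloss.html#def-term-of-art">term of art</a> is
   used here strictly in the sense given in the glossary.</p>

   In gloss.html, one definition per entry, each bearing a fragment
   identifier so that an external reference can cite exactly one
   explanation:

   <dl>
     <dt id="def-term-of-art"><dfn>term of art</dfn></dt>
     <dd>A word or phrase having a precise, specialized meaning
         within a particular field or profession.</dd>
   </dl>

Note how this satisfies the "one in particular" clause in requirement
A of the summary: the #def-term-of-art fragment picks out a unique
explanation even if the glossary offers several senses of the
spelling.]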
>Author experience
>
>For a page to qualify as meeting the success criteria dealing with
>"programmatically locatable", it would be necessary to have all of
>the words or phrases in the document be programmatically locatable.
>An author would start by simply associating some major online
>dictionary with the page using their authoring tool. Immediately,
>most of the words in the document would be covered, and only those
>words that were not in the dictionary would be highlighted and
>provided in a list to the side.

No. The author would 'start by' spell-checking their document. This
may be a command, or it may be done automagically and continuously as
the user types. That is to say, this is what most authors would do.
Authors who know that they meant to use terms in technical senses are
a separate case. But start with spell-checking.

Words not in the spell-checker, the author should find a good external
source of explanation for, or add a glossary entry. Words in the
spell-checker used on the office desktop are the responsibility of the
webmaster and network manager to make sure are covered. If there are
terms that the spell-checker accepts that one can't look up, either
the network manager kills them out of the standard dictionary, or the
webmaster comes up with the dictionary where they are explained and
adds a script to the check-in process that adds this link to pages
that use that term on check-in.

External sources would be linked somehow to the context, and then the
user would be allowed to add these terms to a personal spell-check
dictionary so they would no longer raise red flags on spell-check.
This check is repeated at check-in to the web content management
system, where the content is scanned for coverage by dictionaries.

Terms that are either technical or genuinely obscure should be dealt
with on a term-by-term basis. This can be guidelined in terms of word
frequency: words with a frequency below some threshold would be (by
house rules today, by universal guidelines tomorrow) expected to be
bound to specific interpretations, or at least to explicit references
to sources.

>The author would then associate any other
>dictionaries that were common to the type of material that they were
>generating (e.g., they might associate a medical dictionary if they
>were using a lot of medical terms, or they might associate a W3C
>glossary if they were using a lot of web terms). Instantly, all of
>those terms would disappear from the "un-locatable" listing. Authors
>would usually have two or three dictionaries or glossaries that they
>commonly used with their content. In addition, a company may maintain
>a dictionary of its own that it uses frequently or terms that it has
>defined itself. In order for a page to qualify, it is not necessary
>for proper names to be locatable. However, companies may want to
>provide a short description of their product names as well as their
>jargon. By attaching this list to the "local" end of the cascade,
>they can make it easy for individuals to find those definitions,
>understand those terms, identify those products or get the proper
>expansion for an acronym that has many different definitions in
>different fields.
>
>If the website has a set of cascading dictionaries at its root, then
>the author may find that all of the words on the page are already
>"locatable" without them doing anything at all. If there is a
>peculiar word on the single page, it is also possible for the author
>to put a set of dictionary entries directly inside the page.
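[On that last point -- a minimal sketch of what "directly inside the
page" could look like in HTML, reusing the definition-list pattern
shown earlier. The class name, fragment id, and entry are invented,
keyed to the easternplan.html example above:

   <div class="local-glossary">
     <h2>Terms peculiar to this page</h2>
     <dl>
       <dt id="def-eastern-plan"><dfn>eastern plan</dfn></dt>
       <dd>The sales plan covering the eastern region, revised
           quarterly.</dd>
     </dl>
   </div>

A page-local list like this sits at the innermost scope, so under the
scoped model its explanations are offered alongside, not instead of,
whatever the sitewide sources yield.]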
>Implementation Techniques (Programmatically located)
>
>Handling dictionaries with different query formats
>
>It would be good if there was a standard mechanism for the dictionary
>so that a simple command with the source string could be sent to any
>of the URIs and it would return the block of text for that term.

This goes far beyond the realm of the content guidelines:

- identify specific definitions for technical terms, acronyms, and
  the like
- identify sources for explanations for your remaining terms

SKOS would appear to have all the tools we need to meet the
requirements for sources, and for linking to sources, in the second
case, natural usage. On the other hand, it does not appear to have
everything we need for the first case, technical terms and the like.
There is no notion of a *unique proper* explanation in SKOS as it
stands; the uniquifying quantifier on a relationship would have to be
drawn in from elsewhere in the RDF cluster of commonly-used names for
long-standing concepts like uniqueness. For the first case we still
need to introduce some terms from the vernacular of technical
documents and their terminology. I expect that we can steal this from
the work on terminology in the accredited standards world, but I have
not yet dug it out.

>However, URIs having different search query forms could still be
>used. The URI in the document would simply take the form of the full
>URI, including the query text with a placeholder where the word or
>phrase to be searched would be inserted. Thus, a set of cascading
>URIs could access a set of cascading dictionaries, each of which had
>a different query form. They would do it by simply including the
>query form in the list of URIs.

This is 'protocols' business. The approach to take is via the schema
behind the query language with which one queries the resource.
[Compare with the work on the 'info' URI scheme, where the schemas are
public and all publishers of URIs that cite that namespace for their
query fields agree to abide by the published schema.] There are
approaches both with XQuery for XML resources and RDF query for
resources expressed under RDF conventions. The point here is that a
query interface exists per resource, not that it has to be
standardized in order for us to have a technique that supports the
feasibility of a guideline.

[The general architectural principle violated in what Gregg
brainstormed above is that for robustness and evolvability, we don't
tie our format to format in external modules. We tie our format to
function in external modules, so that multiple external modules can
interoperate with our format so long as their format provides the
needed function in some way.]

>http://www.m-w.com/cgi-bin/dictionary?book=Dictionary&va=<string>
>http://www.encyclopedia.com/searchpool.asp?target=%22<string>%22
>
>Alternate ways of associating the list of dictionaries with the
>content
>
>Content may not always be of a form that it is easy to attach other
>information to. A number of methods might therefore be used. Some
>were described above. Another might be to have a page with the
>cascading dictionary list at the same URI, except the ending would be
>some standard ending, say ".dict".

Don't compete with the Web. There are some technical disagreements
being hammered out, we hope, in a 'linking' task force in W3C. But
there are too many ways that already exist and are not tainted by
these controversies. In particular, typing resources by file
extensions is "not cool" per the Architecture Document.
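[Back to SKOS for a moment, to make the gap concrete -- a rough sketch
in RDF/XML, assuming the SKOS Core vocabulary as currently drafted.
The ex:normativeSense property is exactly the piece SKOS lacks and is
invented here to mark the gap; the concept and document URIs are
likewise illustrative:

   <rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:ex="http://example.org/terms#">

     <skos:Concept rdf:about="http://example.org/gloss#bandwidth">
       <skos:prefLabel xml:lang="en">bandwidth</skos:prefLabel>
       <skos:definition xml:lang="en">
         The width of a frequency band, in hertz.
       </skos:definition>
     </skos:Concept>

     <!-- the missing piece: asserting that, within some document
          scope, this spelling has this concept as its one proper
          explanation. Hypothetical property: -->
     <rdf:Description rdf:about="http://example.com/spec.html">
       <ex:normativeSense
         rdf:resource="http://example.org/gloss#bandwidth" />
     </rdf:Description>
   </rdf:RDF>

The first half is plain SKOS and serves the natural-usage case; the
second half shows where a uniquifying relationship would have to be
drawn in from elsewhere.]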
For dictionaries, the first thing to try is schemas and thesaurus
relations across the metadata of different schemas. [This is where
Eric has been living for years, professionally.] A given dictionary
document or Web Service would explain what it offers with a schema. In
this schema, or in a separate writing referring to this schema, the
relationship between its concepts and terms and those of
lexicographical standards should be set out in metadata serving the
function of a data thesaurus. SKOS is an example of a notation for
this thesaurus information, and Topic Maps is another framework
capable of integrating this information.

>In the end, we probably do not want to have a million different
>formats and options. However, we should have a selection of options
>that are very robust and allow the author to be able to handle the
>full range of web content types.

This is where we don't want to be in competition with the W3C. The
broad architectural principle is that "anywhere you can use a URI, you
should be able to use any URI." So the varieties of URIs that can cite
a document, or a section in a document, are not for us to specify --
other than that we could join with web users in general in measuring
what forms are well supported and sharing that information with
authors.

Al
Received on Wednesday, 8 September 2004 14:58:39 UTC