Re: techniques for search results highlighting from Benjamin Hawkes-Lewis on 2010-04-25 (w3c-wai-ig@w3.org from April to June 2010)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Sun, 25 Apr 2010 16:14:18 +0100
To: Andrew Johns <adjohns@gmail.com>
Cc: w3c-wai-ig@w3.org
Message-ID: <o2kde4cf6d81004250814m61599834hd99f388848871529@mail.gmail.com>
On Mon, Apr 19, 2010 at 12:55 PM, Andrew Johns <adjohns@gmail.com> wrote:
> Based on the requirement not to rely solely on colour when presenting
> information, I wondered if anyone had any thoughts on how to mark up search
> results and keywords or phrases using more than just colour.
>
> For example, in an existing web application, if I search for "oranges", after
> being presented with a list of search results, I click through to the actual
> result itself, and every instance of the word "Oranges" on that page is
> highlighted in colour.
>
> My feeling is that this would be marked up semantically using strong or em,
> perhaps with additional colour styling for visual users, and an informational
> cue above the main body of the content itself, where it informs the user that
> all instances of the word are emphasized.
>
> However I'm not sure this would really be of any benefit to screen readers,
> as the intention of the colour styling is to draw focus for the user - how
> would the user "jump" to the instances in question? or is this a case where
> the screenreader or browser search feature should be relied upon instead -
> and an accessible alternative is therefore not really needed?

In a pure HTML4 context, I think marking up matches with "strong" is
reasonable. User agents could present "strong" differently to surrounding text
(for example, JAWS emphasizes "strong" the same as "b"). User agents could
allow the user to skip between "strong" elements (for example, using JAWS's
"Same Element" command).

But it's not ideal. User agents could allow the user to jump to the first
"strong" element, but I'm not aware of one that has a built-in command for that,
and this would require users to appreciate fine distinctions of HTML semantics.
Most seriously, user agents would not be able to distinguish "strong" instances
relevant to the user's search from other "strong" elements that reflect
emphasis in the original document.

If you wanted to provide users the ability to jump to individual matches, you
could add an "id" to each highlight, then insert a list of links to those
document fragments at the top of the page:

   <h2>3 matches for your search for: <kbd>oranges</kbd>:</h2>
   <ol>
      <li><a href="#search-match-1">Search match 1</a></li>
      <li><a href="#search-match-2">Search match 2</a></li>
      <li><a href="#search-match-3">Search match 3</a></li>
   </ol>

   …

   <p>My father Joe loved <strong id="search-match-1">oranges</strong>.</p>
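A minimal server-side sketch of that idea, assuming the result document is available as a list of plain-text paragraphs and that a "match" is a simple case-insensitive whole-word hit. The function name `highlight_matches` is hypothetical; it reuses the "search-match-N" ids and the heading wording from the example above.

```python
import html
import re


def highlight_matches(paragraphs, query):
    """Hypothetical helper: wrap each case-insensitive, whole-word match of
    `query` in <strong id="search-match-N"> and build a list of links to
    those ids. Assumes `paragraphs` is a list of plain-text strings (not
    HTML). Returns (summary_html, body_html, match_count)."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    count = 0
    body_parts = []

    for text in paragraphs:
        pieces = []
        last = 0
        for m in pattern.finditer(text):
            count += 1
            # Escape the text before the match, then wrap the match itself.
            pieces.append(html.escape(text[last:m.start()]))
            pieces.append(
                f'<strong id="search-match-{count}">'
                f"{html.escape(m.group(0))}</strong>"
            )
            last = m.end()
        pieces.append(html.escape(text[last:]))
        body_parts.append("<p>" + "".join(pieces) + "</p>")

    # One link per match, pointing at the ids generated above.
    items = "".join(
        f'<li><a href="#search-match-{i}">Search match {i}</a></li>'
        for i in range(1, count + 1)
    )
    noun = "match" if count == 1 else "matches"
    summary = (
        f"<h2>{count} {noun} for your search for: "
        f"<kbd>{html.escape(query)}</kbd></h2>\n<ol>{items}</ol>"
    )
    return summary, "\n".join(body_parts), count
```

Because the page is generated rather than hand-written, the ids and the list of links can never drift out of step with each other.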

Users could cycle through matches by returning to the top of the page and
selecting another link or finding "Search match" in an alphabetical list of
links on the page. I can imagine more complex variations to speed up cycling
such as inserting a link to the next match after each match, displayed only
when the match gets content focus (and not when the document is printed, for
example).
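One way the "link to the next match" variation might be sketched, again assuming plain-text input. The `next-match` class name and the CSS are illustrative assumptions: the CSS uses the familiar skip-link pattern (visually hidden until the link itself receives keyboard focus), which only approximates "displayed only when the match gets content focus", and it hides the links when printing.

```python
import html
import re

# Illustrative CSS (assumption): keep the hypothetical .next-match links
# visually hidden until they receive keyboard focus, and drop them in print.
NEXT_MATCH_CSS = """
.next-match:not(:focus) {
  position: absolute; width: 1px; height: 1px; overflow: hidden;
  clip: rect(0 0 0 0);
}
@media print { .next-match { display: none; } }
"""


def highlight_with_next_links(text, query):
    """Hypothetical helper: wrap matches of `query` in
    <strong id="search-match-N"> and follow each with a link to the next
    match, wrapping from the last match back to the first. Assumes `text`
    is plain text, not HTML."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    matches = list(pattern.finditer(text))
    total = len(matches)
    out = []
    last = 0
    for i, m in enumerate(matches, start=1):
        next_id = i % total + 1  # the last match links back to the first
        out.append(html.escape(text[last:m.start()]))
        out.append(
            f'<strong id="search-match-{i}">{html.escape(m.group(0))}</strong>'
            f' <a class="next-match" href="#search-match-{next_id}">'
            f"Next search match</a>"
        )
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)
```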

Compare WCAG2 Technique G139 "Creating a mechanism that allows users
to jump to errors":

   http://www.w3.org/TR/WCAG20-TECHS/G139.html

I think such jumping functionality would be confusing clutter for simple
keyword search (easily replicated using the user agent's native search
functionality) or if most search results involved short documents (like a blog
post or news article). But for complex (e.g. stem-based, boolean, or otherwise
fuzzy) matching within long documents, I can see explicit links offering
convenience beyond native search functionality or just reading through the
document.

Two as yet unstandardized markup alternatives are worth mentioning.

The WAI-ARIA draft includes "list" and "listitem" roles and "aria-label" and
"aria-owns" properties that I believe would allow you to designate a set of
scattered elements as members of a labelled set:

   http://www.w3.org/WAI/PF/aria/roles#list
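An untested sketch of what that markup might look like if generated server-side: an empty container with role="list" adopts the scattered matches through "aria-owns". The helper names and the label text are hypothetical, and whether browsers and assistive technology expose the resulting set usefully is exactly the open question.

```python
import html


def aria_match_list(match_ids, query):
    """Hypothetical helper: an empty role="list" container that claims the
    scattered match elements via aria-owns. Assumes each match element
    carries one of `match_ids` as its id and has role="listitem"."""
    owns = " ".join(match_ids)
    label = html.escape(f"Matches for your search for {query}")
    return (
        f'<div role="list" aria-label="{label}" '
        f'aria-owns="{html.escape(owns)}"></div>'
    )


def aria_match_item(match_id, word):
    """Hypothetical helper: mark up one match as a listitem the container
    above can own."""
    return (
        f'<strong role="listitem" id="{html.escape(match_id)}">'
        f"{html.escape(word)}</strong>"
    )
```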

There are WAI-ARIA implementations in recent browsers and assistive technology,
but I do not know if these are sufficiently advanced to make such markup
useful. You would have to test.

The HTML5 draft includes a proposal for a "mark" element:

   http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-mark-element

It was originally introduced to highlight matches in search results much as you
describe. User agents could provide native functionality for skipping between
"mark" instances.

Since being introduced, its usage has broadened to include adding emphasis to
quotations from other sources. As a result, there might be some "mark"
instances relevant to the original author's intention rather than the user's
search. But I think these would amount to fewer false positives than "strong".

I'm not aware of any implementations of "mark".
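For comparison, a sketch of the same highlighting using "mark" in place of "strong", assuming plain-text input. `mark_matches` is a hypothetical name; the ids follow the "search-match-N" scheme from the earlier example so a list of links to the matches would still work.

```python
import html
import re


def mark_matches(text, query):
    """Hypothetical helper: wrap each case-insensitive, whole-word match of
    `query` in <mark id="search-match-N">. Assumes `text` is plain text."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    out, last = [], 0
    for n, m in enumerate(pattern.finditer(text), start=1):
        out.append(html.escape(text[last:m.start()]))
        out.append(f'<mark id="search-match-{n}">{html.escape(m.group(0))}</mark>')
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)
```

Unlike "strong", the element itself says "this is highlighted for a reason outside the author's text", so a user agent could in principle offer next/previous navigation over "mark" alone.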

I would encourage a more radical approach than these alternatives: rethink how
your search workflow operates.

Why send users to the top of a result document and make them do the work of
locating actual matches? Rather than the list of search results being a list of
documents, perhaps it should be a list of quotations of matches within
documents? You could then link directly to a relevant document fragment by
"id". For example, you could link to
"fruit-stories.html?query=oranges#search-match-3" or
"fruit-stories.html#paragraph-7".
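A sketch of the "list of quotations" idea, assuming each result document is available as an ordered list of plain-text paragraphs and is published with id="paragraph-N" on each paragraph (counting from 1), so that links like "fruit-stories.html#paragraph-7" resolve. The function name and the data shape are hypothetical.

```python
import html
import re


def paragraph_results(documents, query):
    """Hypothetical helper: return an HTML list in which each matching
    paragraph is quoted and linked to its own fragment.

    Assumes `documents` maps a document URL to its paragraphs (plain text,
    in document order) and that each paragraph is published with
    id="paragraph-N", counting from 1.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    items = []
    for url, paragraphs in documents.items():
        for n, text in enumerate(paragraphs, start=1):
            if not pattern.search(text):
                continue
            href = html.escape(f"{url}#paragraph-{n}")
            label = html.escape(f"{url}, paragraph {n}")
            # Link text names document and paragraph, so it still makes
            # sense when read out of context in a list of links.
            items.append(
                f'<li><a href="{href}">{label}</a>'
                f"<blockquote><p>{html.escape(text)}</p></blockquote></li>"
            )
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```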

Your dilemma reminds me of the deep search problem described by Steven Berlin
Johnson:

> Think about the difference between Google and Google Desktop: Google gives
> you URLs in return for your search request; Google Desktop gives you files
> (and email messages or web pages where appropriate.) On the web, a URL is an
> appropriate search result because it's generally the right scale: a single
> web page generally doesn't include that much information (and of course a
> blog post even less.) So the page Google serves up is often very tightly
> focused on the information you're looking for.
>
> But files are a different matter. Think of all the documents you have on your
> machine that are longer than a thousand words: business plans, articles,
> ebooks, pdfs of product manuals, research notes, etc. When you're making an
> exploratory search through that information, you're not looking for the files
> that include the keywords you've identified; you're looking for specific
> sections of text -- sometimes just a paragraph -- that relate to the general
> theme of the search query. If I do a Google Desktop search for "Richard
> Dawkins" I'll get dozens of documents back, but then I have to go through and
> find all the sections inside those documents that are relevant to Dawkins,
> which saves me almost no time.
>
> So the proper unit for this kind of exploratory, semantic search is not the
> file, but rather something else, something I don't quite have a word for: a
> chunk or cluster of text … If I have an eBook of Manuel DeLanda's on my hard
> drive, and I search for "urban ecosystem" I don't want the software to tell
> me that an entire book is related to my query. I want the software to tell me
> that these five separate paragraphs from this book are relevant.

   http://www.stevenberlinjohnson.com/movabletype/archives/000230.html

Whatever you do, it's best to have some users with disabilities try out your
final implementation using realistic search tasks.

--
Benjamin Hawkes-Lewis
Received on Sunday, 25 April 2010 15:14:52 UTC