Server Side Techniques (Techniques XML Submission Form)

Submission Results:

Technology: Server Side Techniques 
Techniques Category: navigation-mechanisms 
Submitter's Name: Yvette Hoitink 
Submitter's Email:

<technique id="UNKNOWN">
<short-name>Highlight search terms</short-name>
 <guideline idref="" />
 <success-criterion idref="UNKNOWN" />



When someone visits your site from a search engine results page, the URL of that results page is sent on to your site. This is known as the referring URL or referrer (the HTTP specification misspells this as 'referer'), and it can be read from scripting languages such as PHP, Python, and ECMAScript / JavaScript. The referrer contains a query string (assuming the search engine uses the HTTP GET method, as all the search engines we know of do), which holds several keys and values. These look something like search.php?q=SEARCH+TERMS+HERE&l=en. From these keys and values, you can determine which terms the visitor entered in the search engine that listed your site as a result. 
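As a rough illustration of reading the terms out of a referrer, here is a minimal Python sketch. The function name, the example host search.example, and the list of candidate keys (q, query, p) are assumptions made for this example. Each search engine uses its own key names, so a real deployment would need its own list.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of query-string keys that may hold the search terms.
SEARCH_KEYS = ("q", "query", "p")

def search_terms_from_referrer(referrer):
    """Return the list of search terms found in a referring URL."""
    query = urlsplit(referrer).query
    params = parse_qs(query)  # decodes '+' and %XX escapes
    for key in SEARCH_KEYS:
        if key in params:
            # Use the first value for the key and split it on whitespace.
            return params[key][0].split()
    return []

terms = search_terms_from_referrer(
    "http://search.example/search.php?q=SEARCH+TERMS+HERE&l=en"
)
print(terms)
```

A referrer with no query string, or with none of the expected keys, yields an empty list, in which case the page can be served unchanged.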

The next step is to find all words in your page that match those the user searched for on the search engine. Once you have a complete list of terms from the referrer’s query string, you wrap each instance of a term in a span element with a special class. Using your site’s cascading style sheets, you then highlight these terms using background colors, font weights, or different voices (depending on the target medium) so that they are more apparent to the user. We gave each search term a different class so the terms can be highlighted in different ways (e.g. every mention of 'color' is highlighted in yellow, every mention of 'coding' is highlighted in blue, and so on). 
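The wrapping step, with one class per term, might look roughly like the Python sketch below. It works on plain text only, so it does not yet deal with markup. The class names highlight1, highlight2, and so on, the case-insensitive matching, and the matching of terms inside longer words are all assumptions made for this example.

```python
import html
import re

def highlight_text(text, terms):
    """Wrap each search term found in plain text in a span with its own class."""
    # Map each lower-cased term to a class: highlight1, highlight2, ...
    classes = {}
    for term in terms:
        classes.setdefault(term.lower(), "highlight%d" % (len(classes) + 1))
    if not classes:
        return html.escape(text, quote=False)
    # Try longer terms first so a long term is not cut short by a shorter one.
    ordered = sorted(classes, key=len, reverse=True)
    pattern = re.compile(
        "(" + "|".join(re.escape(t) for t in ordered) + ")", re.IGNORECASE
    )
    out = []
    # With one capturing group, split() alternates text and matched terms.
    for i, part in enumerate(pattern.split(text)):
        escaped = html.escape(part, quote=False)
        if i % 2:  # odd indices are the matched terms
            css_class = classes.get(part.lower(), "highlight")
            out.append('<span class="%s">%s</span>' % (css_class, escaped))
        else:
            out.append(escaped)
    return "".join(out)

print(highlight_text("Color and coding", ["color", "coding"]))
```

The style sheet can then give each class its own treatment, for instance a yellow background for the first term and a blue one for the second.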

This sounds fairly easy, but there are complications to consider. If the visitor searches for “div,” you don’t want to replace all the <div> tags with <<span class="highlight">div</span>>. You also don’t want to add span elements inside any attribute values, or you’ll end up with something like <img src="example.png" alt="This is an example <span class="highlight">image</span>"/>. We need to separate the tags from the plain text, search the plain text for the search terms and wrap any instances in span tags, and finally put the plain text and the tags back together again, without changing the original structure or rendering of the page.

We accomplished this using regular expressions, a powerful tool that allows you to match patterns of text (see CPAN for a basic tutorial on using regular expressions). If you want to find an HTML tag you could use PHP’s string searching functions to find every possible combination of tags, but that takes a lot of work; with regular expressions you simply search for patterns.

We use a pattern analogous to saying “look for ‘<’ followed by any number of characters that are not ‘>’, followed by ‘>’”. The HTML file is the input string that the pattern is matched against. Using this pattern we separate the HTML tags from the plain text. We then take the untagged plain text and add the span tags around search terms, then put the HTML tags back in their original positions. This way any semantic meaning and presentation (visual, aural, or otherwise) is preserved, along with the structure and validity of the markup. 
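The pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used on the original site: the function name and the default class name are invented for the example, and it handles a single term. Like the pattern itself, it would be confused by a literal '>' inside an attribute value, and it does not treat comments or script and style content specially.

```python
import re

# '<' followed by any number of characters that are not '>', followed by '>'.
# The capturing group makes split() keep the tags in its result.
TAG = re.compile(r"(<[^>]*>)")

def highlight_html(page, term, css_class="highlight"):
    """Wrap a term in spans in the text between tags, leaving tags untouched."""
    if not term:
        return page
    term_re = re.compile(re.escape(term), re.IGNORECASE)
    pieces = TAG.split(page)
    # Even indices hold the text between tags, odd indices hold the tags.
    for i in range(0, len(pieces), 2):
        pieces[i] = term_re.sub(
            lambda m: '<span class="%s">%s</span>' % (css_class, m.group(0)),
            pieces[i],
        )
    # Joining the pieces puts every tag back in its original position.
    return "".join(pieces)

page = '<div class="x">a div <img src="e.png" alt="div"/></div>'
print(highlight_html(page, "div"))
```

Because only the even-numbered pieces are modified, the <div> tags and the alt attribute value in the sample page pass through unchanged, while the word "div" in the running text is wrapped.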


 <affects group="UNKNOWN" />







<see-also>
 <loc href="" > </loc>
</see-also>

Additional Notes:

This might be more about usability than accessibility, although I think it would help people orient themselves in the content. 

Received on Monday, 25 October 2004 09:15:01 UTC