RE: FYI from Today's Washington Post

Thanks Kevin,
I couldn't agree more. The UK Government is also pushing the use of the sitemaps protocol. The motivation is twofold:
a) to improve web search, allowing more content to be indexed and expressing a preference over which results should be returned from a site, if more than one result is a close fit.
b) to improve the quality of our web archiving activities. We are comprehensively and regularly archiving the UK central government web estate, as part of our "web continuity" project. Sitemaps help us capture a better archive rendition of a website. "Web Continuity" is our cross-government link persistence solution, by the way.
The [UK] National Archives implemented the sitemaps protocol over a year ago, to allow our full catalogue to be indexed. More recently (in the last week!) we have implemented the sitemaps protocol on the OPSI website, which is the UK Government's official legislation website.
We publish several different renditions of legislation for different purposes (branded, plain, PDF), and with legislation there are also several documents with very similar titles: the Act itself, the menu page listing the Act, the Explanatory Notes to the Act, and any subordinate legislation that commences sections of the Act (and is named after the Act). We use the relative priority in the sitemaps protocol to indicate our preference to search engines to serve the actual Act in HTML ahead of other formats and related documents. In fact we have set up a very detailed 'priority' scheme that we are surfacing through the sitemaps protocol.
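As a rough illustration of how a priority scheme of this kind can be surfaced through a sitemap, here is a minimal Python sketch. It is not the actual OPSI implementation: the document kinds, the priority values and the example.org URLs are all assumptions made up for the example. The only parts taken from the sitemaps protocol itself are the urlset, url, loc and priority elements, and the rule that priority is a value between 0.0 and 1.0 relative to other URLs on the same site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical scheme: the Act in HTML ranks above related documents and formats.
PRIORITY_BY_KIND = {
    "act_html": 1.0,
    "explanatory_notes": 0.6,
    "menu_page": 0.4,
    "act_pdf": 0.3,
}


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, kind) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, kind in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # The protocol expects a decimal between 0.0 and 1.0.
        ET.SubElement(entry, "priority").text = f"{PRIORITY_BY_KIND[kind]:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://example.org/acts/2008/1/act.htm", "act_html"),
        ("http://example.org/acts/2008/1/notes.htm", "explanatory_notes"),
        ("http://example.org/acts/2008/1/act.pdf", "act_pdf"),
    ]))
```

Priority is only a hint about the relative importance of pages within one site, so a scheme like this expresses a preference to the search engines rather than guaranteeing which result they serve.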
One immediate result of implementing the sitemaps protocol has been with respect to our "two clicks" Key Performance Indicator. The aim of this is to measure how accessible UK legislation is to the public using the web. Each month we take a random set of 100 pieces of legislation and search for them on Google. If the relevant link, one which takes the user directly to the text of the legislation, is served on the very first page of search results, then that item is available in two clicks. If no link to the OPSI website appears on the first page of search results, or the user is taken to, say, a menu page on our website instead, then it doesn't count. The process of testing this is automated of course, so that it's fair and robust, and we repeat the process each month with a different set of 100 pieces of legislation.
Our target for "two clicks" is 80% - and we have just about been meeting that target this year, averaging just over 80%. However, since the introduction of the sitemaps protocol this month, and specifically the use of the "priority" rating, the results have shot up to over 90% - a significant improvement.
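For anyone wanting to run a similar measurement, the "two clicks" calculation can be sketched roughly as follows. This is an assumption-laden illustration, not our production test harness: the function names are hypothetical, and search_first_page stands in for whatever automated lookup returns the URLs on the first page of results for a given title.

```python
import random


def monthly_sample(catalogue, size=100, seed=None):
    """Pick a random set of (title, direct_url) items to test this month."""
    return random.Random(seed).sample(catalogue, size)


def two_clicks_score(items, search_first_page):
    """Percentage of items whose direct link appears on the first results page.

    items: list of (title, direct_url) pairs.
    search_first_page: callable mapping a title to a list of result URLs
    (a hypothetical stand-in for the automated search step).
    """
    hits = sum(
        1 for title, direct_url in items
        if direct_url in search_first_page(title)
    )
    return 100.0 * hits / len(items)


def meets_target(score, target=80.0):
    """Compare a monthly score against the 80% target."""
    return score >= target
```

A menu page or any other OPSI page does not count as a hit here, because only an exact match on the direct link to the text of the legislation is accepted.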
John Sheridan 

Head of e-Services
Office of Public Sector Information
The National Archives
5th Floor
102 Petty France

Tel: 0203 334 2785 
Fax: 0208 487 1983


From: on behalf of Novak, Kevin
Sent: Thu 11/12/2008 14:02
To: eGov IG
Subject: FYI from Today's Washington Post


Firms Push for a More Searchable Federal Web

By Peter Whoriskey
Washington Post Staff Writer
Thursday, December 11, 2008; D01

Google's professed corporate mission is "to organize the world's information."

But for years, the U.S. government, one of the world's largest depositories of data, has been unwilling or unable to make millions of its Web pages accessible.

"The vast majority of information is still not searchable or findable either because it's not published or it's on Web sites which the government has put up which no one can index," Google chief executive Eric Schmidt said during a recent presentation at the New America Foundation.

Now Schmidt has a unique opportunity to change that as an informal adviser to President-elect Barack Obama, a tech booster who dubbed his first Senate law "Google for government" because it aimed to make federal information more accessible.

Today, a wide array of public information remains largely invisible to the search engines, and therefore to the general public, because it is held in such a way that the Web search engines of Google, Yahoo and Microsoft can't find it and index it. Not surprisingly, Yahoo and Microsoft officials agree that people would be better served if more public information became accessible to their search engines.

A person using one of the search engines, for example, can't find Environmental Protection Agency enforcement actions against a given company, can't discover the picture of a specific ancient Egyptian artifact at the Smithsonian and can't search by name for the details of a Vietnam War casualty.

And for many Web users, if an online item can't be found with a Web search engine, then for all practical purposes it doesn't exist.

"Unfortunately, too much of the public information provided on government Web sites just doesn't show up when the average American does a Google search," said J.L. Needham, Google's manager of public-sector content partnerships. "As a result, information that is intended for the public's use is effectively invisible."

To be sure, much of the information that the search engines are asking for is already digitized and available on the Web. EPA enforcement actions can be found through a portal on the agency's site, details on Egyptian artifacts can be found through a search of the National Museum of Natural History and details of a Vietnam War casualty may be found by searching the National Archives site.

The trouble, as the search engines see it, is that most Web users have become accustomed to finding information by typing queries into one of the engines -- and if they don't find it there, they give up.

Needham estimates that 1,000 federal government Web sites are inaccessible to search engine "crawlers," the programs that are run to discover what information is available on the Web.

Much of the inaccessibility stems from the fact that so much federal government data, while public, can be accessed only after users fill out an online form. The search engines' crawlers generally can't look into such databases.

For example, Google notes that a user seeking details on an Environmental Protection Agency enforcement action against Anheuser-Busch can't be found by entering a simple search query such as "EPA enforcement Anheuser-Busch." Instead, a person needs to know to go to a particular EPA enforcement Web site and enter "Anheuser-Busch."

To make those databases visible to search engines would require the federal government to make each item into a Web page and then to provide a list of those Web page addresses to the search engines.

Microsoft is working with more than 25 federal agencies to make their Web sites "crawlable" by search engines.

"I do agree with Google," said Molly O'Neill, chief information officer of the EPA, which has more than 200 Web sites. "When people search, they should be able to find the data."

But information technology officials in the federal bureaucracy said that the transition may require significant manpower and that the costs could be large.

"We have been working very closely with Google," said Francisco Camacho of the Web services division of the Smithsonian. "With limited resources as always, it's a little bit hard."

The National Archives expects that its entire database containing descriptions of its holdings will be available to Google by January, said Pamela Wright, a program manager for the National Archives and Records Administration. The EPA has made some sites accessible, too, and the Smithsonian has sent Google the links for 78,000 pages, Camacho said.

Some federal officials have grumbled, however, that Google is making this push purely for financial reasons: The more that is available to search engines, the more people will use search engines, letting Google show advertising to more people.

"The more information is available, the more people are likely to use Google," said Danny Sullivan, editor in chief of "It does help Google in the end."

But Needham said the company's motive in the federal Web site effort isn't the money; it's making sure customers find what they want.

"We don't care because there is monetization value," Needham said. "It's because if we fail to answer a question, then our users are disappointed with us, not their government."


Kevin Novak

Vice President, Integrated Web Strategy and Technology

The American Institute of Architects

1735 New York Avenue, NW

Washington, DC 20006


Voice:   202-626-7303

Cell:       202-731-0037

Fax:        202-639-7606






America's Favorite Architecture Tops the Shortlist for International Honor for the Web


The American Institute of Architects is the voice of the architectural profession and the resource for its members in service to society.




Received on Thursday, 11 December 2008 19:34:48 UTC