FYI from Today's Washington Post


Firms Push for a More Searchable Federal Web

By Peter Whoriskey
Washington Post Staff Writer
Thursday, December 11, 2008; D01

Google's professed corporate mission is "to organize the world's

But for years, the U.S. government, one of the world's largest
depositories of data, has been unwilling or unable to make millions of
its Web pages accessible.

"The vast majority of information is still not searchable or findable
either because it's not published or it's on Web sites which the
government has put up which no one can index," Google chief executive
Eric Schmidt said during a recent presentation at the New America
Foundation.

Now Schmidt has a unique opportunity to change that as an informal
adviser to President-elect Barack Obama, a tech booster who dubbed his
first Senate law "Google for
government" because it aimed to make federal information more

Today, a wide array of public information remains largely invisible to
the search engines, and therefore to the general public, because it is
held in such a way that the Web search engines of Google, Yahoo and
Microsoft can't find it and index it. Not
surprisingly, Yahoo and Microsoft officials agree that people would be
better served if more public information became accessible to their
search engines.

A person using one of the search engines, for example, can't find
Environmental Protection Agency enforcement actions against a given
company, can't discover the picture of a specific ancient Egyptian
artifact at the Smithsonian and can't search by name for the details of
a Vietnam War casualty.

And for many Web users, if an online item can't be found with a Web
search engine, then for all practical purposes it doesn't exist.

"Unfortunately, too much of the public information provided on
government Web sites just doesn't show up when the average American does
a Google search," said J.L. Needham, Google's manager of public-sector
content partnerships. "As a result, information that is intended for the
public's use is effectively invisible."

To be sure, much of the information that the search engines are asking
for is already digitized and available on the Web. EPA enforcement
actions can be found through a portal on the agency's site, details on
Egyptian artifacts can be found through a search of the National Museum
of Natural History and details of a Vietnam War casualty may
be found by searching the National Archives site.

The trouble, as the search engines see it, is that most Web users have
become accustomed to finding information by typing queries into one of
the engines -- and if they don't find it there, they give up.

Needham estimates that 1,000 federal government Web sites are
inaccessible to search engine "crawlers," the programs that are run to
discover what information is available on the Web.

Much of the inaccessibility stems from the fact that so much federal
government data, while public, can be accessed only after users fill out
an online form. The search engines' crawlers generally can't look into
such databases.

For example, Google notes that details of an Environmental Protection
Agency enforcement action against Anheuser-Busch can't be found by
entering a simple search query such as "EPA enforcement
Anheuser-Busch." Instead, a person needs to
know to go to a particular EPA enforcement Web site and enter

To make those databases visible to search engines would require the
federal government to make each item into a Web page and then to provide
a list of those Web page addresses to the search engines.
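The "list of Web page addresses" step is commonly done with an XML sitemap. The following is only a minimal sketch of that idea, not a description of what any agency actually built: the base URL, the record IDs and the function name build_sitemap are hypothetical, and it assumes each database record has already been published as its own page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the public Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, record_ids):
    """Return sitemap XML listing one page URL per database record.

    base_url and record_ids are illustrative inputs; a real site would
    read the IDs from its own database.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for record_id in record_ids:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Percent-encode the ID so it is safe inside a URL path.
        loc.text = f"{base_url}/{quote(str(record_id), safe='')}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical address pattern for individual enforcement records.
    print(build_sitemap("https://www.example.gov/enforcement/case",
                        ["1001", "1002"]))
```

A crawler that reads such a file can fetch each listed page directly, without having to fill out the search form that normally sits in front of the database.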

Microsoft is working with more than 25 federal agencies to make their
Web sites "crawlable" by search engines.

"I do agree with Google," said Molly O'Neill, chief information officer
of the EPA, which has more than 200 Web sites. "When people search, they
should be able to find the data."

But information technology officials in the federal bureaucracy said
that the transition may require significant manpower and that the costs
could be large.

"We have been working very closely with Google," said Francisco Camacho
of the Web services division of the Smithsonian. "With limited resources
as always, it's a little bit hard."

The National Archives expects that its entire database containing
descriptions of its holdings will be available to Google by January,
said Pamela Wright, a program manager for the National Archives and
Records Administration. The EPA has made some sites accessible, too, and
the Smithsonian has sent Google the links for 78,000 pages, Camacho said.

Some federal officials have grumbled, however, that Google is making
this push purely for financial reasons: The more that is available to
search engines, the more people will use search engines, letting Google
show advertising to more people.

"The more information is available, the more people are likely to use
Google," said Danny Sullivan, editor in chief of "It does help
Google in the end."

But Needham said the company's motive in the federal Web site effort
isn't the money; it's making sure customers find what they want.

"We don't care because there is monetization value," Needham said. "It's
because if we fail to answer a question, then our users are disappointed
with us, not their government."


Kevin Novak
Vice President, Integrated Web Strategy and Technology
The American Institute of Architects
1735 New York Avenue, NW
Washington, DC 20006

Voice: 202-626-7303
Cell: 202-731-0037
Fax: 202-639-7606









Received on Thursday, 11 December 2008 14:03:45 UTC